CVE-2024-5206

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer. It affects versions up to and including 1.4.1.post1 and was fixed in version 1.5.0. The vulnerability arises because the `stop_words_` attribute unexpectedly stores all tokens present in the training data, rather than only the subset of tokens that the TF-IDF technique needs in order to function. As a result, `stop_words_` can retain tokens that were meant to be discarded and never stored, such as passwords or keys, and these may leak to anyone who can read the fitted vectorizer. The impact of this vulnerability depends on the nature of the data processed by the vectorizer.
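
The description above suggests two practical checks: whether an installed version falls in the affected range, and whether a fitted vectorizer still carries a `stop_words_` attribute before it is stored or shared. The following is a minimal sketch of both, not an official remediation. The helper names (`release_tuple`, `is_affected`, `scrub_stop_words`) are hypothetical, the version comparison ignores suffixes such as `.post1`, `rc1` or `.dev0`, and a plain stand-in object is used in place of a real fitted TfidfVectorizer so the snippet runs without scikit-learn installed.

```python
import re
from types import SimpleNamespace

# First release containing the fix, as stated in the advisory text.
FIXED_RELEASE = (1, 5, 0)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '1.4.1.post1' -> (1, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '1.5' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True if the numeric release is older than the fixed release.

    Assumption: pre-release and dev suffixes are ignored by this sketch.
    """
    return release_tuple(version) < FIXED_RELEASE


def scrub_stop_words(vectorizer) -> bool:
    """Delete a fitted `stop_words_` attribute if present.

    Returns True when an attribute was removed, False otherwise.
    """
    if hasattr(vectorizer, "stop_words_"):
        delattr(vectorizer, "stop_words_")
        return True
    return False


# Stand-in for a fitted vectorizer (hypothetical data, not real output).
fitted = SimpleNamespace(vocabulary_={"hello": 0}, stop_words_={"hunter2"})
removed = scrub_stop_words(fitted)
```

With a real installation, `is_affected` would be given `sklearn.__version__` and `scrub_stop_words` the fitted vectorizer, before it is pickled. Upgrading to 1.5.0 or later remains the fix named in the advisory.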

Configurations

Configuration 1

cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*
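
To make the CPE string above easier to read, the sketch below splits it into the eleven named attributes of the CPE 2.3 formatted-string binding. It is a simplified illustration: the helper name `parse_cpe23` is hypothetical, and the naive split on `:` assumes no escaped colons inside any field, which holds for this entry but not for every CPE name.

```python
# Attribute order of a CPE 2.3 formatted string, after the "cpe:2.3" prefix.
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into named attributes.

    Assumption: no field contains an escaped colon.
    """
    pieces = cpe.split(":")
    if len(pieces) != 13 or pieces[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, pieces[2:]))


parsed = parse_cpe23("cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*")
```

Here `part` is `a` (application), `target_sw` is `python`, and `version` is the wildcard `*`, so the affected version range is not encoded in the CPE string itself.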

History

21 Nov 2024, 09:47

Type | Values Removed | Values Added
References | () https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8 - Patch | () https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8 - Patch
References | () https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c - Third Party Advisory | () https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c - Third Party Advisory

24 Oct 2024, 19:48

Type | Values Removed | Values Added
CWE | (none) | CWE-922
First Time | (none) | Scikit-learn scikit-learn; Scikit-learn
References | () https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8 - | () https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8 - Patch
References | () https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c - | () https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c - Third Party Advisory
CPE | (none) | cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*

17 Jun 2024, 19:15

Type | Values Removed | Values Added
CVSS | v2 : unknown, v3 : 5.3 | v2 : unknown, v3 : 4.7

07 Jun 2024, 14:56

Type Values Removed Values Added
Summary
  • (es) Se identificó una vulnerabilidad de fuga de datos confidenciales en TfidfVectorizer de scikit-learn, específicamente en versiones hasta la 1.4.1.post1 incluida, que se solucionó en la versión 1.5.0. La vulnerabilidad surge del almacenamiento inesperado de todos los tokens presentes en los datos de entrenamiento dentro del atributo `stop_words_`, en lugar de almacenar solo el subconjunto de tokens necesarios para que funcione la técnica TF-IDF. Este comportamiento conduce a una posible fuga de información confidencial, ya que el atributo `stop_words_` podría contener tokens que debían descartarse y no almacenarse, como contraseñas o claves. El impacto de esta vulnerabilidad varía según la naturaleza de los datos que procesa el vectorizador.

06 Jun 2024, 19:16

Type Values Removed Values Added
New CVE

Information

Published : 2024-06-06 19:16

Updated : 2024-11-21 09:47


NVD link : CVE-2024-5206

Mitre link : CVE-2024-5206

CVE.ORG link : CVE-2024-5206


Products Affected

scikit-learn

  • scikit-learn

CWE

CWE-921

Storage of Sensitive Data in a Mechanism without Access Control

CWE-922

Insecure Storage of Sensitive Information