CVE-2024-45024

In the Linux kernel, the following vulnerability has been resolved: mm/hugetlb: fix hugetlb vs. core-mm PT locking We recently made GUP's common page table walking code to also walk hugetlb VMAs without most hugetlb special-casing, preparing for the future of having less hugetlb-specific page table walking code in the codebase. Turns out that we missed one page table locking detail: page table locking for hugetlb folios that are not mapped using a single PMD/PUD. Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the page tables, will perform a pte_offset_map_lock() to grab the PTE table lock. However, hugetlb that concurrently modifies these page tables would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks would differ. Something similar can happen right now with hugetlb folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS. This issue can be reproduced [1], for example triggering: [ 3105.936100] ------------[ cut here ]------------ [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188 [ 3105.944634] Modules linked in: [...] [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1 [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024 [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3105.991108] pc : try_grab_folio+0x11c/0x188 [ 3105.994013] lr : follow_page_pte+0xd8/0x430 [ 3105.996986] sp : ffff80008eafb8f0 [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48 [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000 [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0 [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080 [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 [ 3106.047957] Call trace: [ 3106.049522] try_grab_folio+0x11c/0x188 [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 [ 3106.055527] follow_page_mask+0x1a0/0x2b8 [ 3106.058118] __get_user_pages+0xf0/0x348 [ 3106.060647] faultin_page_range+0xb0/0x360 [ 3106.063651] do_madvise+0x340/0x598 Let's make huge_pte_lockptr() effectively use the same PT locks as any core-mm page table walker would. Add ptep_lockptr() to obtain the PTE page table lock using a pte pointer -- unfortunately we cannot convert pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables we can have with CONFIG_HIGHPTE. Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such that when e.g., CONFIG_PGTABLE_LEVELS==2 with PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document why that works. There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb folio being mapped using two PTE page tables. While hugetlb wants to take the PMD table lock, core-mm would grab the PTE table lock of one of both PTE page tables. In such corner cases, we have to make sure that both locks match, which is (fortunately!) currently guaranteed for 8xx as it does not support SMP and consequently doesn't use split PT locks. [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Configurations

Configuration 1 (hide)

OR cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc2:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc3:*:*:*:*:*:*

History

13 Sep 2024, 16:30

Type Values Removed Values Added
CPE cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc1:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc3:*:*:*:*:*:*
cpe:2.3:o:linux:linux_kernel:6.11:rc2:*:*:*:*:*:*
CVSS v2 : unknown
v3 : unknown
v2 : unknown
v3 : 5.5
First Time Linux linux Kernel
Linux
Summary
  • (es) En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: mm/hugetlb: corrección del bloqueo de PT de hugetlb frente a core-mm Recientemente hicimos que el código de recorrido de tabla de páginas común de GUP también recorriera VMA hugetlb sin la mayoría de las mayúsculas y minúsculas especiales de hugetlb, preparándonos para el futuro de tener menos código de recorrido de tabla de páginas específico de hugetlb en la base de código. Resulta que nos perdimos un detalle de bloqueo de tabla de páginas: el bloqueo de tabla de páginas para folios hugetlb que no están mapeados usando un solo PMD/PUD. Supongamos que tenemos un folio hugetlb que abarca múltiples PTE (por ejemplo, folios hugetlb de 64 KiB en arm64 con un tamaño de página base de 4 KiB). GUP, mientras recorre las tablas de páginas, realizará un pte_offset_map_lock() para agarrar el bloqueo de tabla PTE. Sin embargo, hugetlb que modifica simultáneamente estas tablas de páginas en realidad agarraría el mm->page_table_lock: con USE_SPLIT_PTE_PTLOCKS, los bloqueos serían diferentes. Algo similar puede suceder ahora mismo con folios hugetlb que abarcan múltiples PMD cuando USE_SPLIT_PMD_PTLOCKS. Este problema se puede reproducir [1], por ejemplo, activando: [ 3105.936100] ------------[ cortar aquí ]------------ [ 3105.939323] ADVERTENCIA: CPU: 31 PID: 2732 en mm/gup.c:142 try_grab_folio+0x11c/0x188 [ 3105.944634] Módulos vinculados en: [...] [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer No contaminado 6.10.0-64.eln141.aarch64 #1 [ 3105.980406] Nombre del hardware: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 24/05/2024 [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3105.991108] pc : try_grab_folio+0x11c/0x188 [ 3105.994013] lr : follow_page_pte+0xd8/0x430 [ 3105.996986] sp : ffff80008eafb8f0 [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 [ 3106.004414] x26: 0000000000000001 x25: 00000000000000000 x24: ffff80008eafba48 [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffffff x15: 0000000000000000 [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9: ffffb854771b12f0 [ 3106.034324] x8: 000800000000000 x7: ffff7a546c1aa980 x6: 0008000000000080 [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 [ 3106.047957] Rastreo de llamadas: [ 3106.049522] try_grab_folio+0x11c/0x188 [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 [ 3106.055527] follow_page_mask+0x1a0/0x2b8 [ 3106.058118] __get_user_pages+0xf0/0x348 [ 3106.060647] faultin_page_range+0xb0/0x360 [ 3106.063651] do_madvise+0x340/0x598 Hagamos que huge_pte_lockptr() use efectivamente los mismos bloqueos PT que cualquier rastreador de tablas de páginas core-mm haría. Agregue ptep_lockptr() para obtener el bloqueo de la tabla de páginas PTE usando un puntero pte - desafortunadamente no podemos convertir pte_lockptr() porque virt_to_page() no funciona con tablas de páginas kmap'ed que podemos tener con CONFIG_HIGHPTE. Maneje CONFIG_PGTABLE_LEVELS correctamente verificando en orden inverso, de modo que cuando, por ejemplo, CONFIG_PGTABLE_LEVELS==2 con PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE funcionará como se espera. Documente por qué funciona eso. Hay un caso desagradable: powerpc 8xx, en el que tenemos un folio hugetlb de 8 MiB que se asigna utilizando dos tablas de páginas PTE. Mientras hugetlb quiere tomar el bloqueo de la tabla PMD, core-mm tomaría el bloqueo de la tabla PTE de una de ambas tablas de páginas PTE. En tales casos extremos, tenemos que asegurarnos de que ambos bloqueos coincidan, lo que (¡afortunadamente!) --- truncado ----
CWE CWE-667
References () https://git.kernel.org/stable/c/5f75cfbd6bb02295ddaed48adf667b6c828ce07b - () https://git.kernel.org/stable/c/5f75cfbd6bb02295ddaed48adf667b6c828ce07b - Patch
References () https://git.kernel.org/stable/c/7300dadba49e531af2d890ae4e34c9b115384a62 - () https://git.kernel.org/stable/c/7300dadba49e531af2d890ae4e34c9b115384a62 - Patch

11 Sep 2024, 16:15

Type Values Removed Values Added
New CVE

Information

Published : 2024-09-11 16:15

Updated : 2024-09-13 16:30


NVD link : CVE-2024-45024

Mitre link : CVE-2024-45024

CVE.ORG link : CVE-2024-45024


JSON object : View

Products Affected

linux

  • linux_kernel
CWE
CWE-667

Improper Locking