diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-31 14:57:54 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-07-31 14:57:54 -0700 |
| commit | beace86e61e465dba204a268ab3f3377153a4973 (patch) | |
| tree | 24f90cb26bf39eb7724326cdf3e8bffed7c05e50 /Documentation/mm | |
| parent | cbbf0a759ff96c80dfc32192a2cc427b79447f74 (diff) | |
| parent | af915c3c13b64d196d1c305016092f5da20942c4 (diff) | |
Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"As usual, many cleanups. The below blurbiage describes 42 patchsets.
21 of those are partially or fully cleanup work. "cleans up",
"cleanup", "maintainability", "rationalizes", etc.
I never knew the MM code was so dirty.
"mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
mapped VMAs were not eligible for merging with existing adjacent
VMAs.
"mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
adds a new kernel module which simplifies the setup and usage of
DAMON in production environments.
"stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
is a cleanup to the writeback code which removes a couple of
pointers from struct writeback_control.
"drivers/base/node.c: optimization and cleanups" (Donet Tom)
contains largely uncorrelated cleanups to the NUMA node setup and
management code.
"mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
does some maintenance work on the userfaultfd code.
"Readahead tweaks for larger folios" (Ryan Roberts)
implements some tuneups for pagecache readahead when it is reading
into order>0 folios.
"selftests/mm: Tweaks to the cow test" (Mark Brown)
provides some cleanups and consistency improvements to the
selftests code.
"Optimize mremap() for large folios" (Dev Jain)
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
"Remove zero_user()" (Matthew Wilcox)
expunges zero_user() in favor of the more modern memzero_page().
"mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
addresses some warts which David noticed in the huge page code.
These were not known to be causing any issues at this time.
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
provides some cleanup and consolidation work in DAMON.
"use vm_flags_t consistently" (Lorenzo Stoakes)
uses vm_flags_t in places where we were inappropriately using other
types.
"mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
increases the reliability of large page allocation in the memfd
code.
"mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
removes several now-unneeded PFN_* flags.
"mm/damon: decouple sysfs from core" (SeongJae Park)
implememnts some cleanup and maintainability work in the DAMON
sysfs layer.
"madvise cleanup" (Lorenzo Stoakes)
does quite a lot of cleanup/maintenance work in the madvise() code.
"madvise anon_name cleanups" (Vlastimil Babka)
provides additional cleanups on top or Lorenzo's effort.
"Implement numa node notifier" (Oscar Salvador)
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory
on/offline notifier.
"Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
cleans up the pageblock isolation code and fixes a potential issue
which doesn't seem to cause any problems in practice.
"selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
adds additional drgn- and python-based DAMON selftests which are
more comprehensive than the existing selftest suite.
"Misc rework on hugetlb faulting path" (Oscar Salvador)
fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
"cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
rationalizes and cleans up the highmem-specific code in the CMA
allocator.
"mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
provides cleanups and future-preparedness to the migration code.
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
adds some tracepoints to some DAMON auto-tuning code.
"mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
does that.
"mm/damon: misc cleanups" (SeongJae Park)
also does what it claims.
"mm: folio_pte_batch() improvements" (David Hildenbrand)
cleans up the large folio PTE batching code.
"mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
facilitates dynamic alteration of DAMON's inter-node allocation
policy.
"Remove unmap_and_put_page()" (Vishal Moola)
provides a couple of page->folio conversions.
"mm: per-node proactive reclaim" (Davidlohr Bueso)
implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
"mm/damon: remove damon_callback" (SeongJae Park)
replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
"mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
implements a number of mremap cleanups (of course) in preparation
for adding new mremap() functionality: newly permit the remapping
of multiple VMAs when the user is specifying MREMAP_FIXED. It still
excludes some specialized situations where this cannot be performed
reliably.
"drop hugetlb_free_pgd_range()" (Anthony Yznaga)
switches some sparc hugetlb code over to the generic version and
removes the thus-unneeded hugetlb_free_pgd_range().
"mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
augments the present userspace-requested update of DAMON sysfs
monitoring files. Automatic update is now provided, along with a
tunable to control the update interval.
"Some randome fixes and cleanups to swapfile" (Kemeng Shi)
does what is claims.
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
provides (and uses) a means by which debug-style functions can grab
a copy of a pageframe and inspect it locklessly without tripping
over the races inherent in operating on the live pageframe
directly.
"use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
addresses the large contention issues which can be triggered by
reads from that procfs file. Latencies are reduced by more than
half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
"__folio_split() clean up" (Zi Yan)
cleans up __folio_split()!
"Optimize mprotect() for large folios" (Dev Jain)
provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
"selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
does some cleanup work in the selftests code.
"tools/testing: expand mremap testing" (Lorenzo Stoakes)
extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
"selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
extends the DAMON sysfs interface selftest so that it tests all
possible user-requested parameters. Rather than the present minimal
subset"
* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
MAINTAINERS: add missing headers to mempory policy & migration section
MAINTAINERS: add missing file to cgroup section
MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
MAINTAINERS: add missing zsmalloc file
MAINTAINERS: add missing files to page alloc section
MAINTAINERS: add missing shrinker files
MAINTAINERS: move memremap.[ch] to hotplug section
MAINTAINERS: add missing mm_slot.h file THP section
MAINTAINERS: add missing interval_tree.c to memory mapping section
MAINTAINERS: add missing percpu-internal.h file to per-cpu section
mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
selftests/damon: introduce _common.sh to host shared function
selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
selftests/damon/sysfs.py: test non-default parameters runtime commit
selftests/damon/sysfs.py: generalize DAMON context commit assertion
selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
selftests/damon/sysfs.py: test DAMOS filters commitment
selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
selftests/damon/sysfs.py: test DAMOS destinations commitment
...
Diffstat (limited to 'Documentation/mm')
| -rw-r--r-- | Documentation/mm/arch_pgtable_helpers.rst | 14 | ||||
| -rw-r--r-- | Documentation/mm/damon/design.rst | 4 | ||||
| -rw-r--r-- | Documentation/mm/damon/maintainer-profile.rst | 35 | ||||
| -rw-r--r-- | Documentation/mm/page_migration.rst | 39 | ||||
| -rw-r--r-- | Documentation/mm/physical_memory.rst | 2 | ||||
| -rw-r--r-- | Documentation/mm/process_addrs.rst | 54 |
6 files changed, 101 insertions, 47 deletions
diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst index af245161d8e7..ba2f658bc241 100644 --- a/Documentation/mm/arch_pgtable_helpers.rst +++ b/Documentation/mm/arch_pgtable_helpers.rst @@ -30,8 +30,6 @@ PTE Page Table Helpers +---------------------------+--------------------------------------------------+ | pte_protnone | Tests a PROT_NONE PTE | +---------------------------+--------------------------------------------------+ -| pte_devmap | Tests a ZONE_DEVICE mapped PTE | -+---------------------------+--------------------------------------------------+ | pte_soft_dirty | Tests a soft dirty PTE | +---------------------------+--------------------------------------------------+ | pte_swp_soft_dirty | Tests a soft dirty swapped PTE | @@ -104,8 +102,6 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_protnone | Tests a PROT_NONE PMD | +---------------------------+--------------------------------------------------+ -| pmd_devmap | Tests a ZONE_DEVICE mapped PMD | -+---------------------------+--------------------------------------------------+ | pmd_soft_dirty | Tests a soft dirty PMD | +---------------------------+--------------------------------------------------+ | pmd_swp_soft_dirty | Tests a soft dirty swapped PMD | @@ -177,8 +173,6 @@ PUD Page Table Helpers +---------------------------+--------------------------------------------------+ | pud_write | Tests a writable PUD | +---------------------------+--------------------------------------------------+ -| pud_devmap | Tests a ZONE_DEVICE mapped PUD | -+---------------------------+--------------------------------------------------+ | pud_mkyoung | Creates a young PUD | +---------------------------+--------------------------------------------------+ | pud_mkold | Creates an old PUD | @@ -242,13 +236,13 @@ SWAP Page Table Helpers ======================== +---------------------------+--------------------------------------------------+ -| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | +| __pte_to_swp_entry | Creates a swp_entry_t (arch) from a swap PTE | +---------------------------+--------------------------------------------------+ -| __swp_to_pte_entry | Creates a mapped PTE from a swapped entry (arch) | +| __swp_entry_to_pte | Creates a swap PTE from a swp_entry_t (arch) | +---------------------------+--------------------------------------------------+ -| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | +| __pmd_to_swp_entry | Creates a swp_entry_t (arch) from a swap PMD | +---------------------------+--------------------------------------------------+ -| __swp_to_pmd_entry | Creates a mapped PMD from a swapped entry (arch) | +| __swp_entry_to_pmd | Creates a swap PMD from a swp_entry_t (arch) | +---------------------------+--------------------------------------------------+ | is_migration_entry | Tests a migration (read or write) swapped entry | +-------------------------------+----------------------------------------------+ diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index ddc50db3afa4..03f8137256f5 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -452,9 +452,9 @@ that supports each action are as below. - ``lru_deprio``: Deprioritize the region on its LRU lists. Supported by ``paddr`` operations set. - ``migrate_hot``: Migrate the regions prioritizing warmer regions. - Supported by ``paddr`` operations set. + Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set. - ``migrate_cold``: Migrate the regions prioritizing colder regions. - Supported by ``paddr`` operations set. + Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set. - ``stat``: Do nothing but count the statistics. Supported by all operations sets. diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst index ce3e98458339..5cd07905a193 100644 --- a/Documentation/mm/damon/maintainer-profile.rst +++ b/Documentation/mm/damon/maintainer-profile.rst @@ -7,9 +7,9 @@ The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR' section of 'MAINTAINERS' file. The mailing lists for the subsystem are damon@lists.linux.dev and -linux-mm@kvack.org. Patches should be made against the `mm-unstable tree -<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ whenever possible and posted -to the mailing lists. +linux-mm@kvack.org. Patches should be made against the `mm-new tree +<https://git.kernel.org/akpm/mm/h/mm-new>`_ whenever possible and posted to the +mailing lists. SCM Trees --------- @@ -17,17 +17,19 @@ SCM Trees There are multiple Linux trees for DAMON development. Patches under development or testing are queued in `damon/next <https://git.kernel.org/sj/h/damon/next>`_ by the DAMON maintainer. -Sufficiently reviewed patches will be queued in `mm-unstable -<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ by the memory management -subsystem maintainer. After more sufficient tests, the patches will be queued -in `mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>`_, and finally -pull-requested to the mainline by the memory management subsystem maintainer. - -Note again the patches for `mm-unstable tree -<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ are queued by the memory -management subsystem maintainer. If the patches requires some patches in -`damon/next tree <https://git.kernel.org/sj/h/damon/next>`_ which not yet merged -in mm-unstable, please make sure the requirement is clearly specified. +Sufficiently reviewed patches will be queued in `mm-new +<https://git.kernel.org/akpm/mm/h/mm-new>`_ by the memory management subsystem +maintainer. As more sufficient tests are done, the patches will move to +`mm-unstable <https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and then to +`mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>`_. And finally those +will be pull-requested to the mainline by the memory management subsystem +maintainer. + +Note again the patches for `mm-new tree +<https://git.kernel.org/akpm/mm/h/mm-new>`_ are queued by the memory management +subsystem maintainer. If the patches requires some patches in `damon/next tree +<https://git.kernel.org/sj/h/damon/next>`_ which not yet merged in mm-new, +please make sure the requirement is clearly specified. Submit checklist addendum ------------------------- @@ -53,8 +55,9 @@ Further doing below and putting the results will be helpful. Key cycle dates --------------- -Patches can be sent anytime. Key cycle dates of the `mm-unstable -<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and `mm-stable +Patches can be sent anytime. Key cycle dates of the `mm-new +<https://git.kernel.org/akpm/mm/h/mm-new>`_, `mm-unstable +<https://git.kernel.org/akpm/mm/h/mm-unstable>`_and `mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>`_ trees depend on the memory management subsystem maintainer. diff --git a/Documentation/mm/page_migration.rst b/Documentation/mm/page_migration.rst index 519b35a4caf5..34602b254aa6 100644 --- a/Documentation/mm/page_migration.rst +++ b/Documentation/mm/page_migration.rst @@ -146,18 +146,33 @@ Steps: 18. The new page is moved to the LRU and can be scanned by the swapper, etc. again. -Non-LRU page migration -====================== - -Although migration originally aimed for reducing the latency of memory -accesses for NUMA, compaction also uses migration to create high-order -pages. For compaction purposes, it is also useful to be able to move -non-LRU pages, such as zsmalloc and virtio-balloon pages. - -If a driver wants to make its pages movable, it should define a struct -movable_operations. It then needs to call __SetPageMovable() on each -page that it may be able to move. This uses the ``page->mapping`` field, -so this field is not available for the driver to use for other purposes. +movable_ops page migration +========================== + +Selected typed, non-folio pages (e.g., pages inflated in a memory balloon, +zsmalloc pages) can be migrated using the movable_ops migration framework. + +The "struct movable_operations" provide callbacks specific to a page type +for isolating, migrating and un-isolating (putback) these pages. + +Once a page is indicated as having movable_ops, that condition must not +change until the page was freed back to the buddy. This includes not +changing/clearing the page type and not changing/clearing the +PG_movable_ops page flag. + +Arbitrary drivers cannot currently make use of this framework, as it +requires: + +(a) a page type +(b) indicating them as possibly having movable_ops in page_has_movable_ops() + based on the page type +(c) returning the movable_ops from page_movable_ops() based on the page + type +(d) not reusing the PG_movable_ops and PG_movable_ops_isolated page flags + for other purposes + +For example, balloon drivers can make use of this framework through the +balloon-compaction infrastructure residing in the core kernel. Monitoring Migration ===================== diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst index d3ac106e6b14..9af11b5bd145 100644 --- a/Documentation/mm/physical_memory.rst +++ b/Documentation/mm/physical_memory.rst @@ -584,7 +584,7 @@ Compaction control ``compact_blockskip_flush`` Set to true when compaction migration scanner and free scanner meet, which - means the ``PB_migrate_skip`` bits should be cleared. + means the ``PB_compact_skip`` bits should be cleared. ``contiguous`` Set to true when the zone is contiguous (in other words, no hole). diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst index e6756e78b476..be49e2a269e4 100644 --- a/Documentation/mm/process_addrs.rst +++ b/Documentation/mm/process_addrs.rst @@ -303,7 +303,9 @@ There are four key operations typically performed on page tables: 1. **Traversing** page tables - Simply reading page tables in order to traverse them. This only requires that the VMA is kept stable, so a lock which establishes this suffices for traversal (there are also lockless variants - which eliminate even this requirement, such as :c:func:`!gup_fast`). + which eliminate even this requirement, such as :c:func:`!gup_fast`). There is + also a special case of page table traversal for non-VMA regions which we + consider separately below. 2. **Installing** page table mappings - Whether creating a new mapping or modifying an existing one in such a way as to change its identity. This requires that the VMA is kept stable via an mmap or VMA lock (explicitly not @@ -335,15 +337,13 @@ ahead and perform these operations on page tables (though internally, kernel operations that perform writes also acquire internal page table locks to serialise - see the page table implementation detail section for more details). +.. note:: We free empty PTE tables on zap under the RCU lock - this does not + change the aforementioned locking requirements around zapping. + When **installing** page table entries, the mmap or VMA lock must be held to keep the VMA stable. We explore why this is in the page table locking details section below. -.. warning:: Page tables are normally only traversed in regions covered by VMAs. - If you want to traverse page tables in areas that might not be - covered by VMAs, heavier locking is required. - See :c:func:`!walk_page_range_novma` for details. - **Freeing** page tables is an entirely internal memory management operation and has special requirements (see the page freeing section below for more details). @@ -355,6 +355,44 @@ has special requirements (see the page freeing section below for more details). from the reverse mappings, but no other VMAs can be permitted to be accessible and span the specified range. +Traversing non-VMA page tables +------------------------------ + +We've focused above on traversal of page tables belonging to VMAs. It is also +possible to traverse page tables which are not represented by VMAs. + +Kernel page table mappings themselves are generally managed but whatever part of +the kernel established them and the aforementioned locking rules do not apply - +for instance vmalloc has its own set of locks which are utilised for +establishing and tearing down page its page tables. + +However, for convenience we provide the :c:func:`!walk_kernel_page_table_range` +function which is synchronised via the mmap lock on the :c:macro:`!init_mm` +kernel instantiation of the :c:struct:`!struct mm_struct` metadata object. + +If an operation requires exclusive access, a write lock is used, but if not, a +read lock suffices - we assert only that at least a read lock has been acquired. + +Since, aside from vmalloc and memory hot plug, kernel page tables are not torn +down all that often - this usually suffices, however any caller of this +functionality must ensure that any additionally required locks are acquired in +advance. + +We also permit a truly unusual case is the traversal of non-VMA ranges in +**userland** ranges, as provided for by :c:func:`!walk_page_range_debug`. + +This has only one user - the general page table dumping logic (implemented in +:c:macro:`!mm/ptdump.c`) - which seeks to expose all mappings for debug purposes +even if they are highly unusual (possibly architecture-specific) and are not +backed by a VMA. + +We must take great care in this case, as the :c:func:`!munmap` implementation +detaches VMAs under an mmap write lock before tearing down page tables under a +downgraded mmap read lock. + +This means such an operation could race with this, and thus an mmap **write** +lock is required. + Lock ordering ------------- @@ -461,6 +499,10 @@ Locking Implementation Details Page table locking details -------------------------- +.. note:: This section explores page table locking requirements for page tables + encompassed by a VMA. See the above section on non-VMA page table + traversal for details on how we handle that case. + In addition to the locks described in the terminology section above, we have additional locks dedicated to page tables: |