summaryrefslogtreecommitdiff
path: root/mm/damon/ops-common.c
AgeCommit message (Collapse)Author
14 daysmm: replace pmd_to_swp_entry() with softleaf_from_pmd()Lorenzo Stoakes
Introduce softleaf_from_pmd() to do the equivalent operation for PMDs that softleaf_from_pte() fulfils, and cascade changes through code base accordingly, introducing helpers as necessary. We are then able to eliminate pmd_to_swp_entry(), is_pmd_migration_entry(), is_pmd_device_private_entry() and is_pmd_non_present_folio_entry(). This further establishes the use of leaf operations throughout the code base and further establishes the foundations for eliminating is_swap_pmd(). No functional change intended. [lorenzo.stoakes@oracle.com: check writable, not readable/writable, per Vlastimil] Link: https://lkml.kernel.org/r/cd97b6ec-00f9-45a4-9ae0-8f009c212a94@lucifer.local Link: https://lkml.kernel.org/r/3fb431699639ded8fdc63d2210aa77a38c8891f1.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: SeongJae Park <sj@kernel.org>\ Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
14 daysmm/rmap: extend rmap and migration support device-private entriesBalbir Singh
Add device-private THP support to reverse mapping infrastructure, enabling proper handling during migration and walk operations. The key changes are: - add_migration_pmd()/remove_migration_pmd(): Handle device-private entries during folio migration and splitting - page_vma_mapped_walk(): Recognize device-private THP entries during VMA traversal operations This change supports folio splitting and migration operations on device-private entries. [balbirs@nvidia.com: fix override of entry in remove_migration_pmd] Link: https://lkml.kernel.org/r/20251114012153.2634497-2-balbirs@nvidia.com [balbirs@nvidia.com: follow pattern used in remove_migration_pte()] Link: https://lkml.kernel.org/r/20251115002835.3515194-1-balbirs@nvidia.com Link: https://lkml.kernel.org/r/20251001065707.920170-5-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16mm: always call rmap_walk() on locked foliosLokesh Gidra
Patch series "Improve UFFDIO_MOVE scalability by removing anon_vma lock", v2. Userfaultfd has a scalability issue in its UFFDIO_MOVE ioctl, which is heavily used in Android as its java garbage collector uses it for concurrent heap compaction. The issue arises because UFFDIO_MOVE updates folio->mapping to an anon_vma with a different root, in order to move the folio from a src VMA to dst VMA. It performs the operation with the folio locked, but this is insufficient, because rmap_walk() can be performed on non-KSM anonymous folios without folio lock. This means that UFFDIO_MOVE has to acquire the anon_vma write lock of the root anon_vma belonging to the folio it wishes to move. This causes scalability bottleneck when multiple threads perform UFFDIO_MOVE simultanously on distinct pages of the same src VMA. In field traces of arm64 android devices, we have observed janky user interactions due to long (sometimes over ~50ms) uninterruptible sleeps on main UI thread caused by anon_vma lock contention in UFFDIO_MOVE. This is particularly severe during the beginning of GC's compaction phase when it is likely to have multiple threads involved. This patch resolves the issue by removing the exception in rmap_walk() for non-KSM anon folios by ensuring that all folios are locked during rmap walk. This is less problematic than it might seem, as the only major caller which utilises this mode is shrink_active_list(), which is covered in detail in the first patch of this series. As a result of changing our approach to locking, we can remove all the code that took steps to acquire an anon_vma write lock instead of a folio lock. This results in a significant simplification and scalability improvement of the code (currently only in UFFDIO_MOVE). Furthermore, as a side-effect, folio_lock_anon_vma_read() gets simpler as we don't need to worry that folio->mapping may have changed under us. This patch (of 2): Guarantee that rmap_walk() is called on locked folios so that threads changing folio->mapping and folio->index for non-KSM anon folios can serialize on fine-grained folio lock rather than anon_vma lock. Other folio types are already always locked before rmap_walk(). With this, we are going from 'not necessarily' locking the non-KSM anon folio to 'definitely' locking it during rmap walks. This patch is in preparation for removing anon_vma write-lock from UFFDIO_MOVE. With this patch, three functions are now expected to be called with a locked folio. To be careful of not missing any case, here is the exhaustive list of all their callers. 1) rmap_walk() is called from: a) folio_referenced() b) damon_folio_mkold() c) damon_folio_young() d) page_idle_clear_pte_refs() e) try_to_unmap() f) try_to_migrate() g) folio_mkclean() h) remove_migration_ptes() In the above list, first 4 are changed in this patch to try-lock non-KSM anon folios, similar to other types of folios. The remaining functions in the list already hold folio lock when calling rmap_walk(). 2) folio_lock_anon_vma_read() is called from following functions: a) collect_procs_anon() b) page_idle_clear_pte_refs() c) damon_folio_mkold() d) damon_folio_young() e) folio_referenced() f) try_to_unmap() g) try_to_migrate() All the functions in above list, except collect_procs_anon(), are covered by the rmap_walk() list above. For collect_procs_anon(), with kill_procs_now() changed to take folio lock in this patch ensures that all callers of folio_lock_anon_vma_read() now hold the lock. 3) folio_get_anon_vma() is called from following functions, all of which already hold the folio lock: a) move_pages_huge_pmd() b) __folio_split() c) move_pages_ptes() d) migrate_folio_unmap() e) unmap_and_move_huge_page() Functionally, this patch doesn't break the logic because rmap walkers generally do some other check to see if what is expected to mapped did happen so it's fine, or otherwise treat things as best-effort. Among the 4 functions changed in this patch, folio_referenced() is the only core-mm function, and is also frequently accessed. To assess the impact of locking non-KSM anon folios in shrink_active_list()->folio_referenced() path, we performed an app cycle test on an arm64 android device. During the whole duration of the test there were over 140k invocations of shrink_active_list(), out of which over 29k had at least one non-KSM anon folio on which folio_referenced() was called. In none of these invocations folio_trylock() failed. Of course, we now take a lock where we wouldn't previously have. In the past it would have had a major impact in causing a CoW write fault to copy a page in do_wp_page(), as commit 09854ba94c6a ("mm: do_wp_page() simplification") caused a failure to obtain folio lock to result in a page copy even if one wasn't necessary. However, since commit 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive"), and the introduction of the folio anon exclusive flag, this issue is significantly mitigated. The only case remaining that we might worry about from this perspective is that of read-only folios immediately after fork where the anon exclusive bit will not have been set yet. We note however in the case of read-only just-forked folios that wp_can_reuse_anon_folio() will notice the raised reference count established by shrink_active_list() via isolate_lru_folios() and refuse to reuse in any case, so this will in fact have no impact - the folio lock is ultimately immaterial here. All-in-all it appears that there is little opportunity for meaningful negative impact from this change. Link: https://lkml.kernel.org/r/20250923071019.775806-1-lokeshgidra@google.com Link: https://lkml.kernel.org/r/20250923071019.775806-2-lokeshgidra@google.com Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Barry Song <baohua@kernel.org> Cc: SeongJae Park <sj@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Nicolas Geoffray <ngeoffray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13mm: remove redundant __GFP_NOWARNQianfeng Rong
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Link: https://lkml.kernel.org/r/20250812135225.274316-1-rongqianfeng@vivo.com Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13mm/damon/paddr: move filters existence check function to ops-commonYueyang Pan
Patch series "mm/damon/vaddr: support stat-purpose DAMOS filters", v4. Extend DAMOS_STAT handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters. Functionality Test ================== I wrote a small test program which allocates 10GB of DRAM, use madvise(MADV_HUGEPAGE) to convert the base pages to 2MB huge pages Then my program does the following things in order: 1. Write sequentially to the whole 10GB region 2. Read the first 5GB region sequentially for 10 times 3. Sleep 5s 4. Read the second 5GB region sequentially for 10 times With a proper damon setting, we are expected to see df-passed to be 10GB and hot region move around with the read $ # Start DAMON $ sudo ./damo/damo start "./my_test/test" --monitoring_intervals 100ms\ 1s 2s $ # Show DAMON-generated access pattern snapshot $ sudo ./damo/damo report access --snapshot_damos_filter allow \ hugepage_size 2MiB 2MiB heatmap: # min/max temperatures: -600,000,000, 100,001,000, column size: 137.352 MiB intervals: sample 100 ms aggr 1 s (max access hz 10) # damos filters (df): reject none hugepage_size [2.000 MiB, 2.000 MiB] df-pass: # min/max temperatures: -400,000,000, 100,001,000, column size: 128.031 MiB 0 addr 85.373 TiB size 745.555 MiB access 0 hz age 6 s df-passed 0 B 1 addr 127.608 TiB size 877.664 MiB access 3.000 hz age 0 ns df-passed 878.000 MiB 2 addr 127.609 TiB size 219.418 MiB access 2.000 hz age 0 ns df-passed 220.000 MiB 3 addr 127.609 TiB size 316.613 MiB access 1.000 hz age 1 s df-passed 316.000 MiB 4 addr 127.609 TiB size 474.922 MiB access 1.000 hz age 1 s df-passed 476.000 MiB 5 addr 127.610 TiB size 407.188 MiB access 1.000 hz age 0 ns df-passed 406.000 MiB 6 addr 127.610 TiB size 610.781 MiB access 1.000 hz age 0 ns df-passed 612.000 MiB 7 addr 127.611 TiB size 697.309 MiB access 0 hz age 0 ns df-passed 696.000 MiB 8 addr 127.611 TiB size 77.480 MiB access 1.000 hz age 0 ns df-passed 78.000 MiB 9 addr 127.611 TiB size 573.102 MiB access 1.000 hz age 0 ns df-passed 574.000 MiB 10 addr 127.612 TiB size 245.617 MiB access 2.000 hz age 0 ns df-passed 246.000 MiB 11 addr 127.612 TiB size 295.102 MiB access 1.000 hz age 1 s df-passed 294.000 MiB 12 addr 127.612 TiB size 295.105 MiB access 1.000 hz age 1 s df-passed 296.000 MiB 13 addr 127.613 TiB size 67.172 MiB access 1.000 hz age 1 s df-passed 66.000 MiB 14 addr 127.613 TiB size 604.570 MiB access 0 hz age 1 s df-passed 606.000 MiB 15 addr 127.613 TiB size 389.578 MiB access 0 hz age 4 s df-passed 388.000 MiB 16 addr 127.614 TiB size 259.719 MiB access 0 hz age 4 s df-passed 260.000 MiB 17 addr 127.614 TiB size 817.941 MiB access 0 hz age 4 s df-passed 818.000 MiB 18 addr 127.615 TiB size 204.488 MiB access 0 hz age 4 s df-passed 204.000 MiB 19 addr 127.615 TiB size 730.902 MiB access 0 hz age 4 s df-passed 732.000 MiB 20 addr 127.616 TiB size 182.727 MiB access 0 hz age 4 s df-passed 182.000 MiB 21 addr 127.616 TiB size 926.824 MiB access 0 hz age 2 s df-passed 928.000 MiB 22 addr 127.617 TiB size 102.984 MiB access 0 hz age 2 s df-passed 102.000 MiB 23 addr 127.617 TiB size 86.527 MiB access 0 hz age 2 s df-passed 86.000 MiB 24 addr 127.617 TiB size 778.777 MiB access 0 hz age 2 s df-passed 776.000 MiB 25 addr 127.999 TiB size 132.000 KiB access 0 hz age 6 s df-passed 0 B memory bw estimate: 6.524 GiB per second df-passed: 6.527 GiB per second total size: 10.731 GiB df-passed 10.000 GiB record DAMON intervals: sample 100 ms, aggr 1 s $ # Show DAMON-generated access pattern snapshot again $ sudo ./damo/damo report access --snapshot_damos_filter allow \ hugepage_size 2MiB 2MiB heatmap: # min/max temperatures: -1,100,000,000, 2,000, column size: 137.352 MiB intervals: sample 100 ms aggr 1 s (max access hz 10) # damos filters (df): reject none hugepage_size [2.000 MiB, 2.000 MiB] df-pass: # min/max temperatures: -900,000,000, 2,000, column size: 128.031 MiB 0 addr 85.373 TiB size 745.555 MiB access 0 hz age 11 s df-passed 0 B 1 addr 127.608 TiB size 579.715 MiB access 2.000 hz age 0 ns df-passed 580.000 MiB 2 addr 127.608 TiB size 144.930 MiB access 2.000 hz age 0 ns df-passed 146.000 MiB 3 addr 127.608 TiB size 452.453 MiB access 2.000 hz age 0 ns df-passed 452.000 MiB 4 addr 127.609 TiB size 113.117 MiB access 1.000 hz age 0 ns df-passed 114.000 MiB 5 addr 127.609 TiB size 182.367 MiB access 2.000 hz age 0 ns df-passed 182.000 MiB 6 addr 127.609 TiB size 182.371 MiB access 2.000 hz age 0 ns df-passed 182.000 MiB 7 addr 127.609 TiB size 350.488 MiB access 1.000 hz age 0 ns df-passed 350.000 MiB 8 addr 127.610 TiB size 525.738 MiB access 1.000 hz age 0 ns df-passed 526.000 MiB 9 addr 127.610 TiB size 401.352 MiB access 1.000 hz age 0 ns df-passed 402.000 MiB 10 addr 127.611 TiB size 100.340 MiB access 1.000 hz age 0 ns df-passed 100.000 MiB 11 addr 127.611 TiB size 19.523 MiB access 0 hz age 0 ns df-passed 20.000 MiB 12 addr 127.611 TiB size 175.727 MiB access 0 hz age 0 ns df-passed 176.000 MiB 13 addr 127.611 TiB size 106.629 MiB access 0 hz age 0 ns df-passed 106.000 MiB 14 addr 127.611 TiB size 959.676 MiB access 0 hz age 0 ns df-passed 960.000 MiB 15 addr 127.612 TiB size 424.469 MiB access 1.000 hz age 0 ns df-passed 424.000 MiB 16 addr 127.612 TiB size 424.469 MiB access 1.000 hz age 0 ns df-passed 424.000 MiB 17 addr 127.613 TiB size 201.648 MiB access 0 hz age 6 s df-passed 202.000 MiB 18 addr 127.613 TiB size 806.609 MiB access 0 hz age 6 s df-passed 806.000 MiB 19 addr 127.614 TiB size 862.125 MiB access 0 hz age 9 s df-passed 862.000 MiB 20 addr 127.614 TiB size 215.535 MiB access 0 hz age 9 s df-passed 216.000 MiB 21 addr 127.615 TiB size 104.500 MiB access 0 hz age 9 s df-passed 104.000 MiB 22 addr 127.615 TiB size 940.523 MiB access 0 hz age 9 s df-passed 942.000 MiB 23 addr 127.616 TiB size 640.281 MiB access 0 hz age 7 s df-passed 640.000 MiB 24 addr 127.616 TiB size 426.855 MiB access 0 hz age 7 s df-passed 426.000 MiB 25 addr 127.617 TiB size 90.105 MiB access 0 hz age 7 s df-passed 90.000 MiB 26 addr 127.617 TiB size 810.965 MiB access 0 hz age 7 s df-passed 808.000 MiB 27 addr 127.999 TiB size 132.000 KiB access 0 hz age 11 s df-passed 0 B memory bw estimate: 5.297 GiB per second df-passed: 5.297 GiB per second total size: 10.731 GiB df-passed 10.000 GiB record DAMON intervals: sample 100 ms, aggr 1 s As you can see the total df-passed region is 10GiB and the hot region moves as the seq read keeps going This patch (of 2): This patch moves damon_pa_scheme_has_filter to ops-common. renaming to damos_ops_has_filter. Doing so allows us to reuse its logic in the vaddr version of DAMOS_STAT. Link: https://lkml.kernel.org/r/cover.1754135312.git.pyyjason@gmail.com Link: https://lkml.kernel.org/r/cbe01740f7ac5ac7c9fd1ca367d297c3d7f2a69d.1754135312.git.pyyjason@gmail.com Signed-off-by: Yueyang Pan <pyyjason@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-24mm/damon/ops-common: ignore migration request to invalid nodesSeongJae Park
damon_migrate_pages() tries migration even if the target node is invalid. If users mistakenly make such invalid requests via DAMOS_MIGRATE_{HOT,COLD} action, the below kernel BUG can happen. [ 7831.883495] BUG: unable to handle page fault for address: 0000000000001f48 [ 7831.884160] #PF: supervisor read access in kernel mode [ 7831.884681] #PF: error_code(0x0000) - not-present page [ 7831.885203] PGD 0 P4D 0 [ 7831.885468] Oops: Oops: 0000 [#1] SMP PTI [ 7831.885852] CPU: 31 UID: 0 PID: 94202 Comm: kdamond.0 Not tainted 6.16.0-rc5-mm-new-damon+ #93 PREEMPT(voluntary) [ 7831.886913] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014 [ 7831.887777] RIP: 0010:__alloc_frozen_pages_noprof (include/linux/mmzone.h:1724 include/linux/mmzone.h:1750 mm/page_alloc.c:4936 mm/page_alloc.c:5137) [...] [ 7831.895953] Call Trace: [ 7831.896195] <TASK> [ 7831.896397] __folio_alloc_noprof (mm/page_alloc.c:5183 mm/page_alloc.c:5192) [ 7831.896787] migrate_pages_batch (mm/migrate.c:1189 mm/migrate.c:1851) [ 7831.897228] ? __pfx_alloc_migration_target (mm/migrate.c:2137) [ 7831.897735] migrate_pages (mm/migrate.c:2078) [ 7831.898141] ? __pfx_alloc_migration_target (mm/migrate.c:2137) [ 7831.898664] damon_migrate_folio_list (mm/damon/ops-common.c:321 mm/damon/ops-common.c:354) [ 7831.899140] damon_migrate_pages (mm/damon/ops-common.c:405) [...] Add a target node validity check in damon_migrate_pages(). The validity check is stolen from that of do_pages_move(), which is being used for the move_pages() system call. Link: https://lkml.kernel.org/r/20250720185822.1451-1-sj@kernel.org Fixes: b51820ebea65 ("mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion") [6.11.x] Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Honggyu Kim <honggyu.kim@sk.com> Cc: Hyeongtak Ji <hyeongtak.ji@sk.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon: move folio filtering from paddr to ops-commonBijan Tabatabai
This patch moves damos_pa_filter_match and the functions it calls to ops-common, renaming it to damos_folio_filter_match. Doing so allows us to share the filtering logic for the vaddr version of the migrate_{hot,cold} schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-13-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/damon: move migration helpers from paddr to ops-commonBijan Tabatabai
This patch moves the damon_pa_migrate_pages function along with its corresponding helper functions from paddr to ops-common. The function prefix of "damon_pa_" was also changed to just "damon_" accordingly. This patch will allow page migration to be available to vaddr schemes as well as paddr schemes. Link: https://lkml.kernel.org/r/20250709005952.17776-9-bijan311@gmail.com Co-developed-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Ravi Shankar Jonnalagadda <ravis.opensrc@micron.com> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-05mm/damon: s/primitives/code/ on commentsEnze Li
The word 'primitive' is not explicit. To make the code more easily understood, this commit renames 'primitives' to 'code' in header comments of some source files. Link: https://lkml.kernel.org/r/20250530053115.153238-1-lienze@kylinos.cn Signed-off-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon/ops: have damon_get_folio return folio even for tail pagesUsama Arif
Patch series "mm/damon/paddr: fix large folios access and schemes handling". DAMON operations set for physical address space, namely 'paddr', treats tail pages as unaccessed always. It can also apply DAMOS action to a large folio multiple times within single DAMOS' regions walking. As a result, the monitoring output has poor quality and DAMOS works in unexpected ways when large folios are being used. Fix those. The patches were parts of Usama's hugepage_size DAMOS filter patch series[1]. The first fix has collected from there with a slight commit message change for the subject prefix. The second fix is re-written by SJ and posted as an RFC before this series. The second one also got a slight commit message change for the subject prefix. [1] https://lore.kernel.org/20250203225604.44742-1-usamaarif642@gmail.com [2] https://lore.kernel.org/20250206231103.38298-1-sj@kernel.org This patch (of 2): This effectively adds support for large folios in damon for paddr, as damon_pa_mkold/young won't get a null folio from this function and won't ignore it, hence access will be checked and reported. This also means that larger folios will be considered for different DAMOS actions like pageout, prioritization and migration. As these DAMOS actions will consider larger folios, iterate through the region at folio_size and not PAGE_SIZE intervals. This should not have an affect on vaddr, as damon_young_pmd_entry considers pmd entries. Link: https://lkml.kernel.org/r/20250207212033.45269-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250207212033.45269-2-sj@kernel.org Fixes: a28397beb55b ("mm/damon: implement primitives for physical address space monitoring") Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: handle device-exclusive entries correctly in damon_folio_mkold_one()David Hildenbrand
Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_mkold_one() is not prepared for that and calls damon_ptep_mkold() with PFN swap PTEs. Teach damon_ptep_mkold() to deal with these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as damon_get_folio() filters out non-lru folios. Should we just skip PFN swap PTEs completely? Possible, but it seems straight forward to just handle it correctly. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Link: https://lkml.kernel.org/r/20250210193801.781278-16-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Tested-by: Alistair Popple <apopple@nvidia.com> Cc: Alex Shi <alexs@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Karol Herbst <kherbst@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Lyude <lyude@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm/damon/ops-common: avoid divide-by-zero during region hotness calculationSeongJae Park
When calculating the hotness of each region for the under-quota regions prioritization, DAMON divides some values by the maximum nr_accesses. However, due to the type of the related variables, simple division-based calculation of the divisor can return zero. As a result, divide-by-zero is possible. Fix it by using damon_max_nr_accesses(), which handles the case. Link: https://lkml.kernel.org/r/20231019194924.100347-4-sj@kernel.org Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21damon: use pmdp_get instead of drectly dereferencing pmdLevi Yun
As ptep_get, Use the pmdp_get wrapper when we accessing pmdval instead of directly dereferencing pmd. Link: https://lkml.kernel.org/r/20230727212157.2985025-1-ppbuk5246@gmail.com Signed-off-by: Levi Yun <ppbuk5246@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mm: ptep_get() conversionRyan Roberts
Convert all instances of direct pte_t* dereferencing to instead use ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle: ---- // $ make coccicheck \ // COCCI=ptepget.cocci \ // SPFLAGS="--include-headers" \ // MODE=patch virtual patch @ depends on patch @ pte_t *v; @@ - *v + ptep_get(v) ---- Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09mm/damon/ops-common: refactor to use {pte|pmd}p_clear_young_notify()Ryan Roberts
With the fix in place to atomically test and clear young on ptes and pmds, simplify the code to handle the clearing for both the primary mmu and the mmu notifier with a single API call. Link: https://lkml.kernel.org/r/20230602092949.545577-4-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Yu Zhao <yuzhao@google.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09mm/damon/ops-common: atomically test and clear young on ptes and pmdsRyan Roberts
It is racy to non-atomically read a pte, then clear the young bit, then write it back as this could discard dirty information. Further, it is bad practice to directly set a pte entry within a table. Instead clearing young must go through the arch-provided helper, ptep_test_and_clear_young() to ensure it is modified atomically and to give the arch code visibility and allow it to check (and potentially modify) the operation. Link: https://lkml.kernel.org/r/20230602092949.545577-3-ryan.roberts@arm.com Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces"). Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon: convert damon_ptep/pmdp_mkold() to use a folioKefeng Wang
With damon_get_folio(), let's convert damon_ptep_mkold() and damon_pmdp_mkold() to use a folio. Link: https://lkml.kernel.org/r/20221230070849.63358-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18mm/damon: introduce damon_get_folio()Kefeng Wang
Introduce damon_get_folio(), and the temporary wrapper function damon_get_page(), which help us to convert damon related functions to use folios, and it will be dropped once the conversion is completed. Link: https://lkml.kernel.org/r/20221230070849.63358-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon: rename damon_pageout_score() to damon_cold_score()Kaixu Xia
In the beginning there is only one damos_action 'DAMOS_PAGEOUT' that need to get the coldness score of a region for a scheme, which using damon_pageout_score() to do that. But now there are also other damos_action actions need the coldness score, so rename it to damon_cold_score() to make more sense. Link: https://lkml.kernel.org/r/1663423014-28907-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03mm/damon/core: use a dedicated struct for monitoring attributesSeongJae Park
DAMON monitoring attributes are directly defined as fields of 'struct damon_ctx'. This makes 'struct damon_ctx' a little long and complicated. This commit defines and uses a struct, 'struct damon_attrs', which is dedicated for only the monitoring attributes to make the purpose of the five values clearer and simplify 'struct damon_ctx'. Link: https://lkml.kernel.org/r/20220913174449.50645-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11mm/damon: get the hotness from damon_hot_score() in damon_pageout_score()Kaixu Xia
We can get the hotness value from damon_hot_score() directly in damon_pageout_score() function and improve the code readability. Link: https://lkml.kernel.org/r/1661766366-20998-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03mm/damon/schemes: add 'LRU_PRIO' DAMOS actionSeongJae Park
This commit adds a new DAMOS action called 'LRU_PRIO' for the physical address space. The action prioritizes pages in the memory regions of the user-specified target access pattern on their LRU lists. This is hence supposed to be used for frequently accessed (hot) memory regions so that hot pages could be more likely protected under memory pressure. Internally, it simply calls 'mark_page_accessed()'. Link: https://lkml.kernel.org/r/20220613192301.8817-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19mm: damon: use HPAGE_PMD_SIZEKefeng Wang
Use HPAGE_PMD_SIZE instead of open coding. Link: https://lkml.kernel.org/r/20220517145120.118523-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-03-22mm/damon: rename damon_primitives to damon_operationsSeongJae Park
Patch series "Allow DAMON user code independent of monitoring primitives". In-kernel DAMON user code is required to configure the monitoring context (struct damon_ctx) with proper monitoring primitives (struct damon_primitive). This makes the user code dependent to all supporting monitoring primitives. For example, DAMON debugfs interface depends on both DAMON_VADDR and DAMON_PADDR, though some users have interest in only one use case. As more monitoring primitives are introduced, the problem will be bigger. To minimize such unnecessary dependency, this patchset makes monitoring primitives can be registered by the implemnting code and later dynamically searched and selected by the user code. In addition to that, this patchset renames monitoring primitives to monitoring operations, which is more easy to intuitively understand what it means and how it would be structed. This patch (of 8): DAMON has a set of callback functions called monitoring primitives and let it can be configured with various implementations for easy extension for different address spaces and usages. However, the word 'primitive' is not so explicit. Meanwhile, many other structs resembles similar purpose calls themselves 'operations'. To make the code easier to be understood, this commit renames 'damon_primitives' to 'damon_operations' before it is too late to rename. Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Xin Hao <xhao@linux.alibaba.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>