author     Jiayuan Chen <jiayuan.chen@linux.dev>        2025-10-24 10:27:11 +0800
committer  Andrew Morton <akpm@linux-foundation.org>    2025-11-29 10:41:07 -0800
commit     3cf41edc2067de9265f9f58b905317723c59a0c7 (patch)
tree       20d84ea4cbd3839bf44aff23f247ff0406d60159
parent     84a8d467cc426eb3c9eb34092423dcc54493dd7e (diff)
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
We have a colocation cluster used for deploying both offline and online
services simultaneously.  In this environment, we encountered a scenario
where direct memory reclamation was triggered because kswapd was not
running.

1. When applications start up, rapidly consume memory, or experience
   network traffic bursts, the kernel reaches steal_suitable_fallback(),
   which sets watermark_boost and subsequently wakes kswapd.

2. In the core logic of the kswapd thread (balance_pgdat()), when reclaim
   is triggered by watermark_boost, the reclaim priority will not drop
   below DEF_PRIORITY - 2 (10).  Higher priority values mean less
   aggressive LRU scanning, which can result in no pages being reclaimed
   during a single scan cycle:

	if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
		raise_priority = false;

3. Additionally, many of our pods are configured with memory.low, which
   prevents memory reclamation in certain cgroups, further increasing the
   chance of failing to reclaim memory.

4. This eventually causes pgdat->kswapd_failures to accumulate until it
   exceeds MAX_RECLAIM_RETRIES, at which point kswapd stops working.  At
   that point the system's available memory is still significantly above
   the high watermark -- it is inappropriate for kswapd to stop under
   these conditions.

The observable symptom is that a brief period of rapid memory allocation
causes kswapd to stop running, ultimately triggering direct reclaim and
making the applications unresponsive.

This direct-reclaim problem has been a long-standing issue in our
production environment.  We initially assumed that applications were
simply allocating memory too quickly for kswapd's reclamation to keep up.
However, after we began monitoring kswapd's runtime behavior, we
discovered a different pattern: kswapd is initially very aggressive even
while there is still considerable free memory, but it subsequently stops
running entirely, even as memory levels approach the low watermark.

In summary, both boosted watermarks and memory.low increase the
probability of kswapd reclaim failures.  This patch specifically
addresses the boosted-watermark scenario by not incrementing
kswapd_failures when boosted reclamation fails.  A more general solution,
potentially covering memory.low or other cases, requires further
discussion.

Link: https://lkml.kernel.org/r/53de0b3ee0b822418e909db29bfa6513faff9d36@linux.dev
Link: https://lkml.kernel.org/r/20251024022711.382238-1-jiayuan.chen@linux.dev
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
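[Editor's illustration] The sketch below is a minimal, self-contained C model (not kernel
code) of the failure-gating behaviour described above: a per-node counter, modelled after
pgdat->kswapd_failures, stops kswapd wakeups once it reaches MAX_RECLAIM_RETRIES, and the
fix exempts boosted reclaim passes from bumping it.  The struct, the function names
(node_state, reclaim_pass, try_wake_kswapd) and the reset-on-progress detail are invented
for illustration only.

	/*
	 * Minimal sketch (not kernel code) of the failure-gating behaviour
	 * described in the commit message.  Names and details are invented.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative value; the kernel defines its own MAX_RECLAIM_RETRIES. */
	#define MAX_RECLAIM_RETRIES 16

	struct node_state {
		int kswapd_failures;	/* models pgdat->kswapd_failures */
		bool boosted;		/* reclaim was triggered by watermark_boost */
	};

	/* Models the wake-up check: a "hopeless" node is left to direct reclaim. */
	static bool try_wake_kswapd(struct node_state *node)
	{
		if (node->kswapd_failures >= MAX_RECLAIM_RETRIES) {
			printf("kswapd not woken: node treated as hopeless\n");
			return false;
		}
		printf("kswapd woken\n");
		return true;
	}

	/*
	 * Models one balance_pgdat() pass.  Before the fix, every pass that
	 * reclaimed nothing bumped the failure counter; with the fix, boosted
	 * passes are exempt because free memory may still be well above the
	 * real high watermark.  (In this model, progress resets the counter.)
	 */
	static void reclaim_pass(struct node_state *node, unsigned long nr_reclaimed)
	{
		if (!nr_reclaimed && !node->boosted)
			node->kswapd_failures++;
		else if (nr_reclaimed)
			node->kswapd_failures = 0;
	}

	int main(void)
	{
		struct node_state node = { .kswapd_failures = 0, .boosted = true };
		int i;

		/* A burst of boosted, zero-progress passes no longer piles up failures. */
		for (i = 0; i < MAX_RECLAIM_RETRIES; i++)
			reclaim_pass(&node, 0);
		try_wake_kswapd(&node);		/* still woken: counter stayed at 0 */

		/* Without the exemption, the same burst marks the node hopeless. */
		node.boosted = false;
		for (i = 0; i < MAX_RECLAIM_RETRIES; i++)
			reclaim_pass(&node, 0);
		try_wake_kswapd(&node);		/* skipped: counter reached the limit */

		return 0;
	}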
-rw-r--r--  mm/vmscan.c  7
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 720772baf2a7..92980b072121 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7127,7 +7127,12 @@ restart:
goto restart;
}
- if (!sc.nr_reclaimed)
+ /*
+ * If the reclaim was boosted, we might still be far from the
+ * watermark_high at this point. We need to avoid increasing the
+ * failure count to prevent the kswapd thread from stopping.
+ */
+ if (!sc.nr_reclaimed && !boosted)
atomic_inc(&pgdat->kswapd_failures);
out: