sched_ext: Hook up hardlockup detector

A poorly behaving BPF scheduler can trigger hard lockup. For example, on a large system with many tasks pinned to different subsets of CPUs, if the BPF scheduler puts all tasks in a single DSQ and lets all CPUs at it, the DSQ lock can be contended to the point where hardlockup triggers. Unfortunately, hardlockup can be the first signal out of such situations, thus requiring hardlockup handling. Hook scx_hardlockup() into the hardlockup detector to try kicking out the current scheduler in an attempt to recover the system to a good state. The handling strategy can delay watchdog taking its own action by one polling period; however, given that the only remediation for hardlockup is crash, this is likely an acceptable trade-off. v2: Add missing dummy scx_hardlockup() definition for !CONFIG_SCHED_CLASS_EXT (kernel test bot). Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com> Cc: Emil Tsalapatis <etsal@meta.com> Cc: Douglas Anderson <dianders@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
author: Tejun Heo <tj@kernel.org> 2025-11-11 09:18:12 -1000
committer: Tejun Heo <tj@kernel.org> 2025-11-12 06:43:44 -1000
commit: 582f700e1bdc5978f41e3d8d65d3e16e34e9be8a (patch)
tree: 36c1c48775642fc37e0a52d81edf021dfc41bdaa /kernel
parent: 7ed8df0d15022fcc092e7c7f0bd82359476cff3c (diff)
2 files changed, 27 insertions, 0 deletions
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 85bb052459ec..b5c87a03f112 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3712,6 +3712,24 @@ void scx_softlockup(u32 dur_s)
 }
 
 /**
+ * scx_hardlockup - sched_ext hardlockup handler
+ *
+ * A poorly behaving BPF scheduler can trigger hard lockup by e.g. putting
+ * numerous affinitized tasks in a single queue and directing all CPUs at it.
+ * Try kicking out the current scheduler in an attempt to recover the system to
+ * a good state before taking more drastic actions.
+ */
+bool scx_hardlockup(void)
+{
+	if (!handle_lockup("hard lockup - CPU %d", smp_processor_id()))
+		return false;
+
+	printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
+			smp_processor_id());
+	return true;
+}
+
+/**
  * scx_bypass - [Un]bypass scx_ops and guarantee forward progress
  * @bypass: true for bypass, false for unbypass
  *
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5b62d1002783..8dfac4a8f587 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 #ifdef CONFIG_SYSFS
 		++hardlockup_count;
 #endif
+		/*
+		 * A poorly behaving BPF scheduler can trigger hard lockup by
+		 * e.g. putting numerous affinitized tasks in a single queue and
+		 * directing all CPUs at it. The following call can return true
+		 * only once when sched_ext is enabled and will immediately
+		 * abort the BPF scheduler and print out a warning message.
+		 */
+		if (scx_hardlockup())
+			return;
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
author	Tejun Heo <tj@kernel.org>	2025-11-11 09:18:12 -1000
committer	Tejun Heo <tj@kernel.org>	2025-11-12 06:43:44 -1000
commit	582f700e1bdc5978f41e3d8d65d3e16e34e9be8a (patch)
tree	36c1c48775642fc37e0a52d81edf021dfc41bdaa /kernel
parent	7ed8df0d15022fcc092e7c7f0bd82359476cff3c (diff)