| field | value | date |
|---|---|---|
| author | Peter Zijlstra <peterz@infradead.org> | 2025-04-02 20:07:34 +0200 |
| committer | Peter Zijlstra <peterz@infradead.org> | 2025-11-11 12:33:38 +0100 |
| commit | 79f3f9bedd149ea438aaeb0fb6a083637affe205 | |
| tree | a134ca6b2126cdbfeb9c1ad33deb25cbc7d8ffe6 /kernel/sched/sched.h | |
| parent | 9359d9785d85bb53f1ff1738a59aeeec4b878906 | |
sched/eevdf: Fix min_vruntime vs avg_vruntime
Basically, from the constraint that the sum of lag is zero, you can
infer that the 0-lag point is the weighted average of the individual
vruntimes, which is what we're trying to compute:
      \Sum w_i * v_i
avg = --------------
         \Sum w_i
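For example, two entities of weight 2 and 1 with vruntimes 10 and 40 give
avg = (2*10 + 1*40)/3 = 20: the 0-lag point is pulled toward the heavier
entity.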
Now, since vruntime takes the whole u64 (worse, it wraps), this
multiplication term in the numerator is not something we can compute;
instead we do the min_vruntime (v0 henceforth) thing like:
v_i = (v_i - v0) + v0
This does two things:
- it keeps the key: (v_i - v0) 'small';
- it creates a relative 0-point in the modular space.
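To make the second point concrete, here is a minimal user-space C sketch (not
the kernel code; fair.c has an entity_key() helper along these lines) showing
how the signed offset from v0 keeps the ordering correct across a u64 wrap:

```c
/*
 * Minimal user-space sketch of the relative key: subtract v0 in wrapping
 * u64 arithmetic, then reinterpret the result as s64.  The offsets stay
 * small and order correctly even when vruntime has wrapped past 0.
 */
#include <stdint.h>
#include <stdio.h>

static int64_t key(uint64_t v, uint64_t v0)
{
	return (int64_t)(v - v0);
}

int main(void)
{
	uint64_t v0 = UINT64_MAX - 5;	/* zero point sits just before the wrap */
	uint64_t a  = UINT64_MAX - 2;	/* not yet wrapped */
	uint64_t b  = 10;		/* already wrapped past 0 */

	/* b orders after a even though b < a as a plain u64 comparison */
	printf("key(a) = %lld, key(b) = %lld\n",
	       (long long)key(a, v0), (long long)key(b, v0));
	return 0;
}
```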
If you do that substitution and work it all out, you end up with:
      \Sum w_i * (v_i - v0)
avg = --------------------- + v0
            \Sum w_i
Since you cannot very well track a ratio like that (and not suffer
terrible numerical problems), we simply track the numerator and
denominator individually and only perform the division when strictly
needed.
Notably, the numerator lives in cfs_rq->avg_vruntime and the denominator
lives in cfs_rq->avg_load.
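To make the bookkeeping concrete, here is a simplified user-space sketch of
how such a pair can be maintained incrementally, modelled on fair.c's
avg_vruntime_add()/avg_vruntime_sub() helpers (which additionally scale the
weights); zero_vruntime is used as the post-patch name for v0:

```c
/*
 * Simplified user-space sketch, modelled on fair.c's avg_vruntime_add()
 * and avg_vruntime_sub() (the kernel additionally scales the weights).
 * zero_vruntime is the post-patch name for v0.
 */
#include <stdint.h>

struct cfs_rq_sketch {
	int64_t  avg_vruntime;	/* numerator:   \Sum w_i * (v_i - v0) */
	uint64_t avg_load;	/* denominator: \Sum w_i              */
	uint64_t zero_vruntime;	/* v0                                 */
};

/* entity enters the tree */
static void avg_add(struct cfs_rq_sketch *cfs_rq, uint64_t vruntime,
		    unsigned long weight)
{
	int64_t key = (int64_t)(vruntime - cfs_rq->zero_vruntime);

	cfs_rq->avg_vruntime += key * (int64_t)weight;
	cfs_rq->avg_load     += weight;
}

/* entity leaves the tree */
static void avg_sub(struct cfs_rq_sketch *cfs_rq, uint64_t vruntime,
		    unsigned long weight)
{
	int64_t key = (int64_t)(vruntime - cfs_rq->zero_vruntime);

	cfs_rq->avg_vruntime -= key * (int64_t)weight;
	cfs_rq->avg_load     -= weight;
}
```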
The one extra 'funny' is that these numbers track the entities in the
tree, and current is typically outside of the tree, so avg_vruntime()
adds current when needed before doing the division.
(vruntime_eligible() elides the division by cross-wise multiplication)
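And a sketch of the query side, again simplified and user-space only; the
real avg_vruntime() and vruntime_eligible() live in fair.c and also handle
weight scaling, rounding toward -inf and the no-current case:

```c
/*
 * Sketch of the query side (struct repeated from the previous sketch so
 * this stands alone).  avg_vruntime() folds current back in before
 * dividing; vruntime_eligible() skips the division entirely by
 * cross-multiplying.  Pass curr_weight == 0 when there is no on-rq current.
 */
#include <stdint.h>

struct cfs_rq_sketch {
	int64_t  avg_vruntime;	/* \Sum w_i * (v_i - v0) over tree entities */
	uint64_t avg_load;	/* \Sum w_i over tree entities              */
	uint64_t zero_vruntime;	/* v0                                       */
};

/* weighted average vruntime, i.e. the 0-lag point */
static uint64_t avg_vruntime_sketch(struct cfs_rq_sketch *cfs_rq,
				    uint64_t curr_vruntime, unsigned long curr_weight)
{
	int64_t avg  = cfs_rq->avg_vruntime;
	int64_t load = (int64_t)cfs_rq->avg_load;

	/* current is not in the tree; add it back for the average */
	avg  += (int64_t)(curr_vruntime - cfs_rq->zero_vruntime) * (int64_t)curr_weight;
	load += (int64_t)curr_weight;

	if (load)
		avg /= load;	/* the kernel fixes up rounding toward -inf here */

	return cfs_rq->zero_vruntime + (uint64_t)avg;
}

/*
 * lag_i >= 0  <=>  v_i <= avg  <=>
 *   (v_i - v0) * \Sum w_j  <=  \Sum w_j * (v_j - v0)
 * so no division is needed.
 */
static int vruntime_eligible_sketch(struct cfs_rq_sketch *cfs_rq, uint64_t vruntime,
				    uint64_t curr_vruntime, unsigned long curr_weight)
{
	int64_t avg  = cfs_rq->avg_vruntime;
	int64_t load = (int64_t)cfs_rq->avg_load;

	avg  += (int64_t)(curr_vruntime - cfs_rq->zero_vruntime) * (int64_t)curr_weight;
	load += (int64_t)curr_weight;

	return avg >= (int64_t)(vruntime - cfs_rq->zero_vruntime) * load;
}
```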
Anyway, as mentioned above, we currently use the CFS-era min_vruntime
for this purpose. However, min_vruntime can only move forward, while the
above avg can in fact move backward (when a non-eligible task leaves,
the average becomes smaller). This can cause trouble when, through
happenstance (or construction), the two values drift far enough apart to
wreck the game.
Replace cfs_rq::min_vruntime with cfs_rq::zero_vruntime which is kept
near/at avg_vruntime, following its motion.
The down-side is that this requires computing the avg more often.
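For reference, keeping v0 glued to the average relies on re-basing the
numerator whenever v0 moves; fair.c's avg_vruntime_update() performs this
re-basing today when the zero point advances. A sketch of the identity
follows (the helper name is hypothetical, not the patch's actual code):

```c
/*
 * If v0 moves by delta (v0' = v0 + delta), every key shrinks by delta:
 *   \Sum w_i * (v_i - v0') = \Sum w_i * (v_i - v0) - delta * \Sum w_i
 * so the tracked numerator must be adjusted by -delta * avg_load.
 * Hypothetical helper name; user-space sketch only.
 */
#include <stdint.h>

struct cfs_rq_sketch {
	int64_t  avg_vruntime;
	uint64_t avg_load;
	uint64_t zero_vruntime;
};

static void zero_vruntime_set(struct cfs_rq_sketch *cfs_rq, uint64_t new_v0)
{
	int64_t delta = (int64_t)(new_v0 - cfs_rq->zero_vruntime);	/* may be negative */

	cfs_rq->avg_vruntime -= delta * (int64_t)cfs_rq->avg_load;
	cfs_rq->zero_vruntime = new_v0;
}
```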
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Reported-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251106111741.GC4068168@noisy.programming.kicks-ass.net
Cc: stable@vger.kernel.org
Diffstat (limited to 'kernel/sched/sched.h')
| mode | path | lines |
|---|---|---|
| -rw-r--r-- | kernel/sched/sched.h | 4 |

1 file changed, 2 insertions, 2 deletions
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 82e74e8ca2ea..5a3cf81c27be 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -681,10 +681,10 @@ struct cfs_rq {
 	s64			avg_vruntime;
 	u64			avg_load;
 
-	u64			min_vruntime;
+	u64			zero_vruntime;
 #ifdef CONFIG_SCHED_CORE
 	unsigned int		forceidle_seq;
-	u64			min_vruntime_fi;
+	u64			zero_vruntime_fi;
 #endif
 
 	struct rb_root_cached	tasks_timeline;