diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-12-03 13:04:07 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-12-03 13:04:07 -0800 |
| commit | 8449d3252c2603a51ffc7c36cb5bd94874378b7d (patch) | |
| tree | e834b0c0569532e33e622a6966ae67632d2cab66 /Documentation/admin-guide | |
| parent | 2b60145734a0e5a4b73952a540928d2c4f4fed64 (diff) | |
| parent | b1bcaed1e39a9e0dfbe324a15d2ca4253deda316 (diff) | |
Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
Diffstat (limited to 'Documentation/admin-guide')
| -rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 31 |
1 files changed, 25 insertions, 6 deletions
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 0e6c67ac585a..4c072e85acdf 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -53,7 +53,8 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou 5-2. Memory 5-2-1. Memory Interface Files 5-2-2. Usage Guidelines - 5-2-3. Memory Ownership + 5-2-3. Reclaim Protection + 5-2-4. Memory Ownership 5-3. IO 5-3-1. IO Interface Files 5-3-2. Writeback @@ -1317,7 +1318,7 @@ PAGE_SIZE multiple when read back. smaller overages. Effective min boundary is limited by memory.min values of - all ancestor cgroups. If there is memory.min overcommitment + ancestor cgroups. If there is memory.min overcommitment (child cgroup or cgroups are requiring more protected memory than parent will allow), then each child cgroup will get the part of parent's protection proportional to its @@ -1326,9 +1327,6 @@ PAGE_SIZE multiple when read back. Putting more memory than generally available under this protection is discouraged and may lead to constant OOMs. - If a memory cgroup is not populated with processes, - its memory.min is ignored. - memory.low A read-write single value file which exists on non-root cgroups. The default is "0". @@ -1343,7 +1341,7 @@ PAGE_SIZE multiple when read back. smaller overages. Effective low boundary is limited by memory.low values of - all ancestor cgroups. If there is memory.low overcommitment + ancestor cgroups. If there is memory.low overcommitment (child cgroup or cgroups are requiring more protected memory than parent will allow), then each child cgroup will get the part of parent's protection proportional to its @@ -1934,6 +1932,27 @@ memory - is necessary to determine whether a workload needs more memory; unfortunately, memory pressure monitoring mechanism isn't implemented yet. +Reclaim Protection +~~~~~~~~~~~~~~~~~~ + +The protection configured with "memory.low" or "memory.min" applies relatively +to the target of the reclaim (i.e. any of memory cgroup limits, proactive +memory.reclaim or global reclaim apparently located in the root cgroup). +The protection value configured for B applies unchanged to the reclaim +targeting A (i.e. caused by competition with the sibling E):: + + root - ... - A - B - C + \ ` D + ` E + +When the reclaim targets ancestors of A, the effective protection of B is +capped by the protection value configured for A (and any other intermediate +ancestors between A and the target). + +To express indifference about relative sibling protection, it is suggested to +use memory_recursiveprot. Configuring all descendants of a parent with finite +protection to "max" works but it may unnecessarily skew memory.events:low +field. Memory Ownership ~~~~~~~~~~~~~~~~ |