| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-10-03 17:16:13 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-10-03 17:16:13 -0700 |
| commit | ee2fe81cdcd17f875aeca074afe64d7e8f57750f (patch) | |
| tree | 5f0f93208c8f9f71edb2b6ef0593125419725aa7 /Documentation/core-api/real-time/theory.rst | |
| parent | 50647a1176b7abd1b4ae55b491eb2fbbeef89db9 (diff) | |
| parent | 99510c324e531addd9f7b80a72dab7435ca66215 (diff) | |
Merge tag 'docs-6.18' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It has been a relatively busy cycle in docsland, with changes all
over:
- Bring the kernel memory-model docs into the Sphinx build in the
"literal include" mode.
- Lots of build-infrastructure work, further cleaning up long-term
kernel-doc technical debt. The sphinx-pre-install tool has been
converted to Python and updated for current systems.
- A new tool to detect when documents have been moved and generate
HTML redirects; this can be used on kernel.org (or any other site
hosting the rendered docs) to avoid breaking links.
- Automated processing of the YAML files describing the netlink
protocol.
- A significant update of the maintainer's PGP guide.
... and a seemingly endless series of typo fixes, build-problem fixes,
etc"
* tag 'docs-6.18' of git://git.lwn.net/linux: (193 commits)
Documentation/features: Update feature lists for 6.17-rc7
docs: remove cdomain.py
Documentation/process: submitting-patches: fix typo in "were do"
docs: dev-tools/lkmm: Fix typo of missing file extension
Documentation: trace: histogram: Convert ftrace docs cross-reference
Documentation: trace: histogram-design: Wrap introductory note in note:: directive
Documentation: trace: historgram-design: Separate sched_waking histogram section heading and the following diagram
Documentation: trace: histogram-design: Trim trailing vertices in diagram explanation text
Documentation: trace: histogram: Fix histogram trigger subsection number order
docs: driver-api: fix spelling of "buses".
Documentation: fbcon: Use admonition directives
Documentation: fbcon: Reindent 8th step of attach/detach/unload
Documentation: fbcon: Add boot options and attach/detach/unload section headings
docs: filesystems: sysfs: add remaining top level sysfs directory descriptions
docs: filesystems: sysfs: clarify symlink destinations in dev and bus/devices descriptions
docs: filesystems: sysfs: remove top level sysfs net directory
docs: maintainer: Fix ambiguous subheading formatting
docs: kdoc: a few more dump_typedef() tweaks
docs: kdoc: remove redundant comment stripping in dump_typedef()
docs: kdoc: remove some dead code in dump_typedef()
...
Diffstat (limited to 'Documentation/core-api/real-time/theory.rst')
| -rw-r--r-- | Documentation/core-api/real-time/theory.rst | 116 |
1 files changed, 116 insertions, 0 deletions
diff --git a/Documentation/core-api/real-time/theory.rst b/Documentation/core-api/real-time/theory.rst
new file mode 100644
index 000000000000..43d0120737f8
--- /dev/null
+++ b/Documentation/core-api/real-time/theory.rst
@@ -0,0 +1,116 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Theory of operation
+=====================
+
+:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+
+Preface
+=======
+
+PREEMPT_RT transforms the Linux kernel into a real-time kernel. It achieves
+this by replacing locking primitives, such as spinlock_t, with a preemptible
+and priority-inheritance aware implementation known as rtmutex, and by enforcing
+the use of threaded interrupts. As a result, the kernel becomes fully
+preemptible, with the exception of a few critical code paths, including entry
+code, the scheduler, and low-level interrupt handling routines.
+
+This transformation places the majority of kernel execution contexts under the
+control of the scheduler and significantly increasing the number of preemption
+points. Consequently, it reduces the latency between a high-priority task
+becoming runnable and its actual execution on the CPU.
+
+Scheduling
+==========
+
+The core principles of Linux scheduling and the associated user-space API are
+documented in the man page sched(7)
+`sched(7) <https://man7.org/linux/man-pages/man7/sched.7.html>`_.
+By default, the Linux kernel uses the SCHED_OTHER scheduling policy. Under
+this policy, a task is preempted when the scheduler determines that it has
+consumed a fair share of CPU time relative to other runnable tasks. However,
+the policy does not guarantee immediate preemption when a new SCHED_OTHER task
+becomes runnable. The currently running task may continue executing.
+
+This behavior differs from that of real-time scheduling policies such as
+SCHED_FIFO. When a task with a real-time policy becomes runnable, the
+scheduler immediately selects it for execution if it has a higher priority than
+the currently running task. The task continues to run until it voluntarily
+yields the CPU, typically by blocking on an event.
+
+Sleeping spin locks
+===================
+
+The various lock types and their behavior under real-time configurations are
+described in detail in Documentation/locking/locktypes.rst.
+In a non-PREEMPT_RT configuration, a spinlock_t is acquired by first disabling
+preemption and then actively spinning until the lock becomes available. Once
+the lock is released, preemption is enabled. From a real-time perspective,
+this approach is undesirable because disabling preemption prevents the
+scheduler from switching to a higher-priority task, potentially increasing
+latency.
+
+To address this, PREEMPT_RT replaces spinning locks with sleeping spin locks
+that do not disable preemption. On PREEMPT_RT, spinlock_t is implemented using
+rtmutex. Instead of spinning, a task attempting to acquire a contended lock
+disables CPU migration, donates its priority to the lock owner (priority
+inheritance), and voluntarily schedules out while waiting for the lock to
+become available.
+
+Disabling CPU migration provides the same effect as disabling preemption, while
+still allowing preemption and ensuring that the task continues to run on the
+same CPU while holding a sleeping lock.
+
+Priority inheritance
+====================
+
+Lock types such as spinlock_t and mutex_t in a PREEMPT_RT enabled kernel are
+implemented on top of rtmutex, which provides support for priority inheritance
+(PI). When a task blocks on such a lock, the PI mechanism temporarily
+propagates the blocked task’s scheduling parameters to the lock owner.
+
+For example, if a SCHED_FIFO task A blocks on a lock currently held by a
+SCHED_OTHER task B, task A’s scheduling policy and priority are temporarily
+inherited by task B. After this inheritance, task A is put to sleep while
+waiting for the lock, and task B effectively becomes the highest-priority task
+in the system. This allows B to continue executing, make progress, and
+eventually release the lock.
+
+Once B releases the lock, it reverts to its original scheduling parameters, and
+task A can resume execution.
+
+Threaded interrupts
+===================
+
+Interrupt handlers are another source of code that executes with preemption
+disabled and outside the control of the scheduler. To bring interrupt handling
+under scheduler control, PREEMPT_RT enforces threaded interrupt handlers.
+
+With forced threading, interrupt handling is split into two stages. The first
+stage, the primary handler, is executed in IRQ context with interrupts disabled.
+Its sole responsibility is to wake the associated threaded handler. The second
+stage, the threaded handler, is the function passed to request_irq() as the
+interrupt handler. It runs in process context, scheduled by the kernel.
+
+From waking the interrupt thread until threaded handling is completed, the
+interrupt source is masked in the interrupt controller. This ensures that the
+device interrupt remains pending but does not retrigger the CPU, allowing the
+system to exit IRQ context and handle the interrupt in a scheduled thread.
+
+By default, the threaded handler executes with the SCHED_FIFO scheduling policy
+and a priority of 50 (MAX_RT_PRIO / 2), which is midway between the minimum and
+maximum real-time priorities.
+
+If the threaded interrupt handler raises any soft interrupts during its
+execution, those soft interrupt routines are invoked after the threaded handler
+completes, within the same thread. Preemption remains enabled during the
+execution of the soft interrupt handler.
+
+Summary
+=======
+
+By using sleeping locks and forced-threaded interrupts, PREEMPT_RT
+significantly reduces sections of code where interrupts or preemption is
+disabled, allowing the scheduler to preempt the current execution context and
+switch to a higher-priority task.
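
Note (not part of the commit): the "Priority inheritance" section of the added file describes how a lock owner temporarily inherits the scheduling parameters of a blocked higher-priority task. The closest user-space analogue is a POSIX mutex created with the `PTHREAD_PRIO_INHERIT` protocol. The sketch below is a minimal illustration under that assumption. The helper name `pi_mutex_init` is hypothetical and does not appear in the patch, and the code shows the user-space API only, not the in-kernel rtmutex implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/*
 * Hypothetical helper (not from the patch): initialise *m as a mutex
 * that uses the priority-inheritance protocol.
 * Returns 0 on success or a pthread error number on failure.
 */
int pi_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;

	/* Ask for priority inheritance: while a higher-priority thread
	 * is blocked on the mutex, the owner runs at that priority. */
	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (!ret)
		ret = pthread_mutex_init(m, &attr);

	/* The attribute object is no longer needed once the mutex exists. */
	pthread_mutexattr_destroy(&attr);
	return ret;
}
```

A caller would run `pi_mutex_init(&lock)` once and then use `pthread_mutex_lock()` and `pthread_mutex_unlock()` as usual. The inheritance has a visible effect only when threads of different priorities contend for the lock, for example a SCHED_FIFO thread and a SCHED_OTHER thread as in the A/B scenario above.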