summaryrefslogtreecommitdiff
path: root/Documentation/core-api/real-time/theory.rst
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2025-10-03 17:16:13 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2025-10-03 17:16:13 -0700
commitee2fe81cdcd17f875aeca074afe64d7e8f57750f (patch)
tree5f0f93208c8f9f71edb2b6ef0593125419725aa7 /Documentation/core-api/real-time/theory.rst
parent50647a1176b7abd1b4ae55b491eb2fbbeef89db9 (diff)
parent99510c324e531addd9f7b80a72dab7435ca66215 (diff)
Merge tag 'docs-6.18' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet: "It has been a relatively busy cycle in docsland, with changes all over: - Bring the kernel memory-model docs into the Sphinx build in the "literal include" mode. - Lots of build-infrastructure work, further cleaning up long-term kernel-doc technical debt. The sphinx-pre-install tool has been converted to Python and updated for current systems. - A new tool to detect when documents have been moved and generate HTML redirects; this can be used on kernel.org (or any other site hosting the rendered docs) to avoid breaking links. - Automated processing of the YAML files describing the netlink protocol. - A significant update of the maintainer's PGP guide. ... and a seemingly endless series of typo fixes, build-problem fixes, etc" * tag 'docs-6.18' of git://git.lwn.net/linux: (193 commits) Documentation/features: Update feature lists for 6.17-rc7 docs: remove cdomain.py Documentation/process: submitting-patches: fix typo in "were do" docs: dev-tools/lkmm: Fix typo of missing file extension Documentation: trace: histogram: Convert ftrace docs cross-reference Documentation: trace: histogram-design: Wrap introductory note in note:: directive Documentation: trace: historgram-design: Separate sched_waking histogram section heading and the following diagram Documentation: trace: histogram-design: Trim trailing vertices in diagram explanation text Documentation: trace: histogram: Fix histogram trigger subsection number order docs: driver-api: fix spelling of "buses". Documentation: fbcon: Use admonition directives Documentation: fbcon: Reindent 8th step of attach/detach/unload Documentation: fbcon: Add boot options and attach/detach/unload section headings docs: filesystems: sysfs: add remaining top level sysfs directory descriptions docs: filesystems: sysfs: clarify symlink destinations in dev and bus/devices descriptions docs: filesystems: sysfs: remove top level sysfs net directory docs: maintainer: Fix ambiguous subheading formatting docs: kdoc: a few more dump_typedef() tweaks docs: kdoc: remove redundant comment stripping in dump_typedef() docs: kdoc: remove some dead code in dump_typedef() ...
Diffstat (limited to 'Documentation/core-api/real-time/theory.rst')
-rw-r--r--Documentation/core-api/real-time/theory.rst116
1 files changed, 116 insertions, 0 deletions
diff --git a/Documentation/core-api/real-time/theory.rst b/Documentation/core-api/real-time/theory.rst
new file mode 100644
index 000000000000..43d0120737f8
--- /dev/null
+++ b/Documentation/core-api/real-time/theory.rst
@@ -0,0 +1,116 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Theory of operation
+=====================
+
+:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+
+Preface
+=======
+
+PREEMPT_RT transforms the Linux kernel into a real-time kernel. It achieves
+this by replacing locking primitives, such as spinlock_t, with a preemptible
+and priority-inheritance aware implementation known as rtmutex, and by enforcing
+the use of threaded interrupts. As a result, the kernel becomes fully
+preemptible, with the exception of a few critical code paths, including entry
+code, the scheduler, and low-level interrupt handling routines.
+
+This transformation places the majority of kernel execution contexts under the
+control of the scheduler and significantly increasing the number of preemption
+points. Consequently, it reduces the latency between a high-priority task
+becoming runnable and its actual execution on the CPU.
+
+Scheduling
+==========
+
+The core principles of Linux scheduling and the associated user-space API are
+documented in the man page sched(7)
+`sched(7) <https://man7.org/linux/man-pages/man7/sched.7.html>`_.
+By default, the Linux kernel uses the SCHED_OTHER scheduling policy. Under
+this policy, a task is preempted when the scheduler determines that it has
+consumed a fair share of CPU time relative to other runnable tasks. However,
+the policy does not guarantee immediate preemption when a new SCHED_OTHER task
+becomes runnable. The currently running task may continue executing.
+
+This behavior differs from that of real-time scheduling policies such as
+SCHED_FIFO. When a task with a real-time policy becomes runnable, the
+scheduler immediately selects it for execution if it has a higher priority than
+the currently running task. The task continues to run until it voluntarily
+yields the CPU, typically by blocking on an event.
+
+Sleeping spin locks
+===================
+
+The various lock types and their behavior under real-time configurations are
+described in detail in Documentation/locking/locktypes.rst.
+In a non-PREEMPT_RT configuration, a spinlock_t is acquired by first disabling
+preemption and then actively spinning until the lock becomes available. Once
+the lock is released, preemption is enabled. From a real-time perspective,
+this approach is undesirable because disabling preemption prevents the
+scheduler from switching to a higher-priority task, potentially increasing
+latency.
+
+To address this, PREEMPT_RT replaces spinning locks with sleeping spin locks
+that do not disable preemption. On PREEMPT_RT, spinlock_t is implemented using
+rtmutex. Instead of spinning, a task attempting to acquire a contended lock
+disables CPU migration, donates its priority to the lock owner (priority
+inheritance), and voluntarily schedules out while waiting for the lock to
+become available.
+
+Disabling CPU migration provides the same effect as disabling preemption, while
+still allowing preemption and ensuring that the task continues to run on the
+same CPU while holding a sleeping lock.
+
+Priority inheritance
+====================
+
+Lock types such as spinlock_t and mutex_t in a PREEMPT_RT enabled kernel are
+implemented on top of rtmutex, which provides support for priority inheritance
+(PI). When a task blocks on such a lock, the PI mechanism temporarily
+propagates the blocked task’s scheduling parameters to the lock owner.
+
+For example, if a SCHED_FIFO task A blocks on a lock currently held by a
+SCHED_OTHER task B, task A’s scheduling policy and priority are temporarily
+inherited by task B. After this inheritance, task A is put to sleep while
+waiting for the lock, and task B effectively becomes the highest-priority task
+in the system. This allows B to continue executing, make progress, and
+eventually release the lock.
+
+Once B releases the lock, it reverts to its original scheduling parameters, and
+task A can resume execution.
+
+Threaded interrupts
+===================
+
+Interrupt handlers are another source of code that executes with preemption
+disabled and outside the control of the scheduler. To bring interrupt handling
+under scheduler control, PREEMPT_RT enforces threaded interrupt handlers.
+
+With forced threading, interrupt handling is split into two stages. The first
+stage, the primary handler, is executed in IRQ context with interrupts disabled.
+Its sole responsibility is to wake the associated threaded handler. The second
+stage, the threaded handler, is the function passed to request_irq() as the
+interrupt handler. It runs in process context, scheduled by the kernel.
+
+From waking the interrupt thread until threaded handling is completed, the
+interrupt source is masked in the interrupt controller. This ensures that the
+device interrupt remains pending but does not retrigger the CPU, allowing the
+system to exit IRQ context and handle the interrupt in a scheduled thread.
+
+By default, the threaded handler executes with the SCHED_FIFO scheduling policy
+and a priority of 50 (MAX_RT_PRIO / 2), which is midway between the minimum and
+maximum real-time priorities.
+
+If the threaded interrupt handler raises any soft interrupts during its
+execution, those soft interrupt routines are invoked after the threaded handler
+completes, within the same thread. Preemption remains enabled during the
+execution of the soft interrupt handler.
+
+Summary
+=======
+
+By using sleeping locks and forced-threaded interrupts, PREEMPT_RT
+significantly reduces sections of code where interrupts or preemption is
+disabled, allowing the scheduler to preempt the current execution context and
+switch to a higher-priority task.