diff options
| author | Gabriele Monaco <gmonaco@redhat.com> | 2025-07-28 15:50:20 +0200 |
|---|---|---|
| committer | Steven Rostedt (Google) <rostedt@goodmis.org> | 2025-07-28 16:47:34 -0400 |
| commit | e8440a88e56bb3aa24c384eec6de8bef1184bed2 (patch) | |
| tree | b4cdcedbe47e9b8347f97d417ef5fb1a288edb08 /Documentation/trace | |
| parent | d0096c2f9cfcb4ce385698491604610fcc1a53b3 (diff) | |
rv: Add nrp and sssw per-task monitors
Add 2 per-task monitors as part of the sched model:
* nrp: need-resched preempts
Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
Monitor to ensure sched_set_state to sleepable leads to sleeping and
sleeping tasks require wakeup.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-9-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Diffstat (limited to 'Documentation/trace')
| -rw-r--r-- | Documentation/trace/rv/monitor_sched.rst | 167 |
1 files changed, 167 insertions, 0 deletions
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst index 6c4c00216c07..11ef963cb578 100644 --- a/Documentation/trace/rv/monitor_sched.rst +++ b/Documentation/trace/rv/monitor_sched.rst @@ -174,6 +174,173 @@ running one, no real task switch occurs but interrupts are disabled nonetheless: | | irq_entry +---------------+ irq_enable +Monitor nrp +----------- + +The need resched preempts (nrp) monitor ensures preemption requires +``need_resched``. Only kernel preemption is considered, since preemption +while returning to userspace, for this monitor, is indistinguishable from +``sched_switch_yield`` (described in the sssw monitor). +A kernel preemption is whenever ``__schedule`` is called with the preemption +flag set to true (e.g. from preempt_enable or exiting from interrupts). This +type of preemption occurs after the need for ``rescheduling`` has been set. +This is not valid for the *lazy* variant of the flag, which causes only +userspace preemption. +A ``schedule_entry_preempt`` may involve a task switch or not, in the latter +case, a task goes through the scheduler from a preemption context but it is +picked as the next task to run. Since the scheduler runs, this clears the need +to reschedule. The ``any_thread_running`` state does not imply the monitored +task is not running as this monitor does not track the outcome of scheduling. + +In theory, a preemption can only occur after the ``need_resched`` flag is set. In +practice, however, it is possible to see a preemption where the flag is not +set. This can happen in one specific condition:: + + need_resched + preempt_schedule() + preempt_schedule_irq() + __schedule() + !need_resched + __schedule() + +In the situation above, standard preemption starts (e.g. from preempt_enable +when the flag is set), an interrupt occurs before scheduling and, on its exit +path, it schedules, which clears the ``need_resched`` flag. +When the preempted task runs again, the standard preemption started earlier +resumes, although the flag is no longer set. The monitor considers this a +``nested_preemption``, this allows another preemption without re-setting the +flag. This condition relaxes the monitor constraints and may catch false +negatives (i.e. no real ``nested_preemptions``) but makes the monitor more +robust and able to validate other scenarios. +For simplicity, the monitor starts in ``preempt_irq``, although no interrupt +occurred, as the situation above is hard to pinpoint:: + + schedule_entry + irq_entry #===========================================# + +-------------------------- H H + | H H + +-------------------------> H any_thread_running H + H H + +-------------------------> H H + | #===========================================# + | schedule_entry | ^ + | schedule_entry_preempt | sched_need_resched | schedule_entry + | | schedule_entry_preempt + | v | + | +----------------------+ | + | +--- | | | + | sched_need_resched | | rescheduling | -+ + | +--> | | + | +----------------------+ + | | irq_entry + | v + | +----------------------+ + | | | ---+ + | ---> | | | sched_need_resched + | | preempt_irq | | irq_entry + | | | <--+ + | | | <--+ + | +----------------------+ | + | | schedule_entry | sched_need_resched + | | schedule_entry_preempt | + | v | + | +-----------------------+ | + +-------------------------- | nested_preempt | --+ + +-----------------------+ + ^ irq_entry | + +-------------------+ + +Due to how the ``need_resched`` flag on the preemption count works on arm64, +this monitor is unstable on that architecture, as it often records preemption +when the flag is not set, even in presence of the workaround above. +For the time being, the monitor is disabled by default on arm64. + +Monitor sssw +------------ + +The set state sleep and wakeup (sssw) monitor ensures ``set_state`` to +sleepable leads to sleeping and sleeping tasks require wakeup. It includes the +following types of switch: + +* ``switch_suspend``: + a task puts itself to sleep, this can happen only after explicitly setting + the task to ``sleepable``. After a task is suspended, it needs to be woken up + (``waking`` state) before being switched in again. + Setting the task's state to ``sleepable`` can be reverted before switching if it + is woken up or set to ``runnable``. +* ``switch_blocking``: + a special case of a ``switch_suspend`` where the task is waiting on a + sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set + state events racing with each other and this leads the model to perceive this + type of switch when the task is not set to sleepable. This is a limitation of + the model in SMP system and workarounds may slow down the system. +* ``switch_preempt``: + a task switch as a result of kernel preemption (``schedule_entry_preempt`` in + the nrp model). +* ``switch_yield``: + a task explicitly calls the scheduler or is preempted while returning to + userspace. It can happen after a ``yield`` system call, from the idle task or + if the ``need_resched`` flag is set. By definition, a task cannot yield while + ``sleepable`` as that would be a suspension. A special case of a yield occurs + when a task in ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is + pending. The task doesn't go through the usual blocking/waking and is set + back to runnable, the resulting switch (if there) looks like a yield to the + ``signal_wakeup`` state and is followed by the signal delivery. From this + state, the monitor expects a signal even if it sees a wakeup event, although + not necessary, to rule out false negatives. + +This monitor doesn't include a running state, ``sleepable`` and ``runnable`` +are only referring to the task's desired state, which could be scheduled out +(e.g. due to preemption). However, it does include the event +``sched_switch_in`` to represent when a task is allowed to become running. This +can be triggered also by preemption, but cannot occur after the task got to +``sleeping`` before a ``wakeup`` occurs:: + + +--------------------------------------------------------------------------+ + | | + | | + | switch_suspend | | + | switch_blocking | | + v v | + +----------+ #==========================# set_state_runnable | + | | H H wakeup | + | | H H switch_in | + | | H H switch_yield | + | sleeping | H H switch_preempt | + | | H H signal_deliver | + | | switch_ H H ------+ | + | | _blocking H runnable H | | + | | <----------- H H <-----+ | + +----------+ H H | + | wakeup H H | + +---------------------> H H | + H H | + +---------> H H | + | #==========================# | + | | ^ | + | | | set_state_runnable | + | | | wakeup | + | set_state_sleepable | +------------------------+ + | v | | + | +--------------------------+ set_state_sleepable + | | | switch_in + | | | switch_preempt + signal_deliver | sleepable | signal_deliver + | | | ------+ + | | | | + | | | <-----+ + | +--------------------------+ + | | ^ + | switch_yield | set_state_sleepable + | v | + | +---------------+ | + +---------- | signal_wakeup | -+ + +---------------+ + ^ | switch_in + | | switch_preempt + | | switch_yield + +-----------+ wakeup + References ---------- |