Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/ABI/testing/sysfs-devices-system-cpu | 54 |
| -rw-r--r-- | Documentation/admin-guide/pm/intel_pstate.rst | 104 |
2 files changed, 156 insertions, 2 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 6a1acabb29d8..03f43fb667a3 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -268,6 +268,60 @@ Description: Discover CPUs in the same CPU frequency coordination domain
                 This file is only present if the acpi-cpufreq or the cppc-cpufreq
                 drivers are in use.

+What:           /sys/devices/system/cpu/cpuX/cpufreq/auto_select
+Date:           May 2025
+Contact:        linux-pm@vger.kernel.org
+Description:    Autonomous selection enable
+
+                Read/write interface to control autonomous selection enable
+                Read returns autonomous selection status:
+                0: autonomous selection is disabled
+                1: autonomous selection is enabled
+
+                Write 'y' or '1' or 'on' to enable autonomous selection.
+                Write 'n' or '0' or 'off' to disable autonomous selection.
+
+                This file is only present if the cppc-cpufreq driver is in use.
+
+What:           /sys/devices/system/cpu/cpuX/cpufreq/auto_act_window
+Date:           May 2025
+Contact:        linux-pm@vger.kernel.org
+Description:    Autonomous activity window
+
+                This file indicates a moving utilization sensitivity window to
+                the platform's autonomous selection policy.
+
+                Read/write an integer represents autonomous activity window (in
+                microseconds) from/to this file. The max value to write is
+                1270000000 but the max significand is 127. This means that if 128
+                is written to this file, 127 will be stored. If the value is
+                greater than 130, only the first two digits will be saved as
+                significand.
+
+                Writing a zero value to this file enable the platform to
+                determine an appropriate Activity Window depending on the workload.
+
+                Writing to this file only has meaning when Autonomous Selection is
+                enabled.
+
+                This file is only present if the cppc-cpufreq driver is in use.
+
+What:           /sys/devices/system/cpu/cpuX/cpufreq/energy_performance_preference_val
+Date:           May 2025
+Contact:        linux-pm@vger.kernel.org
+Description:    Energy performance preference
+
+                Read/write an 8-bit integer from/to this file. This file
+                represents a range of values from 0 (performance preference) to
+                0xFF (energy efficiency preference) that influences the rate of
+                performance increase/decrease and the result of the hardware's
+                energy efficiency and performance optimization policies.
+
+                Writing to this file only has meaning when Autonomous Selection is
+                enabled.
+
+                This file is only present if the cppc-cpufreq driver is in use.
+

 What:           /sys/devices/system/cpu/cpu*/cache/index3/cache_disable_{0,1}
 Date:           August 2008
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index 78fc83ed2a7e..26e702c7016e 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the
 HWP feature, which is why ``intel_pstate`` works with all of them.]


+Support for Hybrid Processors
+=============================
+
+Some processors supported by ``intel_pstate`` contain two or more types of CPU
+cores differing by the maximum turbo P-state, performance vs power characteristics,
+cache sizes, and possibly other properties. They are commonly referred to as
+hybrid processors. To support them, ``intel_pstate`` requires HWP to be enabled
+and it assumes the HWP performance units to be the same for all CPUs in the
+system, so a given HWP performance level always represents approximately the
+same physical performance regardless of the core (CPU) type.
+
+Hybrid Processors with SMT
+--------------------------
+
+On systems where SMT (Simultaneous Multithreading), also referred to as
+HyperThreading (HT) in the context of Intel processors, is enabled on at least
+one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely,
+the priority of a given CPU reflects its highest HWP performance level which
+causes the CPU scheduler to generally prefer more performant CPUs, so the less
+performant CPUs are used when the other ones are fully loaded. However, SMT
+siblings (that is, logical CPUs sharing one physical core) are treated in a
+special way such that if one of them is in use, the effective priority of the
+other ones is lowered below the priorities of the CPUs located in the other
+physical cores.
+
+This approach maximizes performance in the majority of cases, but unfortunately
+it also leads to excessive energy usage in some important scenarios, like video
+playback, which is not generally desirable. While there is no other viable
+choice with SMT enabled because the effective capacity and utilization of SMT
+siblings are hard to determine, hybrid processors without SMT can be handled in
+more energy-efficient ways.
+
+.. _CAS:
+
+Capacity-Aware Scheduling Support
+---------------------------------
+
+The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by
+``intel_pstate`` by default on hybrid processors without SMT. CAS generally
+causes the scheduler to put tasks on a CPU so long as there is a sufficient
+amount of spare capacity on it, and if the utilization of a given task is too
+high for it, the task will need to go somewhere else.
+
+Since CAS takes CPU capacities into account, it does not require CPU
+prioritization and it allows tasks to be distributed more symmetrically among
+the more performant and less performant CPUs. Once placed on a CPU with enough
+capacity to accommodate it, a task may just continue to run there regardless of
+whether or not the other CPUs are fully loaded, so on average CAS reduces the
+utilization of the more performant CPUs which causes the energy usage to be more
+balanced because the more performant CPUs are generally less energy-efficient
+than the less performant ones.
+
+In order to use CAS, the scheduler needs to know the capacity of each CPU in
+the system and it needs to be able to compute scale-invariant utilization of
+CPUs, so ``intel_pstate`` provides it with the requisite information.
+
+First of all, the capacity of each CPU is represented by the ratio of its highest
+HWP performance level, multiplied by 1024, to the highest HWP performance level
+of the most performant CPU in the system, which works because the HWP performance
+units are the same for all CPUs. Second, the frequency-invariance computations,
+carried out by the scheduler to always express CPU utilization in the same units
+regardless of the frequency it is currently running at, are adjusted to take the
+CPU capacity into account. All of this happens when ``intel_pstate`` has
+registered itself with the ``CPUFreq`` core and it has figured out that it is
+running on a hybrid processor without SMT.
+
+Energy-Aware Scheduling Support
+-------------------------------
+
+If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
+``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
+`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
+Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
+``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
+to operate in the `passive mode <Passive Mode_>`_.
+
+The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
+based on abstract cost values and it does not include any real power numbers)
+and it is relatively simple to avoid unnecessary computations in the scheduler.
+There is a performance domain in it for every CPU in the system and the cost
+values for these performance domains have been chosen so that running a task on
+a less performant (small) CPU appears to be always cheaper than running that
+task on a more performant (big) CPU. However, for two CPUs of the same type,
+the cost difference depends on their current utilization, and the CPU whose
+current utilization is higher generally appears to be a more expensive
+destination for a given task. This helps to balance the load among CPUs of the
+same type.
+
+Since EAS works on top of CAS, high-utilization tasks are always migrated to
+CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization
+tasks tend to be placed on the CPUs that look less expensive to the scheduler.
+Effectively, this causes the less performant and less loaded CPUs to be
+preferred as long as they have enough spare capacity to run the given task
+which generally leads to reduced energy usage.
+
+The Energy Model created by ``intel_pstate`` can be inspected by looking at
+the ``energy_model`` directory in ``debugfs`` (typically mounted on
+``/sys/kernel/debug/``).
+
+
 User Space Interface in ``sysfs``
 =================================

@@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
 	Limits`_ for details).

 ``no_cas``
-	Do not enable capacity-aware scheduling (CAS) which is enabled by
-	default on hybrid systems.
+	Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
+	default on hybrid systems without SMT.

 Diagnostics and Tuning
 ======================
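
Note on the capacity rule described in the patch: the added text says each CPU's capacity is its highest HWP performance level, multiplied by 1024, divided by the highest HWP performance level of the most performant CPU. The sketch below only illustrates that arithmetic. The function name, the use of integer division, and the sample performance levels are assumptions made for illustration; they are not taken from the ``intel_pstate`` source or from real hardware.

```python
# Scale factor stated in the documentation text added by this patch.
CAPACITY_SCALE = 1024


def cpu_capacities(highest_perf):
    """Map each CPU's highest HWP performance level to a capacity value.

    highest_perf: dict of CPU number -> highest HWP performance level.
    Integer division is an assumption; the patch does not specify rounding.
    """
    max_perf = max(highest_perf.values())
    return {
        cpu: perf * CAPACITY_SCALE // max_perf
        for cpu, perf in highest_perf.items()
    }


# Hypothetical levels: two "big" CPUs (64) and two "small" CPUs (40).
example_levels = {0: 64, 1: 64, 2: 40, 3: 40}
capacities = cpu_capacities(example_levels)
print(capacities)
```

With these made-up levels the most performant CPUs get the full capacity of 1024 and the others are scaled down in proportion, which is the property CAS relies on when it compares task utilization with spare CPU capacity.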