summaryrefslogtreecommitdiff
path: root/Documentation/arch
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/arch')
-rw-r--r--Documentation/arch/arm64/booting.rst11
-rw-r--r--Documentation/arch/arm64/elf_hwcaps.rst4
-rw-r--r--Documentation/arch/arm64/silicon-errata.rst2
-rw-r--r--Documentation/arch/arm64/sme.rst14
-rw-r--r--Documentation/arch/powerpc/index.rst1
-rw-r--r--Documentation/arch/powerpc/vpa-dtl.rst156
-rw-r--r--Documentation/arch/riscv/hwprobe.rst9
-rw-r--r--Documentation/arch/x86/topology.rst191
8 files changed, 376 insertions, 12 deletions
diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index 2f666a7c303c..e4f953839f71 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -466,6 +466,17 @@ Before jumping into the kernel, the following conditions must be met:
- HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
- HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
+ For CPUs with SPE data source filtering (FEAT_SPE_FDS):
+
+ - If EL3 is present:
+
+ - MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+ - HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
+
For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
- If the kernel is entered at EL1 and EL2 is present:
diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst
index f58ada4d6cb2..a15df4956849 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -441,6 +441,10 @@ HWCAP3_MTE_FAR
HWCAP3_MTE_STORE_ONLY
Functionality implied by ID_AA64PFR2_EL1.MTESTOREONLY == 0b0001.
+HWCAP3_LSFE
+ Functionality implied by ID_AA64ISAR3_EL1.LSFE == 0b0001
+
+
4. Unused AT_HWCAP bits
-----------------------
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index b18ef4064bc0..a7ec57060f64 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -200,6 +200,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-V3 | #3312417 | ARM64_ERRATUM_3194386 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-V3AE | #3312417 | ARM64_ERRATUM_3194386 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | ARM_SMMU_MMU_500_CPRE_ERRATA|
| | | #562869,1047329 | |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/Documentation/arch/arm64/sme.rst b/Documentation/arch/arm64/sme.rst
index 4cb38330e704..583f2ee9cb97 100644
--- a/Documentation/arch/arm64/sme.rst
+++ b/Documentation/arch/arm64/sme.rst
@@ -81,17 +81,7 @@ The ZA matrix is square with each side having as many bytes as a streaming
mode SVE vector.
-3. Sharing of streaming and non-streaming mode SVE state
----------------------------------------------------------
-
-It is implementation defined which if any parts of the SVE state are shared
-between streaming and non-streaming modes. When switching between modes
-via software interfaces such as ptrace if no register content is provided as
-part of switching no state will be assumed to be shared and everything will
-be zeroed.
-
-
-4. System call behaviour
+3. System call behaviour
-------------------------
* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
@@ -112,7 +102,7 @@ be zeroed.
exceptions for execve() described in section 6.
-5. Signal handling
+4. Signal handling
-------------------
* Signal handlers are invoked with PSTATE.SM=0, PSTATE.ZA=0, and TPIDR2_EL0=0.
diff --git a/Documentation/arch/powerpc/index.rst b/Documentation/arch/powerpc/index.rst
index 53fc9f89f3e4..1be2ee3f0361 100644
--- a/Documentation/arch/powerpc/index.rst
+++ b/Documentation/arch/powerpc/index.rst
@@ -37,6 +37,7 @@ powerpc
vas-api
vcpudispatch_stats
vmemmap_dedup
+ vpa-dtl
features
diff --git a/Documentation/arch/powerpc/vpa-dtl.rst b/Documentation/arch/powerpc/vpa-dtl.rst
new file mode 100644
index 000000000000..58d0022f993a
--- /dev/null
+++ b/Documentation/arch/powerpc/vpa-dtl.rst
@@ -0,0 +1,156 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _vpa-dtl:
+
+===================================
+DTL (Dispatch Trace Log)
+===================================
+
+Athira Rajeev, 19 April 2025
+
+.. contents::
+ :depth: 3
+
+
+Basic overview
+==============
+
+The pseries Shared Processor Logical Partition(SPLPAR) machines can
+retrieve a log of dispatch and preempt events from the hypervisor
+using data from Disptach Trace Log(DTL) buffer. With this information,
+user can retrieve when and why each dispatch & preempt has occurred.
+The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
+via perf.
+
+Infrastructure used
+===================
+
+The VPA DTL PMU counters do not interrupt on overflow or generate any
+PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
+nterval can be provided by user via sample_period field in nano seconds.
+vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
+Trace Log) contains information about dispatch/preempt, enqueue time etc.
+We directly copy the DTL buffer data as part of auxiliary buffer and it
+will be processed later. This will avoid time taken to create samples
+in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
+entries makes use of AUX support in perf infrastructure. On the tools side,
+this data is made available as PERF_RECORD_AUXTRACE records.
+
+To correlate each DTL entry with other events across CPU's, an auxtrace_queue
+is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
+All auxtrace queues is maintained in auxtrace heap. The queues are sorted
+based on timestamp. When the different PERF_RECORD_XX records are processed,
+compare the timestamp of perf record with timestamp of top element in the
+auxtrace heap so that DTL events can be co-related with other events
+Process the auxtrace queue if the timestamp of element from heap is
+lower than timestamp from entry in perf record. Sometimes it could happen that
+one buffer is only partially processed. if the timestamp of occurrence of
+another event is more than currently processed element in the queue, it will
+move on to next perf record. So keep track of position of buffer to continue
+processing next time. Update the timestamp of the auxtrace heap with the timestamp
+of last processed entry from the auxtrace buffer.
+
+This infrastructure ensures dispatch trace log entries can be correlated
+and presented along with other events like sched.
+
+vpa-dtl PMU example usage
+=========================
+
+.. code-block:: sh
+
+ # ls /sys/devices/vpa_dtl/
+ events format perf_event_mux_interval_ms power subsystem type uevent
+
+
+To capture the DTL data using perf record:
+.. code-block:: sh
+
+ # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
+
+The result can be interpreted using perf record. Snippet of perf report -D
+
+.. code-block:: sh
+
+ # ./perf report -D
+
+There are different PERF_RECORD_XX records. In that records corresponding to
+auxtrace buffers includes:
+
+1. PERF_RECORD_AUX
+ Conveys that new data is available in AUX area
+
+2. PERF_RECORD_AUXTRACE_INFO
+ Describes offset and size of auxtrace data in the buffers
+
+3. PERF_RECORD_AUXTRACE
+ This is the record that defines the auxtrace data which here in case of
+ vpa-dtl pmu is dispatch trace log data.
+
+Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
+
+.. code-block:: sh
+
+0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
+.
+. ... VPA DTL PMU data: size 1680 bytes, entries is 35
+. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
+. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
+. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
+. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
+. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
+. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
+. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
+. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
+. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
+
+Above is representation of dtl entry of below format:
+
+struct dtl_entry {
+ u8 dispatch_reason;
+ u8 preempt_reason;
+ u16 processor_id;
+ u32 enqueue_to_dispatch_time;
+ u32 ready_to_enqueue_time;
+ u32 waiting_to_ready_time;
+ u64 timebase;
+ u64 fault_addr;
+ u64 srr0;
+ u64 srr1;
+
+};
+
+First two fields represent the dispatch reason and preempt reason. The post
+processing of PERF_RECORD_AUXTRACE records will translate to meaningful data
+for user to consume.
+
+Visualize the dispatch trace log entries with perf report
+=========================================================
+
+.. code-block:: sh
+
+ # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 0.300 MB perf.data ]
+
+ # ./perf report
+ # Samples: 321 of event 'vpa-dtl'
+ # Event count (approx.): 321
+ #
+ # Children Self Command Shared Object Symbol
+ # ........ ........ ....... ................. ..............................
+ #
+ 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
+
+Visualize the dispatch trace log entries with perf script
+=========================================================
+
+.. code-block:: sh
+
+ # ./perf script
+ migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
+ migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
+ migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
+ migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
+ :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
+ :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
+ perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
+ perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010
diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
index 2aa9be272d5d..2f449c9b15bd 100644
--- a/Documentation/arch/riscv/hwprobe.rst
+++ b/Documentation/arch/riscv/hwprobe.rst
@@ -327,6 +327,15 @@ The following keys are defined:
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED`: Misaligned vector accesses are
not supported at all and will generate a misaligned address fault.
+* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_MIPS_0`: A bitmask containing the
+ mips vendor extensions that are compatible with the
+ :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior.
+
+ * MIPS
+
+ * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XMIPSEXECTL`: The xmipsexectl vendor
+ extension is supported in the MIPS ISA extensions spec.
+
* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0`: A bitmask containing the
thead vendor extensions that are compatible with the
:c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior.
diff --git a/Documentation/arch/x86/topology.rst b/Documentation/arch/x86/topology.rst
index c12837e61bda..86bec8ac2c4d 100644
--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -141,6 +141,197 @@ Thread-related topology information in the kernel:
+System topology enumeration
+===========================
+
+The topology on x86 systems can be discovered using a combination of vendor
+specific CPUID leaves which enumerate the processor topology and the cache
+hierarchy.
+
+The CPUID leaves in their preferred order of parsing for each x86 vendor is as
+follows:
+
+1) AMD
+
+ 1) CPUID leaf 0x80000026 [Extended CPU Topology] (Core::X86::Cpuid::ExCpuTopology)
+
+ The extended CPUID leaf 0x80000026 is the extension of the CPUID leaf 0xB
+ and provides the topology information of Core, Complex, CCD (Die), and
+ Socket in each level.
+
+ Support for the leaf is discovered by checking if the maximum extended
+ CPUID level is >= 0x80000026 and then checking if `LogProcAtThisLevel`
+ in `EBX[15:0]` at a particular level (starting from 0) is non-zero.
+
+ The `LevelType` in `ECX[15:8]` at the level provides the topology domain
+ the level describes - Core, Complex, CCD(Die), or the Socket.
+
+ The kernel uses the `CoreMaskWidth` from `EAX[4:0]` to discover the
+ number of bits that need to be right-shifted from `ExtendedLocalApicId`
+ in `EDX[31:0]` in order to get a unique Topology ID for the topology
+ level. CPUs with the same Topology ID share the resources at that level.
+
+ CPUID leaf 0x80000026 also provides more information regarding the power
+ and efficiency rankings, and about the core type on AMD processors with
+ heterogeneous characteristics.
+
+ If CPUID leaf 0x80000026 is supported, further parsing is not required.
+
+ 2) CPUID leaf 0x0000000B [Extended Topology Enumeration] (Core::X86::Cpuid::ExtTopEnum)
+
+ The extended CPUID leaf 0x0000000B is the predecessor on the extended
+ CPUID leaf 0x80000026 and only describes the core, and the socket domains
+ of the processor topology.
+
+ The support for the leaf is discovered by checking if the maximum supported
+ CPUID level is >= 0xB and then if `EBX[31:0]` at a particular level
+ (starting from 0) is non-zero.
+
+ The `LevelType` in `ECX[15:8]` at the level provides the topology domain
+ that the level describes - Thread, or Processor (Socket).
+
+ The kernel uses the `CoreMaskWidth` from `EAX[4:0]` to discover the
+ number of bits that need to be right-shifted from the `ExtendedLocalApicId`
+ in `EDX[31:0]` to get a unique Topology ID for that topology level. CPUs
+ sharing the Topology ID share the resources at that level.
+
+ If CPUID leaf 0xB is supported, further parsing is not required.
+
+
+ 3) CPUID leaf 0x80000008 ECX [Size Identifiers] (Core::X86::Cpuid::SizeId)
+
+ If neither the CPUID leaf 0x80000026 nor 0xB is supported, the number of
+ CPUs on the package is detected using the Size Identifier leaf
+ 0x80000008 ECX.
+
+ The support for the leaf is discovered by checking if the supported
+ extended CPUID level is >= 0x80000008.
+
+ The shifts from the APIC ID for the Socket ID is calculated from the
+ `ApicIdSize` field in `ECX[15:12]` if it is non-zero.
+
+ If `ApicIdSize` is reported to be zero, the shift is calculated as the
+ order of the `number of threads` calculated from `NC` field in
+ `ECX[7:0]` which describes the `number of threads - 1` on the package.
+
+ Unless Extended APIC ID is supported, the APIC ID used to find the
+ Socket ID is from the `LocalApicId` field of CPUID leaf 0x00000001
+ `EBX[31:24]`.
+
+ The topology parsing continues to detect if Extended APIC ID is
+ supported or not.
+
+
+ 4) CPUID leaf 0x8000001E [Extended APIC ID, Core Identifiers, Node Identifiers]
+ (Core::X86::Cpuid::{ExtApicId,CoreId,NodeId})
+
+ The support for Extended APIC ID can be detected by checking for the
+ presence of `TopologyExtensions` in `ECX[22]` of CPUID leaf 0x80000001
+ [Feature Identifiers] (Core::X86::Cpuid::FeatureExtIdEcx).
+
+ If Topology Extensions is supported, the APIC ID from `ExtendedApicId`
+ from CPUID leaf 0x8000001E `EAX[31:0]` should be preferred over that from
+ `LocalApicId` field of CPUID leaf 0x00000001 `EBX[31:24]` for topology
+ enumeration.
+
+ On processors of Family 0x17 and above that do not support CPUID leaf
+ 0x80000026 or CPUID leaf 0xB, the shifts from the APIC ID for the Core
+ ID is calculated using the order of `number of threads per core`
+ calculated using the `ThreadsPerCore` field in `EBX[15:8]` which
+ describes `number of threads per core - 1`.
+
+ On Processors of Family 0x15, the Core ID from `EBX[7:0]` is used as the
+ `cu_id` (Compute Unit ID) to detect CPUs that share the compute units.
+
+
+ All AMD processors that support the `TopologyExtensions` feature store the
+ `NodeId` from the `ECX[7:0]` of CPUID leaf 0x8000001E
+ (Core::X86::Cpuid::NodeId) as the per-CPU `node_id`. On older processors,
+ the `node_id` was discovered using MSR_FAM10H_NODE_ID MSR (MSR
+ 0x0xc001_100c). The presence of the NODE_ID MSR was detected by checking
+ `ECX[19]` of CPUID leaf 0x80000001 [Feature Identifiers]
+ (Core::X86::Cpuid::FeatureExtIdEcx).
+
+
+2) Intel
+
+ On Intel platforms, the CPUID leaves that enumerate the processor
+ topology are as follows:
+
+ 1) CPUID leaf 0x1F (V2 Extended Topology Enumeration Leaf)
+
+ The CPUID leaf 0x1F is the extension of the CPUID leaf 0xB and provides
+ the topology information of Core, Module, Tile, Die, DieGrp, and Socket
+ in each level.
+
+ The support for the leaf is discovered by checking if the supported
+ CPUID level is >= 0x1F and then `EBX[31:0]` at a particular level
+ (starting from 0) is non-zero.
+
+ The `Domain Type` in `ECX[15:8]` of the sub-leaf provides the topology
+ domain that the level describes - Core, Module, Tile, Die, DieGrp, and
+ Socket.
+
+ The kernel uses the value from `EAX[4:0]` to discover the number of
+ bits that need to be right shifted from the `x2APIC ID` in `EDX[31:0]`
+ to get a unique Topology ID for the topology level. CPUs with the same
+ Topology ID share the resources at that level.
+
+ If CPUID leaf 0x1F is supported, further parsing is not required.
+
+
+ 2) CPUID leaf 0x0000000B (Extended Topology Enumeration Leaf)
+
+ The extended CPUID leaf 0x0000000B is the predecessor of the V2 Extended
+ Topology Enumeration Leaf 0x1F and only describes the core, and the
+ socket domains of the processor topology.
+
+ The support for the leaf is iscovered by checking if the supported CPUID
+ level is >= 0xB and then checking if `EBX[31:0]` at a particular level
+ (starting from 0) is non-zero.
+
+ CPUID leaf 0x0000000B shares the same layout as CPUID leaf 0x1F and
+ should be enumerated in a similar manner.
+
+ If CPUID leaf 0xB is supported, further parsing is not required.
+
+
+ 3) CPUID leaf 0x00000004 (Deterministic Cache Parameters Leaf)
+
+ On Intel processors that support neither CPUID leaf 0x1F, nor CPUID leaf
+ 0xB, the shifts for the SMT domains is calculated using the number of
+ CPUs sharing the L1 cache.
+
+ Processors that feature Hyper-Threading is detected using `EDX[28]` of
+ CPUID leaf 0x1 (Basic CPUID Information).
+
+ The order of `Maximum number of addressable IDs for logical processors
+ sharing this cache` from `EAX[25:14]` of level-0 of CPUID 0x4 provides
+ the shifts from the APIC ID required to compute the Core ID.
+
+ The APIC ID and Package information is computed using the data from
+ CPUID leaf 0x1.
+
+
+ 4) CPUID leaf 0x00000001 (Basic CPUID Information)
+
+ The mask and shifts to derive the Physical Package (socket) ID is
+ computed using the `Maximum number of addressable IDs for logical
+ processors in this physical package` from `EBX[23:16]` of CPUID leaf
+ 0x1.
+
+ The APIC ID on the legacy platforms is derived from the `Initial APIC
+ ID` field from `EBX[31:24]` of CPUID leaf 0x1.
+
+
+3) Centaur and Zhaoxin
+
+ Similar to Intel, Centaur and Zhaoxin use a combination of CPUID leaf
+ 0x00000004 (Deterministic Cache Parameters Leaf) and CPUID leaf 0x00000001
+ (Basic CPUID Information) to derive the topology information.
+
+
+
System topology examples
========================