summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/ima_policy3
-rw-r--r--Documentation/ABI/testing/sysfs-block-bcache7
-rw-r--r--Documentation/ABI/testing/sysfs-module2
-rw-r--r--Documentation/ABI/testing/sysfs-power16
-rw-r--r--Documentation/Kconfig2
-rw-r--r--Documentation/Makefile160
-rw-r--r--Documentation/RCU/Design/Requirements/Requirements.rst33
-rw-r--r--Documentation/RCU/checklist.rst12
-rw-r--r--Documentation/RCU/whatisRCU.rst3
-rw-r--r--Documentation/accounting/taskstats.rst54
-rw-r--r--Documentation/admin-guide/LSM/Smack.rst16
-rw-r--r--Documentation/admin-guide/LSM/ipe.rst17
-rw-r--r--Documentation/admin-guide/RAS/main.rst142
-rw-r--r--Documentation/admin-guide/bcache.rst13
-rw-r--r--Documentation/admin-guide/blockdev/zoned_loop.rst61
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst31
-rw-r--r--Documentation/admin-guide/efi-stub.rst3
-rw-r--r--Documentation/admin-guide/hw-vuln/l1d_flush.rst2
-rw-r--r--Documentation/admin-guide/hw-vuln/spectre.rst2
-rw-r--r--Documentation/admin-guide/kernel-parameters.rst97
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt118
-rw-r--r--Documentation/admin-guide/md.rst10
-rw-r--r--Documentation/admin-guide/pm/cpuidle.rst9
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst133
-rw-r--r--Documentation/admin-guide/sysctl/net.rst29
-rw-r--r--Documentation/admin-guide/tainted-kernels.rst2
-rw-r--r--Documentation/admin-guide/thermal/index.rst1
-rw-r--r--Documentation/admin-guide/thermal/intel_thermal_throttle.rst91
-rw-r--r--Documentation/admin-guide/workload-tracing.rst10
-rw-r--r--Documentation/arch/arm64/booting.rst8
-rw-r--r--Documentation/arch/arm64/sve.rst5
-rw-r--r--Documentation/arch/s390/s390dbf.rst5
-rw-r--r--Documentation/arch/x86/boot.rst40
-rw-r--r--Documentation/bpf/libbpf/program_types.rst18
-rw-r--r--Documentation/bpf/map_array.rst5
-rw-r--r--Documentation/conf.py15
-rw-r--r--Documentation/core-api/assoc_array.rst196
-rw-r--r--Documentation/core-api/printk-formats.rst11
-rw-r--r--Documentation/crypto/index.rst1
-rw-r--r--Documentation/crypto/sha3.rst130
-rw-r--r--Documentation/crypto/userspace-if.rst7
-rw-r--r--Documentation/dev-tools/checkpatch.rst13
-rw-r--r--Documentation/dev-tools/kunit/run_manual.rst6
-rw-r--r--Documentation/devicetree/bindings/crypto/amd,ccp-seattle-v1a.yaml3
-rw-r--r--Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml1
-rw-r--r--Documentation/devicetree/bindings/crypto/qcom,prng.yaml1
-rw-r--r--Documentation/devicetree/bindings/crypto/qcom-qce.yaml1
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml3
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml13
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml4
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml17
-rw-r--r--Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml4
-rw-r--r--Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml35
-rw-r--r--Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/amd,xgbe-seattle-v1a.yaml147
-rw-r--r--Documentation/devicetree/bindings/net/amd-xgbe.txt76
-rw-r--r--Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml7
-rw-r--r--Documentation/devicetree/bindings/net/bluetooth/marvell,sd8897-bt.yaml79
-rw-r--r--Documentation/devicetree/bindings/net/btusb.txt2
-rw-r--r--Documentation/devicetree/bindings/net/can/bosch,m_can.yaml25
-rw-r--r--Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml5
-rw-r--r--Documentation/devicetree/bindings/net/can/microchip,mpfs-can.yaml5
-rw-r--r--Documentation/devicetree/bindings/net/cdns,macb.yaml23
-rw-r--r--Documentation/devicetree/bindings/net/dsa/lantiq,gswip.yaml164
-rw-r--r--Documentation/devicetree/bindings/net/dsa/motorcomm,yt921x.yaml167
-rw-r--r--Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml129
-rw-r--r--Documentation/devicetree/bindings/net/ethernet-phy.yaml10
-rw-r--r--Documentation/devicetree/bindings/net/fsl,enetc.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt83
-rw-r--r--Documentation/devicetree/bindings/net/mediatek,net.yaml26
-rw-r--r--Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt73
-rw-r--r--Documentation/devicetree/bindings/net/mscc-phy-vsc8531.yaml131
-rw-r--r--Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/qcom,ethqos.yaml8
-rw-r--r--Documentation/devicetree/bindings/net/rockchip-dwmac.yaml3
-rw-r--r--Documentation/devicetree/bindings/net/snps,dwmac.yaml6
-rw-r--r--Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml19
-rw-r--r--Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml66
-rw-r--r--Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml29
-rw-r--r--Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml26
-rw-r--r--Documentation/devicetree/bindings/pinctrl/xlnx,versal-pinctrl.yaml1
-rw-r--r--Documentation/devicetree/bindings/rng/microchip,pic32-rng.txt17
-rw-r--r--Documentation/devicetree/bindings/rng/microchip,pic32-rng.yaml40
-rw-r--r--Documentation/devicetree/bindings/thermal/fsl,imx91-tmu.yaml87
-rw-r--r--Documentation/devicetree/bindings/thermal/qcom-tsens.yaml9
-rw-r--r--Documentation/devicetree/bindings/thermal/renesas,r9a09g047-tsu.yaml6
-rw-r--r--Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml47
-rw-r--r--Documentation/devicetree/bindings/vendor-prefixes.yaml4
-rw-r--r--Documentation/doc-guide/checktransupdate.rst6
-rw-r--r--Documentation/doc-guide/contributing.rst2
-rw-r--r--Documentation/doc-guide/kernel-doc.rst33
-rw-r--r--Documentation/doc-guide/parse-headers.rst189
-rw-r--r--Documentation/doc-guide/sphinx.rst6
-rw-r--r--Documentation/driver-api/dpll.rst36
-rw-r--r--Documentation/driver-api/parport-lowlevel.rst5
-rw-r--r--Documentation/driver-api/pldmfw/index.rst1
-rw-r--r--Documentation/driver-api/thermal/intel_dptf.rst23
-rw-r--r--Documentation/driver-api/usb/writing_musb_glue_layer.rst2
-rwxr-xr-xDocumentation/features/list-arch.sh11
-rwxr-xr-xDocumentation/features/scripts/features-refresh.sh98
-rw-r--r--Documentation/filesystems/ext4/inodes.rst2
-rw-r--r--Documentation/filesystems/ext4/super.rst4
-rw-r--r--Documentation/filesystems/fscrypt.rst2
-rw-r--r--Documentation/filesystems/gfs2/glocks.rst (renamed from Documentation/filesystems/gfs2-glocks.rst)0
-rw-r--r--Documentation/filesystems/gfs2/index.rst (renamed from Documentation/filesystems/gfs2.rst)12
-rw-r--r--Documentation/filesystems/gfs2/uevents.rst (renamed from Documentation/filesystems/gfs2-uevents.rst)0
-rw-r--r--Documentation/filesystems/index.rst4
-rw-r--r--Documentation/filesystems/iomap/operations.rst50
-rw-r--r--Documentation/filesystems/porting.rst15
-rw-r--r--Documentation/filesystems/ramfs-rootfs-initramfs.rst12
-rw-r--r--Documentation/filesystems/resctrl.rst134
-rw-r--r--Documentation/filesystems/xfs/xfs-online-fsck-design.rst238
-rw-r--r--Documentation/input/event-codes.rst25
-rw-r--r--Documentation/kbuild/kbuild.rst10
-rw-r--r--Documentation/locking/seqlock.rst9
-rw-r--r--Documentation/memory-barriers.txt6
-rw-r--r--Documentation/misc-devices/amd-sbi.rst6
-rw-r--r--Documentation/misc-devices/mrvl_cn10k_dpi.rst4
-rw-r--r--Documentation/misc-devices/tps6594-pfsm.rst12
-rw-r--r--Documentation/misc-devices/uacce.rst7
-rw-r--r--Documentation/mm/active_mm.rst2
-rw-r--r--Documentation/netlink/genetlink-c.yaml2
-rw-r--r--Documentation/netlink/genetlink.yaml2
-rw-r--r--Documentation/netlink/netlink-raw.yaml2
-rw-r--r--Documentation/netlink/specs/conntrack.yaml2
-rw-r--r--Documentation/netlink/specs/devlink.yaml11
-rw-r--r--Documentation/netlink/specs/dpll.yaml7
-rw-r--r--Documentation/netlink/specs/em.yaml113
-rw-r--r--Documentation/netlink/specs/ethtool.yaml88
-rw-r--r--Documentation/netlink/specs/netdev.yaml28
-rw-r--r--Documentation/netlink/specs/nftables.yaml2
-rw-r--r--Documentation/netlink/specs/psp.yaml95
-rw-r--r--Documentation/netlink/specs/rt-addr.yaml7
-rw-r--r--Documentation/netlink/specs/rt-link.yaml50
-rw-r--r--Documentation/netlink/specs/rt-neigh.yaml2
-rw-r--r--Documentation/netlink/specs/rt-route.yaml8
-rw-r--r--Documentation/netlink/specs/rt-rule.yaml6
-rw-r--r--Documentation/netlink/specs/wireguard.yaml298
-rw-r--r--Documentation/networking/6pack.rst2
-rw-r--r--Documentation/networking/arcnet-hardware.rst22
-rw-r--r--Documentation/networking/arcnet.rst48
-rw-r--r--Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst10
-rw-r--r--Documentation/networking/device_drivers/ethernet/index.rst1
-rw-r--r--Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst17
-rw-r--r--Documentation/networking/devlink/devlink-eswitch-attr.rst13
-rw-r--r--Documentation/networking/devlink/devlink-params.rst14
-rw-r--r--Documentation/networking/devlink/i40e.rst34
-rw-r--r--Documentation/networking/devlink/index.rst1
-rw-r--r--Documentation/networking/devlink/mlx5.rst14
-rw-r--r--Documentation/networking/devlink/stmmac.rst40
-rw-r--r--Documentation/networking/dsa/dsa.rst17
-rw-r--r--Documentation/networking/ethtool-netlink.rst64
-rw-r--r--Documentation/networking/index.rst5
-rw-r--r--Documentation/networking/ip-sysctl.rst60
-rw-r--r--Documentation/networking/napi.rst50
-rw-r--r--Documentation/networking/net_cachelines/inet_connection_sock.rst2
-rw-r--r--Documentation/networking/net_cachelines/inet_sock.rst79
-rw-r--r--Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst3
-rw-r--r--Documentation/networking/netconsole.rst2
-rw-r--r--Documentation/networking/nfc.rst6
-rw-r--r--Documentation/networking/smc-sysctl.rst40
-rw-r--r--Documentation/networking/statistics.rst4
-rw-r--r--Documentation/networking/tls.rst20
-rw-r--r--Documentation/networking/xfrm/index.rst13
-rw-r--r--Documentation/networking/xfrm/xfrm_device.rst (renamed from Documentation/networking/xfrm_device.rst)20
-rw-r--r--Documentation/networking/xfrm/xfrm_proc.rst (renamed from Documentation/networking/xfrm_proc.rst)0
-rw-r--r--Documentation/networking/xfrm/xfrm_sync.rst (renamed from Documentation/networking/xfrm_sync.rst)97
-rw-r--r--Documentation/networking/xfrm/xfrm_sysctl.rst (renamed from Documentation/networking/xfrm_sysctl.rst)4
-rw-r--r--Documentation/power/index.rst1
-rw-r--r--Documentation/power/pm_qos_interface.rst9
-rw-r--r--Documentation/power/runtime_pm.rst10
-rw-r--r--Documentation/power/shutdown-debugging.rst53
-rw-r--r--Documentation/process/2.Process.rst47
-rw-r--r--Documentation/process/coding-style.rst2
-rw-r--r--Documentation/process/submitting-patches.rst5
-rw-r--r--Documentation/rust/quick-start.rst4
-rw-r--r--Documentation/security/keys/trusted-encrypted.rst88
-rw-r--r--Documentation/sound/codecs/cs35l56.rst9
-rw-r--r--Documentation/sphinx/kernel_abi.py6
-rw-r--r--Documentation/sphinx/kernel_feat.py26
-rwxr-xr-xDocumentation/sphinx/kernel_include.py4
-rw-r--r--Documentation/sphinx/kerneldoc-preamble.sty2
-rw-r--r--Documentation/sphinx/kerneldoc.py6
-rw-r--r--Documentation/sphinx/load_config.py60
-rw-r--r--Documentation/sphinx/parallel-wrapper.sh33
-rw-r--r--Documentation/tools/rtla/common_appendix.txt (renamed from Documentation/tools/rtla/common_appendix.rst)0
-rw-r--r--Documentation/tools/rtla/common_hist_options.txt (renamed from Documentation/tools/rtla/common_hist_options.rst)0
-rw-r--r--Documentation/tools/rtla/common_options.txt (renamed from Documentation/tools/rtla/common_options.rst)16
-rw-r--r--Documentation/tools/rtla/common_osnoise_description.txt (renamed from Documentation/tools/rtla/common_osnoise_description.rst)0
-rw-r--r--Documentation/tools/rtla/common_osnoise_options.txt (renamed from Documentation/tools/rtla/common_osnoise_options.rst)0
-rw-r--r--Documentation/tools/rtla/common_timerlat_aa.txt (renamed from Documentation/tools/rtla/common_timerlat_aa.rst)0
-rw-r--r--Documentation/tools/rtla/common_timerlat_description.txt (renamed from Documentation/tools/rtla/common_timerlat_description.rst)0
-rw-r--r--Documentation/tools/rtla/common_timerlat_options.txt (renamed from Documentation/tools/rtla/common_timerlat_options.rst)4
-rw-r--r--Documentation/tools/rtla/common_top_options.txt (renamed from Documentation/tools/rtla/common_top_options.rst)0
-rw-r--r--Documentation/tools/rtla/rtla-hwnoise.rst8
-rw-r--r--Documentation/tools/rtla/rtla-osnoise-hist.rst10
-rw-r--r--Documentation/tools/rtla/rtla-osnoise-top.rst10
-rw-r--r--Documentation/tools/rtla/rtla-osnoise.rst4
-rw-r--r--Documentation/tools/rtla/rtla-timerlat-hist.rst12
-rw-r--r--Documentation/tools/rtla/rtla-timerlat-top.rst14
-rw-r--r--Documentation/tools/rtla/rtla-timerlat.rst4
-rw-r--r--Documentation/tools/rtla/rtla.rst2
-rw-r--r--Documentation/trace/timerlat-tracer.rst12
-rw-r--r--Documentation/translations/it_IT/doc-guide/parse-headers.rst8
-rw-r--r--Documentation/translations/it_IT/doc-guide/sphinx.rst4
-rw-r--r--Documentation/translations/ja_JP/SubmittingPatches28
-rw-r--r--Documentation/translations/zh_CN/admin-guide/README.rst2
-rw-r--r--Documentation/translations/zh_CN/block/blk-mq.rst130
-rw-r--r--Documentation/translations/zh_CN/block/data-integrity.rst192
-rw-r--r--Documentation/translations/zh_CN/block/index.rst35
-rw-r--r--Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst2
-rw-r--r--Documentation/translations/zh_CN/doc-guide/checktransupdate.rst6
-rw-r--r--Documentation/translations/zh_CN/doc-guide/contributing.rst2
-rw-r--r--Documentation/translations/zh_CN/doc-guide/parse-headers.rst8
-rw-r--r--Documentation/translations/zh_CN/doc-guide/sphinx.rst4
-rw-r--r--Documentation/translations/zh_CN/filesystems/dnotify.rst67
-rw-r--r--Documentation/translations/zh_CN/filesystems/gfs2-glocks.rst211
-rw-r--r--Documentation/translations/zh_CN/filesystems/gfs2-uevents.rst97
-rw-r--r--Documentation/translations/zh_CN/filesystems/gfs2.rst57
-rw-r--r--Documentation/translations/zh_CN/filesystems/index.rst17
-rw-r--r--Documentation/translations/zh_CN/filesystems/inotify.rst80
-rw-r--r--Documentation/translations/zh_CN/filesystems/ubifs-authentication.rst354
-rw-r--r--Documentation/translations/zh_CN/filesystems/ubifs.rst114
-rw-r--r--Documentation/translations/zh_CN/how-to.rst4
-rw-r--r--Documentation/translations/zh_CN/kbuild/kbuild.rst27
-rw-r--r--Documentation/translations/zh_CN/mm/active_mm.rst2
-rw-r--r--Documentation/translations/zh_CN/networking/generic-hdlc.rst176
-rw-r--r--Documentation/translations/zh_CN/networking/index.rst7
-rw-r--r--Documentation/translations/zh_CN/networking/mptcp-sysctl.rst139
-rw-r--r--Documentation/translations/zh_CN/networking/timestamping.rst674
-rw-r--r--Documentation/translations/zh_CN/rust/general-information.rst1
-rw-r--r--Documentation/translations/zh_CN/rust/index.rst33
-rw-r--r--Documentation/translations/zh_CN/rust/testing.rst215
-rw-r--r--Documentation/translations/zh_CN/scsi/index.rst92
-rw-r--r--Documentation/translations/zh_CN/scsi/libsas.rst425
-rw-r--r--Documentation/translations/zh_CN/scsi/link_power_management_policy.rst32
-rw-r--r--Documentation/translations/zh_CN/scsi/scsi-parameters.rst118
-rw-r--r--Documentation/translations/zh_CN/scsi/scsi.rst48
-rw-r--r--Documentation/translations/zh_CN/scsi/scsi_eh.rst482
-rw-r--r--Documentation/translations/zh_CN/scsi/scsi_mid_low_api.rst1174
-rw-r--r--Documentation/translations/zh_CN/scsi/sd-parameters.rst38
-rw-r--r--Documentation/translations/zh_CN/scsi/wd719x.rst35
-rw-r--r--Documentation/translations/zh_CN/security/SCTP.rst317
-rw-r--r--Documentation/translations/zh_CN/security/index.rst4
-rw-r--r--Documentation/translations/zh_CN/security/ipe.rst398
-rw-r--r--Documentation/translations/zh_CN/security/lsm-development.rst19
-rw-r--r--Documentation/translations/zh_CN/security/secrets/coco.rst96
-rw-r--r--Documentation/translations/zh_CN/security/secrets/index.rst9
-rw-r--r--Documentation/translations/zh_CN/subsystem-apis.rst3
-rw-r--r--Documentation/translations/zh_TW/admin-guide/README.rst2
-rw-r--r--Documentation/translations/zh_TW/dev-tools/gdb-kernel-debugging.rst2
-rw-r--r--Documentation/userspace-api/netlink/intro-specs.rst4
-rw-r--r--Documentation/userspace-api/spec_ctrl.rst6
-rw-r--r--Documentation/w1/w1-netlink.rst2
-rw-r--r--Documentation/wmi/driver-development-guide.rst1
257 files changed, 10041 insertions, 1962 deletions
diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index c2385183826c..d4b3696a9efb 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -20,9 +20,10 @@ Description:
rule format: action [condition ...]
action: measure | dont_measure | appraise | dont_appraise |
- audit | hash | dont_hash
+ audit | dont_audit | hash | dont_hash
condition:= base | lsm [option]
base: [[func=] [mask=] [fsmagic=] [fsuuid=] [fsname=]
+ [fs_subtype=]
[uid=] [euid=] [gid=] [egid=]
[fowner=] [fgroup=]]
lsm: [[subj_user=] [subj_role=] [subj_type=]
diff --git a/Documentation/ABI/testing/sysfs-block-bcache b/Documentation/ABI/testing/sysfs-block-bcache
index 9e4bbc5d51fd..9344a657ca70 100644
--- a/Documentation/ABI/testing/sysfs-block-bcache
+++ b/Documentation/ABI/testing/sysfs-block-bcache
@@ -106,13 +106,6 @@ Description:
will be discarded from the cache. Should not be turned off with
writeback caching enabled.
-What: /sys/block/<disk>/bcache/discard
-Date: November 2010
-Contact: Kent Overstreet <kent.overstreet@gmail.com>
-Description:
- For a cache, a boolean allowing discard/TRIM to be turned off
- or back on if the device supports it.
-
What: /sys/block/<disk>/bcache/bucket_size
Date: November 2010
Contact: Kent Overstreet <kent.overstreet@gmail.com>
diff --git a/Documentation/ABI/testing/sysfs-module b/Documentation/ABI/testing/sysfs-module
index 62addab47d0c..6bc9af6229f0 100644
--- a/Documentation/ABI/testing/sysfs-module
+++ b/Documentation/ABI/testing/sysfs-module
@@ -59,6 +59,8 @@ Description: Module taint flags:
F force-loaded module
C staging driver module
E unsigned module
+ K livepatch module
+ N in-kernel test module
== =====================
What: /sys/module/grant_table/parameters/free_per_iteration
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index 4d8e1ad020f0..d38da077905a 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -454,3 +454,19 @@ Description:
disables it. Reads from the file return the current value.
The default is "1" if the build-time "SUSPEND_SKIP_SYNC" config
flag is unset, or "0" otherwise.
+
+What: /sys/power/hibernate_compression_threads
+Date: October 2025
+Contact: <luoxueqin@kylinos.cn>
+Description:
+ Controls the number of threads used for compression
+ and decompression of hibernation images.
+
+ The value can be adjusted at runtime to balance
+ performance and CPU utilization.
+
+ The change takes effect on the next hibernation or
+ resume operation.
+
+ Minimum value: 1
+ Default value: 3
diff --git a/Documentation/Kconfig b/Documentation/Kconfig
index 3a0e7ac0c4e3..8b6c4b84b218 100644
--- a/Documentation/Kconfig
+++ b/Documentation/Kconfig
@@ -19,7 +19,7 @@ config WARN_ABI_ERRORS
described at Documentation/ABI/README. Yet, as they're manually
written, it would be possible that some of those files would
have errors that would break them for being parsed by
- scripts/get_abi.pl. Add a check to verify them.
+ tools/docs/get_abi.py. Add a check to verify them.
If unsure, select 'N'.
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 3609cb86137b..e96ac6dcac4f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -8,12 +8,12 @@ subdir- := devicetree/bindings
ifneq ($(MAKECMDGOALS),cleandocs)
# Check for broken documentation file references
ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y)
-$(shell $(srctree)/scripts/documentation-file-ref-check --warn)
+$(shell $(srctree)/tools/docs/documentation-file-ref-check --warn)
endif
# Check for broken ABI files
ifeq ($(CONFIG_WARN_ABI_ERRORS),y)
-$(shell $(srctree)/scripts/get_abi.py --dir $(srctree)/Documentation/ABI validate)
+$(shell $(srctree)/tools/docs/get_abi.py --dir $(srctree)/Documentation/ABI validate)
endif
endif
@@ -23,21 +23,22 @@ SPHINXOPTS =
SPHINXDIRS = .
DOCS_THEME =
DOCS_CSS =
-_SPHINXDIRS = $(sort $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)))
-SPHINX_CONF = conf.py
+RUSTDOC =
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode -no-shell-escape
+PYTHONPYCACHEPREFIX ?= $(abspath $(BUILDDIR)/__pycache__)
+
+# Wrapper for sphinx-build
+
+BUILD_WRAPPER = $(srctree)/tools/docs/sphinx-build-wrapper
+
# For denylisting "variable font" files
# Can be overridden by setting as an env variable
FONTS_CONF_DENY_VF ?= $(HOME)/deny-vf
-ifeq ($(findstring 1, $(KBUILD_VERBOSE)),)
-SPHINXOPTS += "-q"
-endif
-
# User-friendly check for sphinx-build
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
@@ -46,141 +47,46 @@ ifeq ($(HAVE_SPHINX),0)
.DEFAULT:
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
@echo
- @$(srctree)/scripts/sphinx-pre-install
+ @$(srctree)/tools/docs/sphinx-pre-install
@echo " SKIP Sphinx $@ target."
else # HAVE_SPHINX
-# User-friendly check for pdflatex and latexmk
-HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
-HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
-
-ifeq ($(HAVE_LATEXMK),1)
- PDFLATEX := latexmk -$(PDFLATEX)
-endif #HAVE_LATEXMK
-
-# Internal variables.
-PAPEROPT_a4 = -D latex_elements.papersize=a4paper
-PAPEROPT_letter = -D latex_elements.papersize=letterpaper
-ALLSPHINXOPTS = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
-ALLSPHINXOPTS += $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
-ifneq ($(wildcard $(srctree)/.config),)
-ifeq ($(CONFIG_RUST),y)
- # Let Sphinx know we will include rustdoc
- ALLSPHINXOPTS += -t rustdoc
-endif
-endif
-# the i18n builder cannot share the environment and doctrees with the others
-I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
+# Common documentation targets
+htmldocs mandocs infodocs texinfodocs latexdocs epubdocs xmldocs pdfdocs linkcheckdocs:
+ $(Q)PYTHONPYCACHEPREFIX="$(PYTHONPYCACHEPREFIX)" \
+ $(srctree)/tools/docs/sphinx-pre-install --version-check
+ +$(Q)PYTHONPYCACHEPREFIX="$(PYTHONPYCACHEPREFIX)" \
+ $(PYTHON3) $(BUILD_WRAPPER) $@ \
+ --sphinxdirs="$(SPHINXDIRS)" $(RUSTDOC) \
+ --builddir="$(BUILDDIR)" --deny-vf=$(FONTS_CONF_DENY_VF) \
+ --theme=$(DOCS_THEME) --css=$(DOCS_CSS) --paper=$(PAPER)
-# commands; the 'cmd' from scripts/Kbuild.include is not *loopable*
-loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
-# $2 sphinx builder e.g. "html"
-# $3 name of the build subfolder / e.g. "userspace-api/media", used as:
-# * dest folder relative to $(BUILDDIR) and
-# * cache folder relative to $(BUILDDIR)/.doctrees
-# $4 dest subfolder e.g. "man" for man pages at userspace-api/media/man
-# $5 reST source folder relative to $(src),
-# e.g. "userspace-api/media" for the linux-tv book-set at ./Documentation/userspace-api/media
-
-PYTHONPYCACHEPREFIX ?= $(abspath $(BUILDDIR)/__pycache__)
-
-quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
- cmd_sphinx = \
- PYTHONPYCACHEPREFIX="$(PYTHONPYCACHEPREFIX)" \
- BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(src)/$5/$(SPHINX_CONF)) \
- $(PYTHON3) $(srctree)/scripts/jobserver-exec \
- $(CONFIG_SHELL) $(srctree)/Documentation/sphinx/parallel-wrapper.sh \
- $(SPHINXBUILD) \
- -b $2 \
- -c $(abspath $(src)) \
- -d $(abspath $(BUILDDIR)/.doctrees/$3) \
- -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
- $(ALLSPHINXOPTS) \
- $(abspath $(src)/$5) \
- $(abspath $(BUILDDIR)/$3/$4) && \
- if [ "x$(DOCS_CSS)" != "x" ]; then \
- cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
- fi
-
-htmldocs:
- @$(srctree)/scripts/sphinx-pre-install --version-check
- @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
-
-htmldocs-redirects: $(srctree)/Documentation/.renames.txt
- @tools/docs/gen-redirects.py --output $(BUILDDIR) < $<
-
-# If Rust support is available and .config exists, add rustdoc generated contents.
-# If there are any, the errors from this make rustdoc will be displayed but
-# won't stop the execution of htmldocs
-
-ifneq ($(wildcard $(srctree)/.config),)
-ifeq ($(CONFIG_RUST),y)
- $(Q)$(MAKE) rustdoc || true
-endif
endif
-texinfodocs:
- @$(srctree)/scripts/sphinx-pre-install --version-check
- @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,texinfo,$(var),texinfo,$(var)))
-
-# Note: the 'info' Make target is generated by sphinx itself when
-# running the texinfodocs target define above.
-infodocs: texinfodocs
- $(MAKE) -C $(BUILDDIR)/texinfo info
-
-linkcheckdocs:
- @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
-
-latexdocs:
- @$(srctree)/scripts/sphinx-pre-install --version-check
- @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
-
-ifeq ($(HAVE_PDFLATEX),0)
-
-pdfdocs:
- $(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
- @echo " SKIP Sphinx $@ target."
-
-else # HAVE_PDFLATEX
-
-pdfdocs: DENY_VF = XDG_CONFIG_HOME=$(FONTS_CONF_DENY_VF)
-pdfdocs: latexdocs
- @$(srctree)/scripts/sphinx-pre-install --version-check
- $(foreach var,$(SPHINXDIRS), \
- $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" $(DENY_VF) -C $(BUILDDIR)/$(var)/latex || sh $(srctree)/scripts/check-variable-fonts.sh || exit; \
- mkdir -p $(BUILDDIR)/$(var)/pdf; \
- mv $(subst .tex,.pdf,$(wildcard $(BUILDDIR)/$(var)/latex/*.tex)) $(BUILDDIR)/$(var)/pdf/; \
- )
-
-endif # HAVE_PDFLATEX
-
-epubdocs:
- @$(srctree)/scripts/sphinx-pre-install --version-check
- @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
-
-xmldocs:
- @$(srctree)/scripts/sphinx-pre-install --version-check
- @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
-
-endif # HAVE_SPHINX
-
# The following targets are independent of HAVE_SPHINX, and the rules should
# work or silently pass without Sphinx.
+htmldocs-redirects: $(srctree)/Documentation/.renames.txt
+ @tools/docs/gen-redirects.py --output $(BUILDDIR) < $<
+
refcheckdocs:
- $(Q)cd $(srctree);scripts/documentation-file-ref-check
+ $(Q)cd $(srctree); tools/docs/documentation-file-ref-check
cleandocs:
$(Q)rm -rf $(BUILDDIR)
+# Used only on help
+_SPHINXDIRS = $(shell printf "%s\n" $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)) | sort -f)
+
dochelp:
@echo ' Linux kernel internal documentation in different formats from ReST:'
@echo ' htmldocs - HTML'
@echo ' htmldocs-redirects - generate HTML redirects for moved pages'
@echo ' texinfodocs - Texinfo'
@echo ' infodocs - Info'
+ @echo ' mandocs - Man pages'
@echo ' latexdocs - LaTeX'
@echo ' pdfdocs - PDF'
@echo ' epubdocs - EPUB'
@@ -192,13 +98,17 @@ dochelp:
@echo ' cleandocs - clean all generated files'
@echo
@echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'
- @echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)'
- @echo
- @echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
- @echo ' configuration. This is e.g. useful to build with nit-picking config.'
+ @echo ' top level values for SPHINXDIRS are: $(_SPHINXDIRS)'
+ @echo ' you may also use a subdirectory like SPHINXDIRS=userspace-api/media,'
+ @echo ' provided that there is an index.rst file at the subdirectory.'
@echo
@echo ' make DOCS_THEME={sphinx-theme} selects a different Sphinx theme.'
@echo
@echo ' make DOCS_CSS={a .css file} adds a DOCS_CSS override file for html/epub output.'
@echo
+ @echo ' make PAPER={a4|letter} Specifies the paper size used for LaTeX/PDF output.'
+ @echo
+ @echo ' make FONTS_CONF_DENY_VF={path} sets a deny list to block variable Noto CJK fonts'
+ @echo ' for PDF build. See tools/lib/python/kdoc/latex_fonts.py for more details'
+ @echo
@echo ' Default location for the generated documents is Documentation/output'
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index f24b3c0b9b0d..ba417a08b93d 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2637,15 +2637,16 @@ synchronize_srcu() for some other domain ``ss1``, and if an
that was held across as ``ss``-domain synchronize_srcu(), deadlock
would again be possible. Such a deadlock cycle could extend across an
arbitrarily large number of different SRCU domains. Again, with great
-power comes great responsibility.
+power comes great responsibility, though lockdep is now able to detect
+this sort of deadlock.
-Unlike the other RCU flavors, SRCU read-side critical sections can run
-on idle and even offline CPUs. This ability requires that
-srcu_read_lock() and srcu_read_unlock() contain memory barriers,
-which means that SRCU readers will run a bit slower than would RCU
-readers. It also motivates the smp_mb__after_srcu_read_unlock() API,
-which, in combination with srcu_read_unlock(), guarantees a full
-memory barrier.
+Unlike the other RCU flavors, SRCU read-side critical sections can run on
+idle and even offline CPUs, with the exception of srcu_read_lock_fast()
+and friends. This ability requires that srcu_read_lock() and
+srcu_read_unlock() contain memory barriers, which means that SRCU
+readers will run a bit slower than would RCU readers. It also motivates
+the smp_mb__after_srcu_read_unlock() API, which, in combination with
+srcu_read_unlock(), guarantees a full memory barrier.
Also unlike other RCU flavors, synchronize_srcu() may **not** be
invoked from CPU-hotplug notifiers, due to the fact that SRCU grace
@@ -2681,15 +2682,15 @@ run some tests first. SRCU just might need a few adjustment to deal with
that sort of load. Of course, your mileage may vary based on the speed
of your CPUs and the size of your memory.
-The `SRCU
-API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
+The `SRCU API
+<https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
includes srcu_read_lock(), srcu_read_unlock(),
-srcu_dereference(), srcu_dereference_check(),
-synchronize_srcu(), synchronize_srcu_expedited(),
-call_srcu(), srcu_barrier(), and srcu_read_lock_held(). It
-also includes DEFINE_SRCU(), DEFINE_STATIC_SRCU(), and
-init_srcu_struct() APIs for defining and initializing
-``srcu_struct`` structures.
+srcu_dereference(), srcu_dereference_check(), synchronize_srcu(),
+synchronize_srcu_expedited(), call_srcu(), srcu_barrier(),
+and srcu_read_lock_held(). It also includes DEFINE_SRCU(),
+DEFINE_STATIC_SRCU(), DEFINE_SRCU_FAST(), DEFINE_STATIC_SRCU_FAST(),
+init_srcu_struct(), and init_srcu_struct_fast() APIs for defining and
+initializing ``srcu_struct`` structures.
More recently, the SRCU API has added polling interfaces:
diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst
index c9bfb2b218e5..4b30f701225f 100644
--- a/Documentation/RCU/checklist.rst
+++ b/Documentation/RCU/checklist.rst
@@ -417,11 +417,13 @@ over a rather long period of time, but improvements are always welcome!
you should be using RCU rather than SRCU, because RCU is almost
always faster and easier to use than is SRCU.
- Also unlike other forms of RCU, explicit initialization and
- cleanup is required either at build time via DEFINE_SRCU()
- or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
- and cleanup_srcu_struct(). These last two are passed a
- "struct srcu_struct" that defines the scope of a given
+ Also unlike other forms of RCU, explicit initialization
+ and cleanup is required either at build time via
+ DEFINE_SRCU(), DEFINE_STATIC_SRCU(), DEFINE_SRCU_FAST(),
+ or DEFINE_STATIC_SRCU_FAST() or at runtime via either
+ init_srcu_struct() or init_srcu_struct_fast() and
+ cleanup_srcu_struct(). These last three are passed a
+ `struct srcu_struct` that defines the scope of a given
SRCU domain. Once initialized, the srcu_struct is passed
to srcu_read_lock(), srcu_read_unlock() synchronize_srcu(),
synchronize_srcu_expedited(), and call_srcu(). A given
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index cf0b0ac9f463..a1582bd653d1 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -1227,7 +1227,10 @@ SRCU: Initialization/cleanup/ordering::
DEFINE_SRCU
DEFINE_STATIC_SRCU
+ DEFINE_SRCU_FAST // for srcu_read_lock_fast() and friends
+ DEFINE_STATIC_SRCU_FAST // for srcu_read_lock_fast() and friends
init_srcu_struct
+ init_srcu_struct_fast
cleanup_srcu_struct
smp_mb__after_srcu_read_unlock
diff --git a/Documentation/accounting/taskstats.rst b/Documentation/accounting/taskstats.rst
index 2a28b7f55c10..173c1e7bf5ef 100644
--- a/Documentation/accounting/taskstats.rst
+++ b/Documentation/accounting/taskstats.rst
@@ -76,41 +76,43 @@ The messages are in the format::
The taskstats payload is one of the following three kinds:
1. Commands: Sent from user to kernel. Commands to get data on
-a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
-containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
-the task/process for which userspace wants statistics.
-
-Commands to register/deregister interest in exit data from a set of cpus
-consist of one attribute, of type
-TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
-attribute payload. The cpumask is specified as an ascii string of
-comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
-the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
-in cpus before closing the listening socket, the kernel cleans up its interest
-set over time. However, for the sake of efficiency, an explicit deregistration
-is advisable.
+ a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
+ containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
+ the task/process for which userspace wants statistics.
+
+ Commands to register/deregister interest in exit data from a set of cpus
+ consist of one attribute, of type
+ TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
+ attribute payload. The cpumask is specified as an ascii string of
+ comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
+ the cpumask would be "1-3,5,7-8". If userspace forgets to deregister
+ interest in cpus before closing the listening socket, the kernel cleans up
+ its interest set over time. However, for the sake of efficiency, an explicit
+ deregistration is advisable.
2. Response for a command: sent from the kernel in response to a userspace
-command. The payload is a series of three attributes of type:
+ command. The payload is a series of three attributes of type:
-a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
-a pid/tgid will be followed by some stats.
+ a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute containing no payload but
+ indicates a pid/tgid will be followed by some stats.
-b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
-are being returned.
+ b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose
+ stats are being returned.
-c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
-same structure is used for both per-pid and per-tgid stats.
+ c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
+ same structure is used for both per-pid and per-tgid stats.
3. New message sent by kernel whenever a task exits. The payload consists of a
series of attributes of the following type:
-a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
-b) TASKSTATS_TYPE_PID: contains exiting task's pid
-c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
-d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
-e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
-f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
+ a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
+ b) TASKSTATS_TYPE_PID: contains exiting task's pid
+ c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
+ d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be
+ tgid+stats
+ e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
+ f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's
+ process
per-tgid stats
diff --git a/Documentation/admin-guide/LSM/Smack.rst b/Documentation/admin-guide/LSM/Smack.rst
index 6d44f4fdbf59..c5ed775f2d10 100644
--- a/Documentation/admin-guide/LSM/Smack.rst
+++ b/Documentation/admin-guide/LSM/Smack.rst
@@ -601,10 +601,15 @@ specification.
Task Attribute
~~~~~~~~~~~~~~
-The Smack label of a process can be read from /proc/<pid>/attr/current. A
-process can read its own Smack label from /proc/self/attr/current. A
+The Smack label of a process can be read from ``/proc/<pid>/attr/current``. A
+process can read its own Smack label from ``/proc/self/attr/current``. A
privileged process can change its own Smack label by writing to
-/proc/self/attr/current but not the label of another process.
+``/proc/self/attr/current`` but not the label of another process.
+
+Format of writing is : only the label or the label followed by one of the
+3 trailers: ``\n`` (by common agreement for ``/proc/...`` interfaces),
+``\0`` (because some applications incorrectly include it),
+``\n\0`` (because we think some applications may incorrectly include it).
File Attribute
~~~~~~~~~~~~~~
@@ -696,6 +701,11 @@ sockets.
A privileged program may set this to match the label of another
task with which it hopes to communicate.
+UNIX domain socket (UDS) with a BSD address functions both as a file in a
+filesystem and as a socket. As a file, it carries the SMACK64 attribute. This
+attribute is not involved in Smack security enforcement and is immutably
+assigned the label "*".
+
Smack Netlabel Exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
index dc7088451f9d..a756d8158531 100644
--- a/Documentation/admin-guide/LSM/ipe.rst
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -95,7 +95,20 @@ languages when these scripts are invoked by passing these program files
to the interpreter. This is because the way interpreters execute these
files; the scripts themselves are not evaluated as executable code
through one of IPE's hooks, but they are merely text files that are read
-(as opposed to compiled executables) [#interpreters]_.
+(as opposed to compiled executables). However, with the introduction of the
+``AT_EXECVE_CHECK`` flag (:doc:`AT_EXECVE_CHECK </userspace-api/check_exec>`),
+interpreters can use it to signal the kernel that a script file will be executed,
+and request the kernel to perform LSM security checks on it.
+
+IPE's EXECUTE operation enforcement differs between compiled executables and
+interpreted scripts: For compiled executables, enforcement is triggered
+automatically by the kernel during ``execve()``, ``execveat()``, ``mmap()``
+and ``mprotect()`` syscalls when loading executable content. For interpreted
+scripts, enforcement requires explicit interpreter integration using
+``execveat()`` with ``AT_EXECVE_CHECK`` flag. Unlike exec syscalls that IPE
+intercepts during the execution process, this mechanism needs the interpreter
+to take the initiative, and existing interpreters won't be automatically
+supported unless the signal call is added.
Threat Model
------------
@@ -806,8 +819,6 @@ A:
.. [#digest_cache_lsm] https://lore.kernel.org/lkml/20240415142436.2545003-1-roberto.sassu@huaweicloud.com/
-.. [#interpreters] There is `some interest in solving this issue <https://lore.kernel.org/lkml/20220321161557.495388-1-mic@digikod.net/>`_.
-
.. [#devdoc] Please see :doc:`the design docs </security/ipe>` for more on
this topic.
diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
index 447bfde509fb..5a45db32c49b 100644
--- a/Documentation/admin-guide/RAS/main.rst
+++ b/Documentation/admin-guide/RAS/main.rst
@@ -406,24 +406,8 @@ index of the MC::
|->mc2
....
-Under each ``mcX`` directory each ``csrowX`` is again represented by a
-``csrowX``, where ``X`` is the csrow index::
-
- .../mc/mc0/
- |
- |->csrow0
- |->csrow2
- |->csrow3
- ....
-
-Notice that there is no csrow1, which indicates that csrow0 is composed
-of a single ranked DIMMs. This should also apply in both Channels, in
-order to have dual-channel mode be operational. Since both csrow2 and
-csrow3 are populated, this indicates a dual ranked set of DIMMs for
-channels 0 and 1.
-
-Within each of the ``mcX`` and ``csrowX`` directories are several EDAC
-control and attribute files.
+Within each of the ``mcX`` directory are several EDAC control and
+attribute files.
``mcX`` directories
-------------------
@@ -569,7 +553,7 @@ this ``X`` memory module:
- Unbuffered-DDR
.. [#f5] On some systems, the memory controller doesn't have any logic
- to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories.
+ to identify the memory module. On such systems, the directory is called ``rankX``.
On modern Intel memory controllers, the memory controller identifies the
memory modules directly. On such systems, the directory is called ``dimmX``.
@@ -577,126 +561,6 @@ this ``X`` memory module:
symlinks inside the sysfs mapping that are automatically created by
the sysfs subsystem. Currently, they serve no purpose.
-``csrowX`` directories
-----------------------
-
-When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX``
-directories. As this API doesn't work properly for Rambus, FB-DIMMs and
-modern Intel Memory Controllers, this is being deprecated in favor of
-``dimmX`` directories.
-
-In the ``csrowX`` directories are EDAC control and attribute files for
-this ``X`` instance of csrow:
-
-
-- ``ue_count`` - Total Uncorrectable Errors count attribute file
-
- This attribute file displays the total count of uncorrectable
- errors that have occurred on this csrow. If panic_on_ue is set
- this counter will not have a chance to increment, since EDAC
- will panic the system.
-
-
-- ``ce_count`` - Total Correctable Errors count attribute file
-
- This attribute file displays the total count of correctable
- errors that have occurred on this csrow. This count is very
- important to examine. CEs provide early indications that a
- DIMM is beginning to fail. This count field should be
- monitored for non-zero values and report such information
- to the system administrator.
-
-
-- ``size_mb`` - Total memory managed by this csrow attribute file
-
- This attribute file displays, in count of megabytes, the memory
- that this csrow contains.
-
-
-- ``mem_type`` - Memory Type attribute file
-
- This attribute file will display what type of memory is currently
- on this csrow. Normally, either buffered or unbuffered memory.
- Examples:
-
- - Registered-DDR
- - Unbuffered-DDR
-
-
-- ``edac_mode`` - EDAC Mode of operation attribute file
-
- This attribute file will display what type of Error detection
- and correction is being utilized.
-
-
-- ``dev_type`` - Device type attribute file
-
- This attribute file will display what type of DRAM device is
- being utilized on this DIMM.
- Examples:
-
- - x1
- - x2
- - x4
- - x8
-
-
-- ``ch0_ce_count`` - Channel 0 CE Count attribute file
-
- This attribute file will display the count of CEs on this
- DIMM located in channel 0.
-
-
-- ``ch0_ue_count`` - Channel 0 UE Count attribute file
-
- This attribute file will display the count of UEs on this
- DIMM located in channel 0.
-
-
-- ``ch0_dimm_label`` - Channel 0 DIMM Label control file
-
-
- This control file allows this DIMM to have a label assigned
- to it. With this label in the module, when errors occur
- the output can provide the DIMM label in the system log.
- This becomes vital for panic events to isolate the
- cause of the UE event.
-
- DIMM Labels must be assigned after booting, with information
- that correctly identifies the physical slot with its
- silk screen label. This information is currently very
- motherboard specific and determination of this information
- must occur in userland at this time.
-
-
-- ``ch1_ce_count`` - Channel 1 CE Count attribute file
-
-
- This attribute file will display the count of CEs on this
- DIMM located in channel 1.
-
-
-- ``ch1_ue_count`` - Channel 1 UE Count attribute file
-
-
- This attribute file will display the count of UEs on this
- DIMM located in channel 0.
-
-
-- ``ch1_dimm_label`` - Channel 1 DIMM Label control file
-
- This control file allows this DIMM to have a label assigned
- to it. With this label in the module, when errors occur
- the output can provide the DIMM label in the system log.
- This becomes vital for panic events to isolate the
- cause of the UE event.
-
- DIMM Labels must be assigned after booting, with information
- that correctly identifies the physical slot with its
- silk screen label. This information is currently very
- motherboard specific and determination of this information
- must occur in userland at this time.
-
System Logging
--------------
diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst
index 6fdb495ac466..f71f349553e4 100644
--- a/Documentation/admin-guide/bcache.rst
+++ b/Documentation/admin-guide/bcache.rst
@@ -17,8 +17,7 @@ The latest bcache kernel code can be found from mainline Linux kernel:
It's designed around the performance characteristics of SSDs - it only allocates
in erase block sized buckets, and it uses a hybrid btree/log to track cached
extents (which can be anywhere from a single sector to the bucket size). It's
-designed to avoid random writes at all costs; it fills up an erase block
-sequentially, then issues a discard before reusing it.
+designed to avoid random writes at all costs.
Both writethrough and writeback caching are supported. Writeback defaults to
off, but can be switched on and off arbitrarily at runtime. Bcache goes to
@@ -618,19 +617,11 @@ bucket_size
cache_replacement_policy
One of either lru, fifo or random.
-discard
- Boolean; if on a discard/TRIM will be issued to each bucket before it is
- reused. Defaults to off, since SATA TRIM is an unqueued command (and thus
- slow).
-
freelist_percent
Size of the freelist as a percentage of nbuckets. Can be written to to
increase the number of buckets kept on the freelist, which lets you
artificially reduce the size of the cache at runtime. Mostly for testing
- purposes (i.e. testing how different size caches affect your hit rate), but
- since buckets are discarded when they move on to the freelist will also make
- the SSD's garbage collection easier by effectively giving it more reserved
- space.
+ purposes (i.e. testing how different size caches affect your hit rate).
io_errors
Number of errors that have occurred, decayed by io_error_halflife.
diff --git a/Documentation/admin-guide/blockdev/zoned_loop.rst b/Documentation/admin-guide/blockdev/zoned_loop.rst
index 64dcfde7450a..806adde664db 100644
--- a/Documentation/admin-guide/blockdev/zoned_loop.rst
+++ b/Documentation/admin-guide/blockdev/zoned_loop.rst
@@ -68,30 +68,43 @@ The options available for the add command can be listed by reading the
In more details, the options that can be used with the "add" command are as
follows.
-================ ===========================================================
-id Device number (the X in /dev/zloopX).
- Default: automatically assigned.
-capacity_mb Device total capacity in MiB. This is always rounded up to
- the nearest higher multiple of the zone size.
- Default: 16384 MiB (16 GiB).
-zone_size_mb Device zone size in MiB. Default: 256 MiB.
-zone_capacity_mb Device zone capacity (must always be equal to or lower than
- the zone size. Default: zone size.
-conv_zones Total number of conventioanl zones starting from sector 0.
- Default: 8.
-base_dir Path to the base directory where to create the directory
- containing the zone files of the device.
- Default=/var/local/zloop.
- The device directory containing the zone files is always
- named with the device ID. E.g. the default zone file
- directory for /dev/zloop0 is /var/local/zloop/0.
-nr_queues Number of I/O queues of the zoned block device. This value is
- always capped by the number of online CPUs
- Default: 1
-queue_depth Maximum I/O queue depth per I/O queue.
- Default: 64
-buffered_io Do buffered IOs instead of direct IOs (default: false)
-================ ===========================================================
+=================== =========================================================
+id Device number (the X in /dev/zloopX).
+ Default: automatically assigned.
+capacity_mb Device total capacity in MiB. This is always rounded up
+ to the nearest higher multiple of the zone size.
+ Default: 16384 MiB (16 GiB).
+zone_size_mb Device zone size in MiB. Default: 256 MiB.
+zone_capacity_mb Device zone capacity (must always be equal to or lower
+ than the zone size. Default: zone size.
+conv_zones Total number of conventioanl zones starting from
+ sector 0
+ Default: 8
+base_dir Path to the base directory where to create the directory
+ containing the zone files of the device.
+ Default=/var/local/zloop.
+ The device directory containing the zone files is always
+ named with the device ID. E.g. the default zone file
+ directory for /dev/zloop0 is /var/local/zloop/0.
+nr_queues Number of I/O queues of the zoned block device. This
+ value is always capped by the number of online CPUs
+ Default: 1
+queue_depth Maximum I/O queue depth per I/O queue.
+ Default: 64
+buffered_io Do buffered IOs instead of direct IOs (default: false)
+zone_append Enable or disable a zloop device native zone append
+ support.
+ Default: 1 (enabled).
+ If native zone append support is disabled, the block layer
+ will emulate this operation using regular write
+ operations.
+ordered_zone_append Enable zloop mitigation of zone append reordering.
+ Default: disabled.
+ This is useful for testing file systems file data mapping
+ (extents), as when enabled, this can significantly reduce
+ the number of data extents needed to for a file data
+ mapping.
+=================== =========================================================
3) Deleting a Zoned Device
--------------------------
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 0e6c67ac585a..4c072e85acdf 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -53,7 +53,8 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-2. Memory
5-2-1. Memory Interface Files
5-2-2. Usage Guidelines
- 5-2-3. Memory Ownership
+ 5-2-3. Reclaim Protection
+ 5-2-4. Memory Ownership
5-3. IO
5-3-1. IO Interface Files
5-3-2. Writeback
@@ -1317,7 +1318,7 @@ PAGE_SIZE multiple when read back.
smaller overages.
Effective min boundary is limited by memory.min values of
- all ancestor cgroups. If there is memory.min overcommitment
+ ancestor cgroups. If there is memory.min overcommitment
(child cgroup or cgroups are requiring more protected memory
than parent will allow), then each child cgroup will get
the part of parent's protection proportional to its
@@ -1326,9 +1327,6 @@ PAGE_SIZE multiple when read back.
Putting more memory than generally available under this
protection is discouraged and may lead to constant OOMs.
- If a memory cgroup is not populated with processes,
- its memory.min is ignored.
-
memory.low
A read-write single value file which exists on non-root
cgroups. The default is "0".
@@ -1343,7 +1341,7 @@ PAGE_SIZE multiple when read back.
smaller overages.
Effective low boundary is limited by memory.low values of
- all ancestor cgroups. If there is memory.low overcommitment
+ ancestor cgroups. If there is memory.low overcommitment
(child cgroup or cgroups are requiring more protected memory
than parent will allow), then each child cgroup will get
the part of parent's protection proportional to its
@@ -1934,6 +1932,27 @@ memory - is necessary to determine whether a workload needs more
memory; unfortunately, memory pressure monitoring mechanism isn't
implemented yet.
+Reclaim Protection
+~~~~~~~~~~~~~~~~~~
+
+The protection configured with "memory.low" or "memory.min" applies relatively
+to the target of the reclaim (i.e. any of memory cgroup limits, proactive
+memory.reclaim or global reclaim apparently located in the root cgroup).
+The protection value configured for B applies unchanged to the reclaim
+targeting A (i.e. caused by competition with the sibling E)::
+
+ root - ... - A - B - C
+ \ ` D
+ ` E
+
+When the reclaim targets ancestors of A, the effective protection of B is
+capped by the protection value configured for A (and any other intermediate
+ancestors between A and the target).
+
+To express indifference about relative sibling protection, it is suggested to
+use memory_recursiveprot. Configuring all descendants of a parent with finite
+protection to "max" works but it may unnecessarily skew memory.events:low
+field.
Memory Ownership
~~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/efi-stub.rst b/Documentation/admin-guide/efi-stub.rst
index 090f3a185e18..f8e7407698bd 100644
--- a/Documentation/admin-guide/efi-stub.rst
+++ b/Documentation/admin-guide/efi-stub.rst
@@ -79,6 +79,9 @@ because the image we're executing is interpreted by the EFI shell,
which understands relative paths, whereas the rest of the command line
is passed to bzImage.efi.
+.. hint::
+ It is also possible to provide an initrd using a Linux-specific UEFI
+ protocol at boot time. See :ref:`pe-coff-entry-point` for details.
The "dtb=" option
-----------------
diff --git a/Documentation/admin-guide/hw-vuln/l1d_flush.rst b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
index 210020bc3f56..35dc25159b28 100644
--- a/Documentation/admin-guide/hw-vuln/l1d_flush.rst
+++ b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
@@ -31,7 +31,7 @@ specifically opt into the feature to enable it.
Mitigation
----------
-When PR_SET_L1D_FLUSH is enabled for a task a flush of the L1D cache is
+When PR_SPEC_L1D_FLUSH is enabled for a task a flush of the L1D cache is
performed when the task is scheduled out and the incoming task belongs to a
different process and therefore to a different address space.
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 991f12adef8d..4bb8549bee82 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -406,7 +406,7 @@ The possible values in this file are:
- Single threaded indirect branch prediction (STIBP) status for protection
between different hyper threads. This feature can be controlled through
- prctl per process, or through kernel command line options. This is x86
+ prctl per process, or through kernel command line options. This is an x86
only feature. For more details see below.
==================== ========================================================
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 7bf8cc7df6b5..02a725536cc5 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -110,102 +110,7 @@ The parameters listed below are only valid if certain kernel build options
were enabled and if respective hardware is present. This list should be kept
in alphabetical order. The text in square brackets at the beginning
of each description states the restrictions within which a parameter
-is applicable::
-
- ACPI ACPI support is enabled.
- AGP AGP (Accelerated Graphics Port) is enabled.
- ALSA ALSA sound support is enabled.
- APIC APIC support is enabled.
- APM Advanced Power Management support is enabled.
- APPARMOR AppArmor support is enabled.
- ARM ARM architecture is enabled.
- ARM64 ARM64 architecture is enabled.
- AX25 Appropriate AX.25 support is enabled.
- CLK Common clock infrastructure is enabled.
- CMA Contiguous Memory Area support is enabled.
- DRM Direct Rendering Management support is enabled.
- DYNAMIC_DEBUG Build in debug messages and enable them at runtime
- EARLY Parameter processed too early to be embedded in initrd.
- EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
- EFI EFI Partitioning (GPT) is enabled
- EVM Extended Verification Module
- FB The frame buffer device is enabled.
- FTRACE Function tracing enabled.
- GCOV GCOV profiling is enabled.
- HIBERNATION HIBERNATION is enabled.
- HW Appropriate hardware is enabled.
- HYPER_V HYPERV support is enabled.
- IMA Integrity measurement architecture is enabled.
- IP_PNP IP DHCP, BOOTP, or RARP is enabled.
- IPV6 IPv6 support is enabled.
- ISAPNP ISA PnP code is enabled.
- ISDN Appropriate ISDN support is enabled.
- ISOL CPU Isolation is enabled.
- JOY Appropriate joystick support is enabled.
- KGDB Kernel debugger support is enabled.
- KVM Kernel Virtual Machine support is enabled.
- LIBATA Libata driver is enabled
- LOONGARCH LoongArch architecture is enabled.
- LOOP Loopback device support is enabled.
- LP Printer support is enabled.
- M68k M68k architecture is enabled.
- These options have more detailed description inside of
- Documentation/arch/m68k/kernel-options.rst.
- MDA MDA console support is enabled.
- MIPS MIPS architecture is enabled.
- MOUSE Appropriate mouse support is enabled.
- MSI Message Signaled Interrupts (PCI).
- MTD MTD (Memory Technology Device) support is enabled.
- NET Appropriate network support is enabled.
- NFS Appropriate NFS support is enabled.
- NUMA NUMA support is enabled.
- OF Devicetree is enabled.
- PARISC The PA-RISC architecture is enabled.
- PCI PCI bus support is enabled.
- PCIE PCI Express support is enabled.
- PCMCIA The PCMCIA subsystem is enabled.
- PNP Plug & Play support is enabled.
- PPC PowerPC architecture is enabled.
- PPT Parallel port support is enabled.
- PS2 Appropriate PS/2 support is enabled.
- PV_OPS A paravirtualized kernel is enabled.
- RAM RAM disk support is enabled.
- RDT Intel Resource Director Technology.
- RISCV RISCV architecture is enabled.
- S390 S390 architecture is enabled.
- SCSI Appropriate SCSI support is enabled.
- A lot of drivers have their options described inside
- the Documentation/scsi/ sub-directory.
- SDW SoundWire support is enabled.
- SECURITY Different security models are enabled.
- SELINUX SELinux support is enabled.
- SERIAL Serial support is enabled.
- SH SuperH architecture is enabled.
- SMP The kernel is an SMP kernel.
- SPARC Sparc architecture is enabled.
- SUSPEND System suspend states are enabled.
- SWSUSP Software suspend (hibernation) is enabled.
- TPM TPM drivers are enabled.
- UMS USB Mass Storage support is enabled.
- USB USB support is enabled.
- USBHID USB Human Interface Device support is enabled.
- V4L Video For Linux support is enabled.
- VGA The VGA console has been enabled.
- VMMIO Driver for memory mapped virtio devices is enabled.
- VT Virtual terminal support is enabled.
- WDT Watchdog support is enabled.
- X86-32 X86-32, aka i386 architecture is enabled.
- X86-64 X86-64 architecture is enabled.
- X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
- X86_UV SGI UV support is enabled.
- XEN Xen support is enabled
- XTENSA xtensa architecture is enabled.
-
-In addition, the following text indicates that the option::
-
- BOOT Is a boot loader parameter.
- BUGS= Relates to possible processor bugs on the said processor.
- KNL Is a kernel start-up parameter.
+is applicable.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e..1e89d122f084 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1,3 +1,101 @@
+ ACPI ACPI support is enabled.
+ AGP AGP (Accelerated Graphics Port) is enabled.
+ ALSA ALSA sound support is enabled.
+ APIC APIC support is enabled.
+ APM Advanced Power Management support is enabled.
+ APPARMOR AppArmor support is enabled.
+ ARM ARM architecture is enabled.
+ ARM64 ARM64 architecture is enabled.
+ AX25 Appropriate AX.25 support is enabled.
+ CLK Common clock infrastructure is enabled.
+ CMA Contiguous Memory Area support is enabled.
+ DRM Direct Rendering Management support is enabled.
+ DYNAMIC_DEBUG Build in debug messages and enable them at runtime
+ EARLY Parameter processed too early to be embedded in initrd.
+ EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
+ EFI EFI Partitioning (GPT) is enabled
+ EVM Extended Verification Module
+ FB The frame buffer device is enabled.
+ FTRACE Function tracing enabled.
+ GCOV GCOV profiling is enabled.
+ HIBERNATION HIBERNATION is enabled.
+ HW Appropriate hardware is enabled.
+ HYPER_V HYPERV support is enabled.
+ IMA Integrity measurement architecture is enabled.
+ IP_PNP IP DHCP, BOOTP, or RARP is enabled.
+ IPV6 IPv6 support is enabled.
+ ISAPNP ISA PnP code is enabled.
+ ISDN Appropriate ISDN support is enabled.
+ ISOL CPU Isolation is enabled.
+ JOY Appropriate joystick support is enabled.
+ KGDB Kernel debugger support is enabled.
+ KVM Kernel Virtual Machine support is enabled.
+ LIBATA Libata driver is enabled
+ LOONGARCH LoongArch architecture is enabled.
+ LOOP Loopback device support is enabled.
+ LP Printer support is enabled.
+ M68k M68k architecture is enabled.
+ These options have more detailed description inside of
+ Documentation/arch/m68k/kernel-options.rst.
+ MDA MDA console support is enabled.
+ MIPS MIPS architecture is enabled.
+ MOUSE Appropriate mouse support is enabled.
+ MSI Message Signaled Interrupts (PCI).
+ MTD MTD (Memory Technology Device) support is enabled.
+ NET Appropriate network support is enabled.
+ NFS Appropriate NFS support is enabled.
+ NUMA NUMA support is enabled.
+ OF Devicetree is enabled.
+ PARISC The PA-RISC architecture is enabled.
+ PCI PCI bus support is enabled.
+ PCIE PCI Express support is enabled.
+ PCMCIA The PCMCIA subsystem is enabled.
+ PNP Plug & Play support is enabled.
+ PPC PowerPC architecture is enabled.
+ PPT Parallel port support is enabled.
+ PS2 Appropriate PS/2 support is enabled.
+ PV_OPS A paravirtualized kernel is enabled.
+ RAM RAM disk support is enabled.
+ RDT Intel Resource Director Technology.
+ RISCV RISCV architecture is enabled.
+ S390 S390 architecture is enabled.
+ SCSI Appropriate SCSI support is enabled.
+ A lot of drivers have their options described inside
+ the Documentation/scsi/ sub-directory.
+ SDW SoundWire support is enabled.
+ SECURITY Different security models are enabled.
+ SELINUX SELinux support is enabled.
+ SERIAL Serial support is enabled.
+ SH SuperH architecture is enabled.
+ SMP The kernel is an SMP kernel.
+ SPARC Sparc architecture is enabled.
+ SUSPEND System suspend states are enabled.
+ SWSUSP Software suspend (hibernation) is enabled.
+ TPM TPM drivers are enabled.
+ UMS USB Mass Storage support is enabled.
+ USB USB support is enabled.
+ USBHID USB Human Interface Device support is enabled.
+ V4L Video For Linux support is enabled.
+ VGA The VGA console has been enabled.
+ VMMIO Driver for memory mapped virtio devices is enabled.
+ VT Virtual terminal support is enabled.
+ WDT Watchdog support is enabled.
+ X86-32 X86-32, aka i386 architecture is enabled.
+ X86-64 X86-64 architecture is enabled.
+ X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
+ X86_UV SGI UV support is enabled.
+ XEN Xen support is enabled
+ XTENSA xtensa architecture is enabled.
+
+In addition, the following text indicates that the option
+
+ BOOT Is a boot loader parameter.
+ BUGS= Relates to possible processor bugs on the said processor.
+ KNL Is a kernel start-up parameter.
+
+
+Kernel parameters
+
accept_memory= [MM]
Format: { eager | lazy }
default: lazy
@@ -1907,6 +2005,16 @@
/sys/power/pm_test). Only available when CONFIG_PM_DEBUG
is set. Default value is 5.
+ hibernate_compression_threads=
+ [HIBERNATION]
+ Set the number of threads used for compressing or decompressing
+ hibernation images.
+
+ Format: <integer>
+ Default: 3
+ Minimum: 1
+ Example: hibernate_compression_threads=4
+
highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact
size of <nn>. This works even on boxes that have no
highmem otherwise. This also works to reduce highmem
@@ -6207,7 +6315,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec, abmc.
+ mba, smba, bmec, abmc, sdciae.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
@@ -6404,7 +6512,7 @@
that don't.
off - no mitigation
- auto - automatically select a migitation
+ auto - automatically select a mitigation
auto,nosmt - automatically select a mitigation,
disabling SMT if necessary for
the full mitigation (only on Zen1
@@ -6500,6 +6608,10 @@
Memory area to be used by remote processor image,
managed by CMA.
+ rseq_debug= [KNL] Enable or disable restartable sequence
+ debug mode. Defaults to CONFIG_RSEQ_DEBUG_DEFAULT_ENABLE.
+ Format: <bool>
+
rt_group_sched= [KNL] Enable or disable SCHED_RR/FIFO group scheduling
when CONFIG_RT_GROUP_SCHED=y. Defaults to
!CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.
@@ -7150,7 +7262,7 @@
limit. Default value is 8191 pools.
stacktrace [FTRACE]
- Enabled the stack tracer on boot up.
+ Enable the stack tracer on boot up.
stacktrace_filter=[function-list]
[FTRACE] Limit the functions that the stack tracer
diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index deed823eab01..dc7eab191caa 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -238,6 +238,16 @@ All md devices contain:
the number of devices in a raid4/5/6, or to support external
metadata formats which mandate such clipping.
+ logical_block_size
+ Configure the array's logical block size in bytes. This attribute
+ is only supported for 1.x meta. Write the value before starting
+ array. The final array LBS uses the maximum between this
+ configuration and LBS of all combined devices. Note that
+ LBS cannot exceed PAGE_SIZE before RAID supports folio.
+ WARNING: Arrays created on new kernel cannot be assembled at old
+ kernel due to padding check, Set module parameter 'check_new_feature'
+ to false to bypass, but data loss may occur.
+
reshape_position
This is either ``none`` or a sector number within the devices of
the array where ``reshape`` is up to. If this is set, the three
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 0c090b076224..be4c1120e3f0 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -580,6 +580,15 @@ the given CPU as the upper limit for the exit latency of the idle states that
they are allowed to select for that CPU. They should never select any idle
states with exit latency beyond that limit.
+While the above CPU QoS constraints apply to CPU idle time management, user
+space may also request a CPU system wakeup latency QoS limit, via the
+`cpu_wakeup_latency` file. This QoS constraint is respected when selecting a
+suitable idle state for the CPUs, while entering the system-wide suspend-to-idle
+sleep state, but also to the regular CPU idle time management.
+
+Note that, the management of the `cpu_wakeup_latency` file works according to
+the 'cpu_dma_latency' file from user space point of view. Moreover, the unit
+is also microseconds.
Idle States Control Via Kernel Command Line
===========================================
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index 26e702c7016e..fde967b0c2e0 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -48,8 +48,9 @@ only way to pass early-configuration-time parameters to it is via the kernel
command line. However, its configuration can be adjusted via ``sysfs`` to a
great extent. In some configurations it even is possible to unregister it via
``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
-registered (see `below <status_attr_>`_).
+registered (see :ref:`below <status_attr>`).
+.. _operation_modes:
Operation Modes
===============
@@ -62,6 +63,8 @@ a certain performance scaling algorithm. Which of them will be in effect
depends on what kernel command line options are used and on the capabilities of
the processor.
+.. _active_mode:
+
Active Mode
-----------
@@ -94,6 +97,8 @@ Which of the P-state selection algorithms is used by default depends on the
Namely, if that option is set, the ``performance`` algorithm will be used by
default, and the other one will be used by default if it is not set.
+.. _active_mode_hwp:
+
Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~
@@ -123,7 +128,7 @@ Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.
This will override the EPP/EPB setting coming from the ``sysfs`` interface
-(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
+(see :ref:`energy_performance_hints` below). Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.
@@ -192,6 +197,8 @@ This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.
+.. _passive_mode:
+
Passive Mode
------------
@@ -289,12 +296,12 @@ Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
-used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
-for more information).
+used relative to ACPI-based CPU performance scaling (see
+:ref:`below <acpi-cpufreq>` for more information).
Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
-``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
+``no_turbo`` attribute in ``sysfs`` (described :ref:`below <no_turbo_attr>`) should
work as expected in all cases (that is, if set to disable turbo P-states, it
always should prevent ``intel_pstate`` from using them).
@@ -307,12 +314,12 @@ pieces of information on it to be known, including:
* The minimum supported P-state.
- * The maximum supported `non-turbo P-state <turbo_>`_.
+ * The maximum supported :ref:`non-turbo P-state <turbo>`.
* Whether or not turbo P-states are supported at all.
- * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
- are supported).
+ * The maximum supported :ref:`one-core turbo P-state <turbo>` (if turbo
+ P-states are supported).
* The scaling formula to translate the driver's internal representation
of P-states into frequencies and the other way around.
@@ -400,10 +407,10 @@ Energy-Aware Scheduling Support
If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
-`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
+:ref:`CAS` it registers an Energy Model for the processor. This allows the
Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
-to operate in the `passive mode <Passive Mode_>`_.
+to operate in the :ref:`passive mode <passive_mode>`.
The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
based on abstract cost values and it does not include any real power numbers)
@@ -432,6 +439,8 @@ the ``energy_model`` directory in ``debugfs`` (typlically mounted on
User Space Interface in ``sysfs``
=================================
+.. _global_attributes:
+
Global Attributes
-----------------
@@ -444,8 +453,8 @@ argument is passed to the kernel in the command line.
``max_perf_pct``
Maximum P-state the driver is allowed to set in percent of the
- maximum supported performance level (the highest supported `turbo
- P-state <turbo_>`_).
+ maximum supported performance level (the highest supported :ref:`turbo
+ P-state <turbo>`).
This attribute will not be exposed if the
``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@@ -453,8 +462,8 @@ argument is passed to the kernel in the command line.
``min_perf_pct``
Minimum P-state the driver is allowed to set in percent of the
- maximum supported performance level (the highest supported `turbo
- P-state <turbo_>`_).
+ maximum supported performance level (the highest supported :ref:`turbo
+ P-state <turbo>`).
This attribute will not be exposed if the
``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@@ -463,18 +472,18 @@ argument is passed to the kernel in the command line.
``num_pstates``
Number of P-states supported by the processor (between 0 and 255
inclusive) including both turbo and non-turbo P-states (see
- `Turbo P-states Support`_).
+ :ref:`turbo`).
This attribute is present only if the value exposed by it is the same
for all of the CPUs in the system.
The value of this attribute is not affected by the ``no_turbo``
- setting described `below <no_turbo_attr_>`_.
+ setting described :ref:`below <no_turbo_attr>`.
This attribute is read-only.
``turbo_pct``
- Ratio of the `turbo range <turbo_>`_ size to the size of the entire
+ Ratio of the :ref:`turbo range <turbo>` size to the size of the entire
range of supported P-states, in percent.
This attribute is present only if the value exposed by it is the same
@@ -486,7 +495,7 @@ argument is passed to the kernel in the command line.
``no_turbo``
If set (equal to 1), the driver is not allowed to set any turbo P-states
- (see `Turbo P-states Support`_). If unset (equal to 0, which is the
+ (see :ref:`turbo`). If unset (equal to 0, which is the
default), turbo P-states can be set by the driver.
[Note that ``intel_pstate`` does not support the general ``boost``
attribute (supported by some other scaling drivers) which is replaced
@@ -495,11 +504,11 @@ argument is passed to the kernel in the command line.
This attribute does not affect the maximum supported frequency value
supplied to the ``CPUFreq`` core and exposed via the policy interface,
but it affects the maximum possible value of per-policy P-state limits
- (see `Interpretation of Policy Attributes`_ below for details).
+ (see :ref:`policy_attributes_interpretation` below for details).
``hwp_dynamic_boost``
This attribute is only present if ``intel_pstate`` works in the
- `active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
+ :ref:`active mode with the HWP feature enabled <active_mode_hwp>` in
the processor. If set (equal to 1), it causes the minimum P-state limit
to be increased dynamically for a short time whenever a task previously
waiting on I/O is selected to run on a given logical CPU (the purpose
@@ -514,12 +523,12 @@ argument is passed to the kernel in the command line.
Operation mode of the driver: "active", "passive" or "off".
"active"
- The driver is functional and in the `active mode
- <Active Mode_>`_.
+ The driver is functional and in the :ref:`active mode
+ <active_mode>`.
"passive"
- The driver is functional and in the `passive mode
- <Passive Mode_>`_.
+ The driver is functional and in the :ref:`passive mode
+ <passive_mode>`.
"off"
The driver is not functional (it is not registered as a scaling
@@ -547,13 +556,15 @@ argument is passed to the kernel in the command line.
attribute to "1" enables the energy-efficiency optimizations and setting
to "0" disables them.
+.. _policy_attributes_interpretation:
+
Interpretation of Policy Attributes
-----------------------------------
The interpretation of some ``CPUFreq`` policy attributes described in
Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate``
as the current scaling driver and it generally depends on the driver's
-`operation mode <Operation Modes_>`_.
+:ref:`operation mode <operation_modes>`.
First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
@@ -562,9 +573,10 @@ Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.
-If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
-not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
-and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
+If the ``no_turbo`` :ref:`global attribute <no_turbo_attr>` is set, the driver
+is not allowed to use turbo P-states, so the maximum value of
+``scaling_max_freq`` and ``scaling_min_freq`` is limited to the maximum
+non-turbo P-state frequency.
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
``scaling_min_freq`` to go down to that value if they were above it before.
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
@@ -576,7 +588,7 @@ and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
which also is the value of ``cpuinfo_max_freq`` in either case.
Next, the following policy attributes have special meaning if
-``intel_pstate`` works in the `active mode <Active Mode_>`_:
+``intel_pstate`` works in the :ref:`active mode <active_mode>`:
``scaling_available_governors``
List of P-state selection algorithms provided by ``intel_pstate``.
@@ -597,20 +609,22 @@ processor:
Shows the base frequency of the CPU. Any frequency above this will be
in the turbo frequency range.
-The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
+The meaning of these attributes in the :ref:`passive mode <passive_mode>` is the
same as for other scaling drivers.
Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver. Namely, it is either
-"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
-`passive mode <Passive Mode_>`_).
+"intel_pstate" (in the :ref:`active mode <active_mode>`) or "intel_cpufreq"
+(in the :ref:`passive mode <passive_mode>`).
+
+.. _pstate_limits_coordination:
Coordination of P-State Limits
------------------------------
``intel_pstate`` allows P-state limits to be set in two ways: with the help of
-the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
-<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
+the ``max_perf_pct`` and ``min_perf_pct`` :ref:`global attributes
+<global_attributes>` or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes. The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:
@@ -632,17 +646,18 @@ on the following rules, regardless of the current operation mode of the driver:
3. The global and per-policy limits can be set independently.
-In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
+In the :ref:`active mode with the HWP feature enabled <active_mode_hwp>`, the
resulting effective values are written into hardware registers whenever the
limits change in order to request its internal P-state selection logic to always
set P-states within these limits. Otherwise, the limits are taken into account
-by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
-every time before setting a new P-state for a CPU.
+by scaling governors (in the :ref:`passive mode <passive_mode>`) and by the
+driver every time before setting a new P-state for a CPU.
Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.
+.. _energy_performance_hints:
Energy vs Performance Hints
---------------------------
@@ -702,9 +717,9 @@ output.
On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
-`turbo range <turbo_>`_ is represented by one item in it (the topmost one). By
-convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
-than the frequency of the highest non-turbo P-state listed by it, but the
+:ref:`turbo range <turbo>` is represented by one item in it (the topmost one).
+By convention, the frequency returned by ``_PSS`` for that item is greater by
+1 MHz than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").
@@ -730,18 +745,18 @@ benefit from running at turbo frequencies will be given non-turbo P-states
instead.
One more issue related to that may appear on systems supporting the
-`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
-turbo threshold. Namely, if that is not coordinated with the lists of P-states
-returned by ``_PSS`` properly, there may be more than one item corresponding to
-a turbo P-state in those lists and there may be a problem with avoiding the
-turbo range (if desirable or necessary). Usually, to avoid using turbo
-P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
-by ``_PSS``, but that is not sufficient when there are other turbo P-states in
-the list returned by it.
+:ref:`Configurable TDP feature <turbo>` allowing the platform firmware to set
+the turbo threshold. Namely, if that is not coordinated with the lists of
+P-states returned by ``_PSS`` properly, there may be more than one item
+corresponding to a turbo P-state in those lists and there may be a problem with
+avoiding the turbo range (if desirable or necessary). Usually, to avoid using
+turbo P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state
+listed by ``_PSS``, but that is not sufficient when there are other turbo
+P-states in the list returned by it.
Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
-`passive mode <Passive Mode_>`_, except that the number of P-states it can set
-is limited to the ones listed by the ACPI ``_PSS`` objects.
+:ref:`passive mode <passive_mode>`, except that the number of P-states it can
+set is limited to the ones listed by the ACPI ``_PSS`` objects.
Kernel Command Line Options for ``intel_pstate``
@@ -756,11 +771,11 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
processor is supported by it.
``active``
- Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
- with.
+ Register ``intel_pstate`` in the :ref:`active mode <active_mode>` to
+ start with.
``passive``
- Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
+ Register ``intel_pstate`` in the :ref:`passive mode <passive_mode>` to
start with.
``force``
@@ -793,12 +808,12 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
and this option has no effect.
``per_cpu_perf_limits``
- Use per-logical-CPU P-State limits (see `Coordination of P-state
- Limits`_ for details).
+ Use per-logical-CPU P-State limits (see
+ :ref:`pstate_limits_coordination` for details).
``no_cas``
- Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
- default on hybrid systems without SMT.
+ Do not enable :ref:`capacity-aware scheduling <CAS>` which is enabled
+ by default on hybrid systems without SMT.
Diagnostics and Tuning
======================
@@ -810,7 +825,7 @@ There are two static trace events that can be used for ``intel_pstate``
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if
-it works in the `active mode <Active Mode_>`_.
+it works in the :ref:`active mode <active_mode>`.
The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::
@@ -822,7 +837,7 @@ their output (if the kernel is generally configured to support event tracing)::
gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2
-If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
+If ``intel_pstate`` works in the :ref:`passive mode <passive_mode>`, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 2ef50828aff1..369a738a6819 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -212,6 +212,14 @@ mem_pcpu_rsv
Per-cpu reserved forward alloc cache size in page units. Default 1MB per CPU.
+bypass_prot_mem
+---------------
+
+Skip charging socket buffers to the global per-protocol memory
+accounting controlled by net.ipv4.tcp_mem, net.ipv4.udp_mem, etc.
+
+Default: 0 (off)
+
rmem_default
------------
@@ -347,9 +355,9 @@ skb_defer_max
-------------
Max size (in skbs) of the per-cpu list of skbs being freed
-by the cpu which allocated them. Used by TCP stack so far.
+by the cpu which allocated them.
-Default: 64
+Default: 128
optmem_max
----------
@@ -406,6 +414,23 @@ to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
If set to 1 (default), hash rethink is performed on listening socket.
If set to 0, hash rethink is not performed.
+txq_reselection_ms
+------------------
+
+Controls how often (in ms) a busy connected flow can select another tx queue.
+
+A resection is desirable when/if user thread has migrated and XPS
+would select a different queue. Same can occur without XPS
+if the flow hash has changed.
+
+But switching txq can introduce reorders, especially if the
+old queue is under high pressure. Modern TCP stacks deal
+well with reorders if they happen not too often.
+
+To disable this feature, set the value to 0.
+
+Default : 1000
+
gro_normal_batch
----------------
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index a0cc017e4424..ed1f8f1e86c5 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -186,6 +186,6 @@ More detailed explanation for tainting
18) ``N`` if an in-kernel test, such as a KUnit test, has been run.
- 19) ``J`` if userpace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
+ 19) ``J`` if userspace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
to use the devices debugging features. Device debugging features could
cause the device to malfunction in undefined ways.
diff --git a/Documentation/admin-guide/thermal/index.rst b/Documentation/admin-guide/thermal/index.rst
index 193b7b01a87d..e48bc0a1951b 100644
--- a/Documentation/admin-guide/thermal/index.rst
+++ b/Documentation/admin-guide/thermal/index.rst
@@ -6,3 +6,4 @@ Thermal Subsystem
:maxdepth: 1
intel_powerclamp
+ intel_thermal_throttle
diff --git a/Documentation/admin-guide/thermal/intel_thermal_throttle.rst b/Documentation/admin-guide/thermal/intel_thermal_throttle.rst
new file mode 100644
index 000000000000..f4fbf9d5a4ec
--- /dev/null
+++ b/Documentation/admin-guide/thermal/intel_thermal_throttle.rst
@@ -0,0 +1,91 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=======================================
+Intel thermal throttle events reporting
+=======================================
+
+:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+
+Introduction
+------------
+
+Intel processors have built in automatic and adaptive thermal monitoring
+mechanisms that force the processor to reduce its power consumption in order
+to operate within predetermined temperature limits.
+
+Refer to section "THERMAL MONITORING AND PROTECTION" in the "Intel® 64 and
+IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C, & 3D):
+System Programming Guide" for more details.
+
+In general, there are two mechanisms to control the core temperature of the
+processor. They are called "Thermal Monitor 1 (TM1) and Thermal Monitor 2 (TM2)".
+
+The status of the temperature sensor that triggers the thermal monitor (TM1/TM2)
+is indicated through the "thermal status flag" and "thermal status log flag" in
+MSR_IA32_THERM_STATUS for core level and MSR_IA32_PACKAGE_THERM_STATUS for
+package level.
+
+Thermal Status flag, bit 0 — When set, indicates that the processor core
+temperature is currently at the trip temperature of the thermal monitor and that
+the processor power consumption is being reduced via either TM1 or TM2, depending
+on which is enabled. When clear, the flag indicates that the core temperature is
+below the thermal monitor trip temperature. This flag is read only.
+
+Thermal Status Log flag, bit 1 — When set, indicates that the thermal sensor has
+tripped since the last power-up or reset or since the last time that software
+cleared this flag. This flag is a sticky bit; once set it remains set until
+cleared by software or until a power-up or reset of the processor. The default
+state is clear.
+
+It is possible that when user reads MSR_IA32_THERM_STATUS or
+MSR_IA32_PACKAGE_THERM_STATUS, TM1/TM2 is not active. In this case,
+"Thermal Status flag" will read "0" and the "Thermal Status Log flag" will be set
+to show any previous "TM1/TM2" activation. But since it needs to be cleared by
+the software, it can't show the number of occurrences of "TM1/TM2" activations.
+
+Hence, Linux provides counters of how many times the "Thermal Status flag" was
+set. Also presents how long the "Thermal Status flag" was active in milliseconds.
+Using these counters, users can check if the performance was limited because of
+thermal events. It is recommended to read from sysfs instead of directly reading
+MSRs as the "Thermal Status Log flag" is reset by the driver to implement rate
+control.
+
+Sysfs Interface
+---------------
+
+Thermal throttling events are presented for each CPU under
+"/sys/devices/system/cpu/cpuX/thermal_throttle/", where "X" is the CPU number.
+
+All these counters are read-only. They can't be reset to 0. So, they can potentially
+overflow after reaching the maximum 64 bit unsigned integer.
+
+``core_throttle_count``
+ Shows the number of times "Thermal Status flag" changed from 0 to 1 for this
+ CPU since OS boot and thermal vector is initialized. This is a 64 bit counter.
+
+``package_throttle_count``
+ Shows the number of times "Thermal Status flag" changed from 0 to 1 for the
+ package containing this CPU since OS boot and thermal vector is initialized.
+ Package status is broadcast to all CPUs; all CPUs in the package increment
+ this count. This is a 64-bit counter.
+
+``core_throttle_max_time_ms``
+ Shows the maximum amount of time for which "Thermal Status flag" has been
+ set to 1 for this CPU at the core level since OS boot and thermal vector
+ is initialized.
+
+``package_throttle_max_time_ms``
+ Shows the maximum amount of time for which "Thermal Status flag" has been
+ set to 1 for the package containing this CPU since OS boot and thermal
+ vector is initialized.
+
+``core_throttle_total_time_ms``
+ Shows the cumulative time for which "Thermal Status flag" has been
+ set to 1 for this CPU for core level since OS boot and thermal vector
+ is initialized.
+
+``package_throttle_total_time_ms``
+ Shows the cumulative time for which "Thermal Status flag" has been set
+ to 1 for the package containing this CPU since OS boot and thermal vector
+ is initialized.
diff --git a/Documentation/admin-guide/workload-tracing.rst b/Documentation/admin-guide/workload-tracing.rst
index d6313890ee41..35963491b9f1 100644
--- a/Documentation/admin-guide/workload-tracing.rst
+++ b/Documentation/admin-guide/workload-tracing.rst
@@ -196,11 +196,11 @@ Let’s checkout the latest Linux repository and build cscope database::
cscope -R -p10 # builds cscope.out database before starting browse session
cscope -d -p10 # starts browse session on cscope.out database
-Note: Run "cscope -R -p10" to build the database and c"scope -d -p10" to
-enter into the browsing session. cscope by default cscope.out database.
-To get out of this mode press ctrl+d. -p option is used to specify the
-number of file path components to display. -p10 is optimal for browsing
-kernel sources.
+Note: Run "cscope -R -p10" to build the database and "cscope -d -p10" to
+enter into the browsing session. cscope by default uses the cscope.out
+database. To get out of this mode press ctrl+d. -p option is used to
+specify the number of file path components to display. -p10 is optimal
+for browsing kernel sources.
What is perf and how do we use it?
==================================
diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index e4f953839f71..26efca09aef3 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -391,13 +391,13 @@ Before jumping into the kernel, the following conditions must be met:
- SMCR_EL2.LEN must be initialised to the same value for all CPUs the
kernel will execute on.
- - HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
+ - HFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
- - HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
+ - HFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
- - HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
+ - HFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
- - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
+ - HFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64):
diff --git a/Documentation/arch/arm64/sve.rst b/Documentation/arch/arm64/sve.rst
index 28152492c29c..a61c9d0efe4d 100644
--- a/Documentation/arch/arm64/sve.rst
+++ b/Documentation/arch/arm64/sve.rst
@@ -402,6 +402,11 @@ The regset data starts with struct user_sve_header, containing:
streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
if the target was not in streaming mode.
+* On systems that do not support SVE it is permitted to use SETREGSET to
+ write SVE_PT_REGS_FPSIMD formatted data via NT_ARM_SVE, in this case the
+ vector length should be specified as 0. This allows streaming mode to be
+ disabled on systems with SME but not SVE.
+
* If any register data is provided along with SVE_PT_VL_ONEXEC then the
registers data will be interpreted with the current vector length, not
the vector length configured for use on exec.
diff --git a/Documentation/arch/s390/s390dbf.rst b/Documentation/arch/s390/s390dbf.rst
index af8bdc3629e7..aad6d88974fe 100644
--- a/Documentation/arch/s390/s390dbf.rst
+++ b/Documentation/arch/s390/s390dbf.rst
@@ -243,9 +243,8 @@ Examples:
Changing the size of debug areas
------------------------------------
-It is possible the change the size of debug areas through piping
-the number of pages to the debugfs file "pages". The resize request will
-also flush the debug areas.
+To resize a debug area, write the desired page count to the "pages" file.
+Existing data is preserved if it fits; otherwise, oldest entries are dropped.
Example:
diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
index 77e6163288db..32eea3d2807e 100644
--- a/Documentation/arch/x86/boot.rst
+++ b/Documentation/arch/x86/boot.rst
@@ -1431,12 +1431,34 @@ The boot loader *must* fill out the following fields in bp::
All other fields should be zero.
.. note::
- The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF
- entry point, combined with the LINUX_EFI_INITRD_MEDIA_GUID based initrd
- loading protocol (refer to [0] for an example of the bootloader side of
- this), which removes the need for any knowledge on the part of the EFI
- bootloader regarding the internal representation of boot_params or any
- requirements/limitations regarding the placement of the command line
- and ramdisk in memory, or the placement of the kernel image itself.
-
-[0] https://github.com/u-boot/u-boot/commit/ec80b4735a593961fe701cc3a5d717d4739b0fd0
+ The EFI Handover Protocol is deprecated in favour of the ordinary PE/COFF
+ entry point described below.
+
+.. _pe-coff-entry-point:
+
+PE/COFF entry point
+===================
+
+When compiled with ``CONFIG_EFI_STUB=y``, the kernel can be executed as a
+regular PE/COFF binary. See Documentation/admin-guide/efi-stub.rst for
+implementation details.
+
+The stub loader can request the initrd via a UEFI protocol. For this to work,
+the firmware or bootloader needs to register a handle which carries
+implementations of the ``EFI_LOAD_FILE2`` protocol and the device path
+protocol exposing the ``LINUX_EFI_INITRD_MEDIA_GUID`` vendor media device path.
+In this case, a kernel booting via the EFI stub will invoke
+``LoadFile2::LoadFile()`` method on the registered protocol to instruct the
+firmware to load the initrd into a memory location chosen by the kernel/EFI
+stub.
+
+This approach removes the need for any knowledge on the part of the EFI
+bootloader regarding the internal representation of boot_params or any
+requirements/limitations regarding the placement of the command line and
+ramdisk in memory, or the placement of the kernel image itself.
+
+For sample implementations, refer to `the original u-boot implementation`_ or
+`the OVMF implementation`_.
+
+.. _the original u-boot implementation: https://github.com/u-boot/u-boot/commit/ec80b4735a593961fe701cc3a5d717d4739b0fd0
+.. _the OVMF implementation: https://github.com/tianocore/edk2/blob/1780373897f12c25075f8883e073144506441168/OvmfPkg/LinuxInitrdDynamicShellCommand/LinuxInitrdDynamicShellCommand.c
diff --git a/Documentation/bpf/libbpf/program_types.rst b/Documentation/bpf/libbpf/program_types.rst
index 218b020a2f81..3b837522834b 100644
--- a/Documentation/bpf/libbpf/program_types.rst
+++ b/Documentation/bpf/libbpf/program_types.rst
@@ -100,10 +100,26 @@ described in more detail in the footnotes.
| | | ``uretprobe.s+`` [#uprobe]_ | Yes |
+ + +----------------------------------+-----------+
| | | ``usdt+`` [#usdt]_ | |
++ + +----------------------------------+-----------+
+| | | ``usdt.s+`` [#usdt]_ | Yes |
+ +----------------------------------------+----------------------------------+-----------+
| | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
+ + +----------------------------------+-----------+
| | | ``kretprobe.multi+`` [#kpmulti]_ | |
++ +----------------------------------------+----------------------------------+-----------+
+| | ``BPF_TRACE_KPROBE_SESSION`` | ``kprobe.session+`` [#kpmulti]_ | |
++ +----------------------------------------+----------------------------------+-----------+
+| | ``BPF_TRACE_UPROBE_MULTI`` | ``uprobe.multi+`` [#upmul]_ | |
++ + +----------------------------------+-----------+
+| | | ``uprobe.multi.s+`` [#upmul]_ | Yes |
++ + +----------------------------------+-----------+
+| | | ``uretprobe.multi+`` [#upmul]_ | |
++ + +----------------------------------+-----------+
+| | | ``uretprobe.multi.s+`` [#upmul]_ | Yes |
++ +----------------------------------------+----------------------------------+-----------+
+| | ``BPF_TRACE_UPROBE_SESSION`` | ``uprobe.session+`` [#upmul]_ | |
++ + +----------------------------------+-----------+
+| | | ``uprobe.session.s+`` [#upmul]_ | Yes |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
| ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
@@ -219,6 +235,8 @@ described in more detail in the footnotes.
non-negative integer.
.. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
.. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
+.. [#upmul] The ``uprobe.multi`` attach format is ``uprobe.multi[.s]/<path>:<function-pattern>``
+ where ``function-pattern`` supports ``*`` and ``?`` wildcards.
.. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
.. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
supports ``*`` and ``?`` wildcards. Valid characters for pattern are
diff --git a/Documentation/bpf/map_array.rst b/Documentation/bpf/map_array.rst
index f2f51a53e8ae..fa56ff75190c 100644
--- a/Documentation/bpf/map_array.rst
+++ b/Documentation/bpf/map_array.rst
@@ -15,8 +15,9 @@ of constant size. The size of the array is defined in ``max_entries`` at
creation time. All array elements are pre-allocated and zero initialized when
created. ``BPF_MAP_TYPE_PERCPU_ARRAY`` uses a different memory region for each
CPU whereas ``BPF_MAP_TYPE_ARRAY`` uses the same memory region. The value
-stored can be of any size, however, all array elements are aligned to 8
-bytes.
+stored can be of any size for ``BPF_MAP_TYPE_ARRAY`` and not more than
+``PCPU_MIN_UNIT_SIZE`` (32 kB) for ``BPF_MAP_TYPE_PERCPU_ARRAY``. All
+array elements are aligned to 8 bytes.
Since kernel 5.5, memory mapping may be enabled for ``BPF_MAP_TYPE_ARRAY`` by
setting the flag ``BPF_F_MMAPABLE``. The map definition is page-aligned and
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 574896cca198..1ea2ae5c6276 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -18,8 +18,6 @@ import sphinx
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath("sphinx"))
-from load_config import loadConfig # pylint: disable=C0413,E0401
-
# Minimal supported version
needs_sphinx = "3.4.3"
@@ -93,8 +91,12 @@ def config_init(app, config):
# LaTeX and PDF output require a list of documents with are dependent
# of the app.srcdir. Add them here
- # When SPHINXDIRS is used, we just need to get index.rst, if it exists
+ # Handle the case where SPHINXDIRS is used
if not os.path.samefile(doctree, app.srcdir):
+ # Add a tag to mark that the build is actually a subproject
+ tags.add("subproject")
+
+ # get index.rst, if it exists
doc = os.path.basename(app.srcdir)
fname = "index"
if os.path.exists(os.path.join(app.srcdir, fname + ".rst")):
@@ -583,13 +585,6 @@ pdf_documents = [
kerneldoc_bin = "../scripts/kernel-doc.py"
kerneldoc_srctree = ".."
-# ------------------------------------------------------------------------------
-# Since loadConfig overwrites settings from the global namespace, it has to be
-# the last statement in the conf.py file
-# ------------------------------------------------------------------------------
-loadConfig(globals())
-
-
def setup(app):
"""Patterns need to be updated at init time on older Sphinx versions"""
diff --git a/Documentation/core-api/assoc_array.rst b/Documentation/core-api/assoc_array.rst
index 792bbf9939e1..19d89f92bf8d 100644
--- a/Documentation/core-api/assoc_array.rst
+++ b/Documentation/core-api/assoc_array.rst
@@ -92,18 +92,18 @@ There are two functions for dealing with the script:
void assoc_array_apply_edit(struct assoc_array_edit *edit);
-This will perform the edit functions, interpolating various write barriers
-to permit accesses under the RCU read lock to continue. The edit script
-will then be passed to ``call_rcu()`` to free it and any dead stuff it points
-to.
+ This will perform the edit functions, interpolating various write barriers
+ to permit accesses under the RCU read lock to continue. The edit script
+ will then be passed to ``call_rcu()`` to free it and any dead stuff it
+ points to.
2. Cancel an edit script::
void assoc_array_cancel_edit(struct assoc_array_edit *edit);
-This frees the edit script and all preallocated memory immediately. If
-this was for insertion, the new object is _not_ released by this function,
-but must rather be released by the caller.
+ This frees the edit script and all preallocated memory immediately. If
+ this was for insertion, the new object is *not* released by this function,
+ but must rather be released by the caller.
These functions are guaranteed not to fail.
@@ -123,43 +123,43 @@ This points to a number of methods, all of which need to be provided:
unsigned long (*get_key_chunk)(const void *index_key, int level);
-This should return a chunk of caller-supplied index key starting at the
-*bit* position given by the level argument. The level argument will be a
-multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
-``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible.
+ This should return a chunk of caller-supplied index key starting at the
+ *bit* position given by the level argument. The level argument will be a
+ multiple of ``ASSOC_ARRAY_KEY_CHUNK_SIZE`` and the function should return
+ ``ASSOC_ARRAY_KEY_CHUNK_SIZE bits``. No error is possible.
2. Get a chunk of an object's index key::
unsigned long (*get_object_key_chunk)(const void *object, int level);
-As the previous function, but gets its data from an object in the array
-rather than from a caller-supplied index key.
+ As the previous function, but gets its data from an object in the array
+ rather than from a caller-supplied index key.
3. See if this is the object we're looking for::
bool (*compare_object)(const void *object, const void *index_key);
-Compare the object against an index key and return ``true`` if it matches and
-``false`` if it doesn't.
+ Compare the object against an index key and return ``true`` if it matches
+ and ``false`` if it doesn't.
4. Diff the index keys of two objects::
int (*diff_objects)(const void *object, const void *index_key);
-Return the bit position at which the index key of the specified object
-differs from the given index key or -1 if they are the same.
+ Return the bit position at which the index key of the specified object
+ differs from the given index key or -1 if they are the same.
5. Free an object::
void (*free_object)(void *object);
-Free the specified object. Note that this may be called an RCU grace period
-after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may be
-necessary on module unloading.
+ Free the specified object. Note that this may be called an RCU grace period
+ after ``assoc_array_apply_edit()`` was called, so ``synchronize_rcu()`` may
+ be necessary on module unloading.
Manipulation Functions
@@ -171,7 +171,7 @@ There are a number of functions for manipulating an associative array:
void assoc_array_init(struct assoc_array *array);
-This initialises the base structure for an associative array. It can't fail.
+ This initialises the base structure for an associative array. It can't fail.
2. Insert/replace an object in an associative array::
@@ -182,21 +182,21 @@ This initialises the base structure for an associative array. It can't fail.
const void *index_key,
void *object);
-This inserts the given object into the array. Note that the least
-significant bit of the pointer must be zero as it's used to type-mark
-pointers internally.
+ This inserts the given object into the array. Note that the least
+ significant bit of the pointer must be zero as it's used to type-mark
+ pointers internally.
-If an object already exists for that key then it will be replaced with the
-new object and the old one will be freed automatically.
+ If an object already exists for that key then it will be replaced with the
+ new object and the old one will be freed automatically.
-The ``index_key`` argument should hold index key information and is
-passed to the methods in the ops table when they are called.
+ The ``index_key`` argument should hold index key information and is
+ passed to the methods in the ops table when they are called.
-This function makes no alteration to the array itself, but rather returns
-an edit script that must be applied. ``-ENOMEM`` is returned in the case of
-an out-of-memory error.
+ This function makes no alteration to the array itself, but rather returns
+ an edit script that must be applied. ``-ENOMEM`` is returned in the case of
+ an out-of-memory error.
-The caller should lock exclusively against other modifiers of the array.
+ The caller should lock exclusively against other modifiers of the array.
3. Delete an object from an associative array::
@@ -206,15 +206,15 @@ The caller should lock exclusively against other modifiers of the array.
const struct assoc_array_ops *ops,
const void *index_key);
-This deletes an object that matches the specified data from the array.
+ This deletes an object that matches the specified data from the array.
-The ``index_key`` argument should hold index key information and is
-passed to the methods in the ops table when they are called.
+ The ``index_key`` argument should hold index key information and is
+ passed to the methods in the ops table when they are called.
-This function makes no alteration to the array itself, but rather returns
-an edit script that must be applied. ``-ENOMEM`` is returned in the case of
-an out-of-memory error. ``NULL`` will be returned if the specified object is
-not found within the array.
+ This function makes no alteration to the array itself, but rather returns
+ an edit script that must be applied. ``-ENOMEM`` is returned in the case of
+ an out-of-memory error. ``NULL`` will be returned if the specified object
+ is not found within the array.
The caller should lock exclusively against other modifiers of the array.
@@ -225,14 +225,14 @@ The caller should lock exclusively against other modifiers of the array.
assoc_array_clear(struct assoc_array *array,
const struct assoc_array_ops *ops);
-This deletes all the objects from an associative array and leaves it
-completely empty.
+ This deletes all the objects from an associative array and leaves it
+ completely empty.
-This function makes no alteration to the array itself, but rather returns
-an edit script that must be applied. ``-ENOMEM`` is returned in the case of
-an out-of-memory error.
+ This function makes no alteration to the array itself, but rather returns
+ an edit script that must be applied. ``-ENOMEM`` is returned in the case of
+ an out-of-memory error.
-The caller should lock exclusively against other modifiers of the array.
+ The caller should lock exclusively against other modifiers of the array.
5. Destroy an associative array, deleting all objects::
@@ -240,14 +240,14 @@ The caller should lock exclusively against other modifiers of the array.
void assoc_array_destroy(struct assoc_array *array,
const struct assoc_array_ops *ops);
-This destroys the contents of the associative array and leaves it
-completely empty. It is not permitted for another thread to be traversing
-the array under the RCU read lock at the same time as this function is
-destroying it as no RCU deferral is performed on memory release -
-something that would require memory to be allocated.
+ This destroys the contents of the associative array and leaves it
+ completely empty. It is not permitted for another thread to be traversing
+ the array under the RCU read lock at the same time as this function is
+ destroying it as no RCU deferral is performed on memory release -
+ something that would require memory to be allocated.
-The caller should lock exclusively against other modifiers and accessors
-of the array.
+ The caller should lock exclusively against other modifiers and accessors
+ of the array.
6. Garbage collect an associative array::
@@ -257,24 +257,24 @@ of the array.
bool (*iterator)(void *object, void *iterator_data),
void *iterator_data);
-This iterates over the objects in an associative array and passes each one to
-``iterator()``. If ``iterator()`` returns ``true``, the object is kept. If it
-returns ``false``, the object will be freed. If the ``iterator()`` function
-returns ``true``, it must perform any appropriate refcount incrementing on the
-object before returning.
+ This iterates over the objects in an associative array and passes each one
+ to ``iterator()``. If ``iterator()`` returns ``true``, the object is kept.
+ If it returns ``false``, the object will be freed. If the ``iterator()``
+ function returns ``true``, it must perform any appropriate refcount
+ incrementing on the object before returning.
-The internal tree will be packed down if possible as part of the iteration
-to reduce the number of nodes in it.
+ The internal tree will be packed down if possible as part of the iteration
+ to reduce the number of nodes in it.
-The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
-ignored by the function.
+ The ``iterator_data`` is passed directly to ``iterator()`` and is otherwise
+ ignored by the function.
-The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
-enough memory.
+ The function will return ``0`` if successful and ``-ENOMEM`` if there wasn't
+ enough memory.
-It is possible for other threads to iterate over or search the array under
-the RCU read lock while this function is in progress. The caller should
-lock exclusively against other modifiers of the array.
+ It is possible for other threads to iterate over or search the array under
+ the RCU read lock while this function is in progress. The caller should
+ lock exclusively against other modifiers of the array.
Access Functions
@@ -289,19 +289,19 @@ There are two functions for accessing an associative array:
void *iterator_data),
void *iterator_data);
-This passes each object in the array to the iterator callback function.
-``iterator_data`` is private data for that function.
+ This passes each object in the array to the iterator callback function.
+ ``iterator_data`` is private data for that function.
-This may be used on an array at the same time as the array is being
-modified, provided the RCU read lock is held. Under such circumstances,
-it is possible for the iteration function to see some objects twice. If
-this is a problem, then modification should be locked against. The
-iteration algorithm should not, however, miss any objects.
+ This may be used on an array at the same time as the array is being
+ modified, provided the RCU read lock is held. Under such circumstances,
+ it is possible for the iteration function to see some objects twice. If
+ this is a problem, then modification should be locked against. The
+ iteration algorithm should not, however, miss any objects.
-The function will return ``0`` if no objects were in the array or else it will
-return the result of the last iterator function called. Iteration stops
-immediately if any call to the iteration function results in a non-zero
-return.
+ The function will return ``0`` if no objects were in the array or else it
+ will return the result of the last iterator function called. Iteration
+ stops immediately if any call to the iteration function results in a
+ non-zero return.
2. Find an object in an associative array::
@@ -310,14 +310,14 @@ return.
const struct assoc_array_ops *ops,
const void *index_key);
-This walks through the array's internal tree directly to the object
-specified by the index key..
+ This walks through the array's internal tree directly to the object
+ specified by the index key.
-This may be used on an array at the same time as the array is being
-modified, provided the RCU read lock is held.
+ This may be used on an array at the same time as the array is being
+ modified, provided the RCU read lock is held.
-The function will return the object if found (and set ``*_type`` to the object
-type) or will return ``NULL`` if the object was not found.
+ The function will return the object if found (and set ``*_type`` to the
+ object type) or will return ``NULL`` if the object was not found.
Index Key Form
@@ -399,10 +399,11 @@ fixed levels. For example::
In the above example, there are 7 nodes (A-G), each with 16 slots (0-f).
Assuming no other meta data nodes in the tree, the key space is divided
-thusly::
+thusly:
+ =========== ====
KEY PREFIX NODE
- ========== ====
+ =========== ====
137* D
138* E
13[0-69-f]* C
@@ -410,10 +411,12 @@ thusly::
e6* G
e[0-57-f]* F
[02-df]* A
+ =========== ====
So, for instance, keys with the following example index keys will be found in
-the appropriate nodes::
+the appropriate nodes:
+ =============== ======= ====
INDEX KEY PREFIX NODE
=============== ======= ====
13694892892489 13 C
@@ -422,12 +425,13 @@ the appropriate nodes::
138bbb89003093 138 E
1394879524789 12 C
1458952489 1 B
- 9431809de993ba - A
- b4542910809cd - A
+ 9431809de993ba \- A
+ b4542910809cd \- A
e5284310def98 e F
e68428974237 e6 G
e7fffcbd443 e F
- f3842239082 - A
+ f3842239082 \- A
+ =============== ======= ====
To save memory, if a node can hold all the leaves in its portion of keyspace,
then the node will have all those leaves in it and will not have any metadata
@@ -441,8 +445,9 @@ metadata pointer. If the metadata pointer is there, any leaf whose key matches
the metadata key prefix must be in the subtree that the metadata pointer points
to.
-In the above example list of index keys, node A will contain::
+In the above example list of index keys, node A will contain:
+ ==== =============== ==================
SLOT CONTENT INDEX KEY (PREFIX)
==== =============== ==================
1 PTR TO NODE B 1*
@@ -450,11 +455,16 @@ In the above example list of index keys, node A will contain::
any LEAF b4542910809cd
e PTR TO NODE F e*
any LEAF f3842239082
+ ==== =============== ==================
-and node B::
+and node B:
- 3 PTR TO NODE C 13*
- any LEAF 1458952489
+ ==== =============== ==================
+ SLOT CONTENT INDEX KEY (PREFIX)
+ ==== =============== ==================
+ 3 PTR TO NODE C 13*
+ any LEAF 1458952489
+ ==== =============== ==================
Shortcuts
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 7f2f11b48286..c0b1b6089307 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -547,11 +547,13 @@ Time and date
%pt[RT]s YYYY-mm-dd HH:MM:SS
%pt[RT]d YYYY-mm-dd
%pt[RT]t HH:MM:SS
- %pt[RT][dt][r][s]
+ %ptSp <seconds>.<nanoseconds>
+ %pt[RST][dt][r][s]
For printing date and time as represented by::
- R struct rtc_time structure
+ R content of struct rtc_time
+ S content of struct timespec64
T time64_t type
in human readable format.
@@ -563,6 +565,11 @@ The %pt[RT]s (space) will override ISO 8601 separator by using ' ' (space)
instead of 'T' (Capital T) between date and time. It won't have any effect
when date or time is omitted.
+The %ptSp is equivalent to %lld.%09ld for the content of the struct timespec64.
+When the other specifiers are given, it becomes the respective equivalent of
+%ptT[dt][r][s].%09ld. In other words, the seconds are being printed in
+the human readable format followed by a dot and nanoseconds.
+
Passed by reference.
struct clk
diff --git a/Documentation/crypto/index.rst b/Documentation/crypto/index.rst
index 100b47d049c0..4ee667c446f9 100644
--- a/Documentation/crypto/index.rst
+++ b/Documentation/crypto/index.rst
@@ -27,3 +27,4 @@ for cryptographic use cases, as well as programming examples.
descore-readme
device_drivers/index
krb5
+ sha3
diff --git a/Documentation/crypto/sha3.rst b/Documentation/crypto/sha3.rst
new file mode 100644
index 000000000000..37640f295118
--- /dev/null
+++ b/Documentation/crypto/sha3.rst
@@ -0,0 +1,130 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+==========================
+SHA-3 Algorithm Collection
+==========================
+
+.. contents::
+
+Overview
+========
+
+The SHA-3 family of algorithms, as specified in NIST FIPS-202 [1]_, contains six
+algorithms based on the Keccak sponge function. The differences between them
+are: the "rate" (how much of the state buffer gets updated with new data between
+invocations of the Keccak function and analogous to the "block size"), what
+domain separation suffix gets appended to the input data, and how much output
+data is extracted at the end. The Keccak sponge function is designed such that
+arbitrary amounts of output can be obtained for certain algorithms.
+
+Four digest algorithms are provided:
+
+ - SHA3-224
+ - SHA3-256
+ - SHA3-384
+ - SHA3-512
+
+Additionally, two Extendable-Output Functions (XOFs) are provided:
+
+ - SHAKE128
+ - SHAKE256
+
+The SHA-3 library API supports all six of these algorithms. The four digest
+algorithms are also supported by the crypto_shash and crypto_ahash APIs.
+
+This document describes the SHA-3 library API.
+
+
+Digests
+=======
+
+The following functions compute SHA-3 digests::
+
+ void sha3_224(const u8 *in, size_t in_len, u8 out[SHA3_224_DIGEST_SIZE]);
+ void sha3_256(const u8 *in, size_t in_len, u8 out[SHA3_256_DIGEST_SIZE]);
+ void sha3_384(const u8 *in, size_t in_len, u8 out[SHA3_384_DIGEST_SIZE]);
+ void sha3_512(const u8 *in, size_t in_len, u8 out[SHA3_512_DIGEST_SIZE]);
+
+For users that need to pass in data incrementally, an incremental API is also
+provided. The incremental API uses the following struct::
+
+ struct sha3_ctx { ... };
+
+Initialization is done with one of::
+
+ void sha3_224_init(struct sha3_ctx *ctx);
+ void sha3_256_init(struct sha3_ctx *ctx);
+ void sha3_384_init(struct sha3_ctx *ctx);
+ void sha3_512_init(struct sha3_ctx *ctx);
+
+Input data is then added with any number of calls to::
+
+ void sha3_update(struct sha3_ctx *ctx, const u8 *in, size_t in_len);
+
+Finally, the digest is generated using::
+
+ void sha3_final(struct sha3_ctx *ctx, u8 *out);
+
+which also zeroizes the context. The length of the digest is determined by the
+initialization function that was called.
+
+
+Extendable-Output Functions
+===========================
+
+The following functions compute the SHA-3 extendable-output functions (XOFs)::
+
+ void shake128(const u8 *in, size_t in_len, u8 *out, size_t out_len);
+ void shake256(const u8 *in, size_t in_len, u8 *out, size_t out_len);
+
+For users that need to provide the input data incrementally and/or receive the
+output data incrementally, an incremental API is also provided. The incremental
+API uses the following struct::
+
+ struct shake_ctx { ... };
+
+Initialization is done with one of::
+
+ void shake128_init(struct shake_ctx *ctx);
+ void shake256_init(struct shake_ctx *ctx);
+
+Input data is then added with any number of calls to::
+
+ void shake_update(struct shake_ctx *ctx, const u8 *in, size_t in_len);
+
+Finally, the output data is extracted with any number of calls to::
+
+ void shake_squeeze(struct shake_ctx *ctx, u8 *out, size_t out_len);
+
+and telling it how much data should be extracted. Note that performing multiple
+squeezes, with the output laid consecutively in a buffer, gets exactly the same
+output as doing a single squeeze for the combined amount over the same buffer.
+
+More input data cannot be added after squeezing has started.
+
+Once all the desired output has been extracted, zeroize the context::
+
+ void shake_zeroize_ctx(struct shake_ctx *ctx);
+
+
+Testing
+=======
+
+To test the SHA-3 code, use sha3_kunit (CONFIG_CRYPTO_LIB_SHA3_KUNIT_TEST).
+
+Since the SHA-3 algorithms are FIPS-approved, when the kernel is booted in FIPS
+mode the SHA-3 library also performs a simple self-test. This is purely to meet
+a FIPS requirement. Normal testing done by kernel developers and integrators
+should use the much more comprehensive KUnit test suite instead.
+
+
+References
+==========
+
+.. [1] https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf
+
+
+API Function Reference
+======================
+
+.. kernel-doc:: include/crypto/sha3.h
diff --git a/Documentation/crypto/userspace-if.rst b/Documentation/crypto/userspace-if.rst
index f80f243e227e..8158b363cd98 100644
--- a/Documentation/crypto/userspace-if.rst
+++ b/Documentation/crypto/userspace-if.rst
@@ -302,10 +302,9 @@ follows:
Depending on the RNG type, the RNG must be seeded. The seed is provided
-using the setsockopt interface to set the key. For example, the
-ansi_cprng requires a seed. The DRBGs do not require a seed, but may be
-seeded. The seed is also known as a *Personalization String* in NIST SP 800-90A
-standard.
+using the setsockopt interface to set the key. The SP800-90A DRBGs do
+not require a seed, but may be seeded. The seed is also known as a
+*Personalization String* in NIST SP 800-90A standard.
Using the read()/recvmsg() system calls, random numbers can be obtained.
The kernel generates at most 128 bytes in one call. If user space
diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst
index d5c47e560324..dfaad0a279ff 100644
--- a/Documentation/dev-tools/checkpatch.rst
+++ b/Documentation/dev-tools/checkpatch.rst
@@ -461,16 +461,9 @@ Comments
line comments is::
/*
- * This is the preferred style
- * for multi line comments.
- */
-
- The networking comment style is a bit different, with the first line
- not empty like the former::
-
- /* This is the preferred comment style
- * for files in net/ and drivers/net/
- */
+ * This is the preferred style
+ * for multi line comments.
+ */
See: https://www.kernel.org/doc/html/latest/process/coding-style.html#commenting
diff --git a/Documentation/dev-tools/kunit/run_manual.rst b/Documentation/dev-tools/kunit/run_manual.rst
index 699d92885075..98e8d5b28808 100644
--- a/Documentation/dev-tools/kunit/run_manual.rst
+++ b/Documentation/dev-tools/kunit/run_manual.rst
@@ -35,6 +35,12 @@ or be built into the kernel.
a good way of quickly testing everything applicable to the current
config.
+ KUnit can be enabled or disabled at boot time, and this behavior is
+ controlled by the kunit.enable kernel parameter.
+ By default, kunit.enable is set to 1 because KUNIT_DEFAULT_ENABLED is
+ enabled by default. To ensure that tests are executed as expected,
+ verify that kunit.enable=1 at boot time.
+
Once we have built our kernel (and/or modules), it is simple to run
the tests. If the tests are built-in, they will run automatically on the
kernel boot. The results will be written to the kernel log (``dmesg``)
diff --git a/Documentation/devicetree/bindings/crypto/amd,ccp-seattle-v1a.yaml b/Documentation/devicetree/bindings/crypto/amd,ccp-seattle-v1a.yaml
index 32bf3a1c3b42..5fb708471059 100644
--- a/Documentation/devicetree/bindings/crypto/amd,ccp-seattle-v1a.yaml
+++ b/Documentation/devicetree/bindings/crypto/amd,ccp-seattle-v1a.yaml
@@ -21,6 +21,9 @@ properties:
dma-coherent: true
+ iommus:
+ maxItems: 4
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml b/Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml
index 08fe6a707a37..c3408dcf5d20 100644
--- a/Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml
+++ b/Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml
@@ -13,6 +13,7 @@ properties:
compatible:
items:
- enum:
+ - qcom,kaanapali-inline-crypto-engine
- qcom,qcs8300-inline-crypto-engine
- qcom,sa8775p-inline-crypto-engine
- qcom,sc7180-inline-crypto-engine
diff --git a/Documentation/devicetree/bindings/crypto/qcom,prng.yaml b/Documentation/devicetree/bindings/crypto/qcom,prng.yaml
index ed7e16bd11d3..597441d94cf1 100644
--- a/Documentation/devicetree/bindings/crypto/qcom,prng.yaml
+++ b/Documentation/devicetree/bindings/crypto/qcom,prng.yaml
@@ -20,6 +20,7 @@ properties:
- qcom,ipq5332-trng
- qcom,ipq5424-trng
- qcom,ipq9574-trng
+ - qcom,kaanapali-trng
- qcom,qcs615-trng
- qcom,qcs8300-trng
- qcom,sa8255p-trng
diff --git a/Documentation/devicetree/bindings/crypto/qcom-qce.yaml b/Documentation/devicetree/bindings/crypto/qcom-qce.yaml
index e009cb712fb8..79d5be2548bc 100644
--- a/Documentation/devicetree/bindings/crypto/qcom-qce.yaml
+++ b/Documentation/devicetree/bindings/crypto/qcom-qce.yaml
@@ -45,6 +45,7 @@ properties:
- items:
- enum:
+ - qcom,kaanapali-qce
- qcom,qcs615-qce
- qcom,qcs8300-qce
- qcom,sa8775p-qce
diff --git a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml
index 3d60d9e9e208..d0fad930de9d 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml
+++ b/Documentation/devicetree/bindings/interrupt-controller/amlogic,meson-gpio-intc.yaml
@@ -39,6 +39,9 @@ properties:
- amlogic,a4-gpio-ao-intc
- amlogic,a5-gpio-intc
- amlogic,c3-gpio-intc
+ - amlogic,s6-gpio-intc
+ - amlogic,s7-gpio-intc
+ - amlogic,s7d-gpio-intc
- amlogic,t7-gpio-intc
- const: amlogic,meson-gpio-intc
diff --git a/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml b/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml
index 55636d06a674..999df5b905c5 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml
+++ b/Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2700-intc.yaml
@@ -25,13 +25,14 @@ properties:
interrupt-controller: true
'#interrupt-cells':
- const: 2
+ const: 1
description:
The first cell is the IRQ number, the second cell is the trigger
type as defined in interrupt.txt in this directory.
interrupts:
- maxItems: 6
+ minItems: 1
+ maxItems: 10
description: |
Depend to which INTC0 or INTC1 used.
INTC0 and INTC1 are two kinds of interrupt controller with enable and raw
@@ -74,13 +75,17 @@ examples:
interrupt-controller@12101b00 {
compatible = "aspeed,ast2700-intc-ic";
reg = <0 0x12101b00 0 0x10>;
- #interrupt-cells = <2>;
+ #interrupt-cells = <1>;
interrupt-controller;
interrupts = <GIC_SPI 192 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 193 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 194 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 195 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 196 IRQ_TYPE_LEVEL_HIGH>,
- <GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>;
+ <GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 199 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 200 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 201 IRQ_TYPE_LEVEL_HIGH>;
};
};
diff --git a/Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml b/Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml
index f683d696909b..6fdb7ae9e85a 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml
+++ b/Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.yaml
@@ -58,6 +58,7 @@ properties:
- const: andestech,nceplic100
- items:
- enum:
+ - anlogic,dr1v90-plic
- canaan,k210-plic
- eswin,eic7700-plic
- sifive,fu540-c000-plic
@@ -76,6 +77,9 @@ properties:
- thead,th1520-plic
- const: thead,c900-plic
- items:
+ - const: ultrarisc,dp1000-plic
+ - const: ultrarisc,cp100-plic
+ - items:
- const: sifive,plic-1.0.0
- const: riscv,plic0
deprecated: true
diff --git a/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml b/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml
index d6fb08a54167..62fd220e126e 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml
+++ b/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-mswi.yaml
@@ -4,18 +4,23 @@
$id: http://devicetree.org/schemas/interrupt-controller/thead,c900-aclint-mswi.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Sophgo sg2042 CLINT Machine-level Software Interrupt Device
+title: ACLINT Machine-level Software Interrupt Device
maintainers:
- Inochi Amaoto <inochiama@outlook.com>
properties:
compatible:
- items:
- - enum:
- - sophgo,sg2042-aclint-mswi
- - sophgo,sg2044-aclint-mswi
- - const: thead,c900-aclint-mswi
+ oneOf:
+ - items:
+ - enum:
+ - sophgo,sg2042-aclint-mswi
+ - sophgo,sg2044-aclint-mswi
+ - const: thead,c900-aclint-mswi
+ - items:
+ - enum:
+ - anlogic,dr1v90-aclint-mswi
+ - const: nuclei,ux900-aclint-mswi
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml b/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml
index c1ab865fcd64..d02c6886283a 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml
+++ b/Documentation/devicetree/bindings/interrupt-controller/thead,c900-aclint-sswi.yaml
@@ -30,6 +30,10 @@ properties:
- const: thead,c900-aclint-sswi
- items:
- const: mips,p8700-aclint-sswi
+ - items:
+ - enum:
+ - anlogic,dr1v90-aclint-sswi
+ - const: nuclei,ux900-aclint-sswi
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml b/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
index 6d22131ac2f9..fbe2ddcdd909 100644
--- a/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
+++ b/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
@@ -17,6 +17,7 @@ properties:
compatible:
enum:
- airoha,en7581-eth
+ - airoha,an7583-eth
reg:
items:
@@ -44,6 +45,7 @@ properties:
- description: PDMA irq
resets:
+ minItems: 7
maxItems: 8
reset-names:
@@ -54,8 +56,9 @@ properties:
- const: xsi-mac
- const: hsi0-mac
- const: hsi1-mac
- - const: hsi-mac
+ - enum: [ hsi-mac, xfp-mac ]
- const: xfp-mac
+ minItems: 7
memory-region:
items:
@@ -81,6 +84,36 @@ properties:
interface to implement hardware flow offloading programming Packet
Processor Engine (PPE) flow table.
+allOf:
+ - $ref: ethernet-controller.yaml#
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - airoha,en7581-eth
+ then:
+ properties:
+ resets:
+ minItems: 8
+
+ reset-names:
+ minItems: 8
+
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - airoha,an7583-eth
+ then:
+ properties:
+ resets:
+ maxItems: 7
+
+ reset-names:
+ maxItems: 7
+
patternProperties:
"^ethernet@[1-4]$":
type: object
diff --git a/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml b/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml
index c7644e6586d3..59c57f58116b 100644
--- a/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml
+++ b/Documentation/devicetree/bindings/net/airoha,en7581-npu.yaml
@@ -18,6 +18,7 @@ properties:
compatible:
enum:
- airoha,en7581-npu
+ - airoha,an7583-npu
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/net/amd,xgbe-seattle-v1a.yaml b/Documentation/devicetree/bindings/net/amd,xgbe-seattle-v1a.yaml
new file mode 100644
index 000000000000..006add8b6410
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/amd,xgbe-seattle-v1a.yaml
@@ -0,0 +1,147 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/amd,xgbe-seattle-v1a.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: AMD XGBE Seattle v1a
+
+maintainers:
+ - Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
+
+allOf:
+ - $ref: /schemas/net/ethernet-controller.yaml#
+
+properties:
+ compatible:
+ const: amd,xgbe-seattle-v1a
+
+ reg:
+ items:
+ - description: MAC registers
+ - description: PCS registers
+ - description: SerDes Rx/Tx registers
+ - description: SerDes integration registers (1/2)
+ - description: SerDes integration registers (2/2)
+
+ interrupts:
+ description: Device interrupts. The first entry is the general device
+ interrupt. If amd,per-channel-interrupt is specified, each DMA channel
+ interrupt must be specified. The last entry is the PCS auto-negotiation
+ interrupt.
+ minItems: 2
+ maxItems: 6
+
+ clocks:
+ items:
+ - description: DMA clock for the device
+ - description: PTP clock for the device
+
+ clock-names:
+ items:
+ - const: dma_clk
+ - const: ptp_clk
+
+ iommus:
+ maxItems: 1
+
+ phy-mode: true
+
+ dma-coherent: true
+
+ amd,per-channel-interrupt:
+ description: Indicates that Rx and Tx complete will generate a unique
+ interrupt for each DMA channel.
+ type: boolean
+
+ amd,speed-set:
+ description: >
+ Speed capabilities of the device.
+ 0 = 1GbE and 10GbE
+ 1 = 2.5GbE and 10GbE
+ $ref: /schemas/types.yaml#/definitions/uint32
+ enum: [0, 1]
+
+ amd,serdes-blwc:
+ description: Baseline wandering correction enablement for each speed.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ minItems: 3
+ maxItems: 3
+ items:
+ enum: [0, 1]
+
+ amd,serdes-cdr-rate:
+ description: CDR rate speed selection for each speed.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ items:
+ - description: CDR rate for 1GbE
+ - description: CDR rate for 2.5GbE
+ - description: CDR rate for 10GbE
+
+ amd,serdes-pq-skew:
+ description: PQ data sampling skew for each speed.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ items:
+ - description: PQ skew for 1GbE
+ - description: PQ skew for 2.5GbE
+ - description: PQ skew for 10GbE
+
+ amd,serdes-tx-amp:
+ description: TX amplitude boost for each speed.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ items:
+ - description: TX amplitude for 1GbE
+ - description: TX amplitude for 2.5GbE
+ - description: TX amplitude for 10GbE
+
+ amd,serdes-dfe-tap-config:
+ description: DFE taps available to run for each speed.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ items:
+ - description: DFE taps available for 1GbE
+ - description: DFE taps available for 2.5GbE
+ - description: DFE taps available for 10GbE
+
+ amd,serdes-dfe-tap-enable:
+ description: DFE taps to enable for each speed.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ items:
+ - description: DFE taps to enable for 1GbE
+ - description: DFE taps to enable for 2.5GbE
+ - description: DFE taps to enable for 10GbE
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - clock-names
+ - phy-mode
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ ethernet@e0700000 {
+ compatible = "amd,xgbe-seattle-v1a";
+ reg = <0xe0700000 0x80000>,
+ <0xe0780000 0x80000>,
+ <0xe1240800 0x00400>,
+ <0xe1250000 0x00060>,
+ <0xe1250080 0x00004>;
+ interrupts = <0 325 4>,
+ <0 326 1>, <0 327 1>, <0 328 1>, <0 329 1>,
+ <0 323 4>;
+ amd,per-channel-interrupt;
+ clocks = <&xgbe_dma_clk>, <&xgbe_ptp_clk>;
+ clock-names = "dma_clk", "ptp_clk";
+ phy-mode = "xgmii";
+ mac-address = [ 02 a1 a2 a3 a4 a5 ];
+ amd,speed-set = <0>;
+ amd,serdes-blwc = <1>, <1>, <0>;
+ amd,serdes-cdr-rate = <2>, <2>, <7>;
+ amd,serdes-pq-skew = <10>, <10>, <30>;
+ amd,serdes-tx-amp = <15>, <15>, <10>;
+ amd,serdes-dfe-tap-config = <3>, <3>, <1>;
+ amd,serdes-dfe-tap-enable = <0>, <0>, <127>;
+ };
diff --git a/Documentation/devicetree/bindings/net/amd-xgbe.txt b/Documentation/devicetree/bindings/net/amd-xgbe.txt
deleted file mode 100644
index 9c27dfcd1133..000000000000
--- a/Documentation/devicetree/bindings/net/amd-xgbe.txt
+++ /dev/null
@@ -1,76 +0,0 @@
-* AMD 10GbE driver (amd-xgbe)
-
-Required properties:
-- compatible: Should be "amd,xgbe-seattle-v1a"
-- reg: Address and length of the register sets for the device
- - MAC registers
- - PCS registers
- - SerDes Rx/Tx registers
- - SerDes integration registers (1/2)
- - SerDes integration registers (2/2)
-- interrupts: Should contain the amd-xgbe interrupt(s). The first interrupt
- listed is required and is the general device interrupt. If the optional
- amd,per-channel-interrupt property is specified, then one additional
- interrupt for each DMA channel supported by the device should be specified.
- The last interrupt listed should be the PCS auto-negotiation interrupt.
-- clocks:
- - DMA clock for the amd-xgbe device (used for calculating the
- correct Rx interrupt watchdog timer value on a DMA channel
- for coalescing)
- - PTP clock for the amd-xgbe device
-- clock-names: Should be the names of the clocks
- - "dma_clk" for the DMA clock
- - "ptp_clk" for the PTP clock
-- phy-mode: See ethernet.txt file in the same directory
-
-Optional properties:
-- dma-coherent: Present if dma operations are coherent
-- amd,per-channel-interrupt: Indicates that Rx and Tx complete will generate
- a unique interrupt for each DMA channel - this requires an additional
- interrupt be configured for each DMA channel
-- amd,speed-set: Speed capabilities of the device
- 0 - 1GbE and 10GbE (default)
- 1 - 2.5GbE and 10GbE
-
-The MAC address will be determined using the optional properties defined in
-ethernet.txt.
-
-The following optional properties are represented by an array with each
-value corresponding to a particular speed. The first array value represents
-the setting for the 1GbE speed, the second value for the 2.5GbE speed and
-the third value for the 10GbE speed. All three values are required if the
-property is used.
-- amd,serdes-blwc: Baseline wandering correction enablement
- 0 - Off
- 1 - On
-- amd,serdes-cdr-rate: CDR rate speed selection
-- amd,serdes-pq-skew: PQ (data sampling) skew
-- amd,serdes-tx-amp: TX amplitude boost
-- amd,serdes-dfe-tap-config: DFE taps available to run
-- amd,serdes-dfe-tap-enable: DFE taps to enable
-
-Example:
- xgbe@e0700000 {
- compatible = "amd,xgbe-seattle-v1a";
- reg = <0 0xe0700000 0 0x80000>,
- <0 0xe0780000 0 0x80000>,
- <0 0xe1240800 0 0x00400>,
- <0 0xe1250000 0 0x00060>,
- <0 0xe1250080 0 0x00004>;
- interrupt-parent = <&gic>;
- interrupts = <0 325 4>,
- <0 326 1>, <0 327 1>, <0 328 1>, <0 329 1>,
- <0 323 4>;
- amd,per-channel-interrupt;
- clocks = <&xgbe_dma_clk>, <&xgbe_ptp_clk>;
- clock-names = "dma_clk", "ptp_clk";
- phy-mode = "xgmii";
- mac-address = [ 02 a1 a2 a3 a4 a5 ];
- amd,speed-set = <0>;
- amd,serdes-blwc = <1>, <1>, <0>;
- amd,serdes-cdr-rate = <2>, <2>, <7>;
- amd,serdes-pq-skew = <10>, <10>, <30>;
- amd,serdes-tx-amp = <15>, <15>, <10>;
- amd,serdes-dfe-tap-config = <3>, <3>, <1>;
- amd,serdes-dfe-tap-enable = <0>, <0>, <127>;
- };
diff --git a/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
index d6ef468495c5..a105dc07ed12 100644
--- a/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
+++ b/Documentation/devicetree/bindings/net/aspeed,ast2600-mdio.yaml
@@ -19,7 +19,12 @@ allOf:
properties:
compatible:
- const: aspeed,ast2600-mdio
+ oneOf:
+ - const: aspeed,ast2600-mdio
+ - items:
+ - enum:
+ - aspeed,ast2700-mdio
+ - const: aspeed,ast2600-mdio
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/net/bluetooth/marvell,sd8897-bt.yaml b/Documentation/devicetree/bindings/net/bluetooth/marvell,sd8897-bt.yaml
new file mode 100644
index 000000000000..a307c64cfa4d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/bluetooth/marvell,sd8897-bt.yaml
@@ -0,0 +1,79 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/bluetooth/marvell,sd8897-bt.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Marvell 8897/8997 (sd8897/sd8997) bluetooth devices (SDIO)
+
+maintainers:
+ - Ariel D'Alessandro <ariel.dalessandro@collabora.com>
+
+allOf:
+ - $ref: /schemas/net/bluetooth/bluetooth-controller.yaml#
+
+properties:
+ compatible:
+ enum:
+ - marvell,sd8897-bt
+ - marvell,sd8997-bt
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ marvell,cal-data:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ description:
+ Calibration data downloaded to the device during initialization.
+ maxItems: 28
+
+ marvell,wakeup-pin:
+ $ref: /schemas/types.yaml#/definitions/uint16
+ description:
+ Wakeup pin number of the bluetooth chip. Used by firmware to wakeup host
+ system.
+
+ marvell,wakeup-gap-ms:
+ $ref: /schemas/types.yaml#/definitions/uint16
+ description:
+ Wakeup latency of the host platform. Required by the chip sleep feature.
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ mmc {
+ vmmc-supply = <&wlan_en_reg>;
+ bus-width = <4>;
+ cap-power-off-card;
+ keep-power-in-suspend;
+
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ bluetooth@2 {
+ compatible = "marvell,sd8897-bt";
+ reg = <2>;
+ interrupt-parent = <&pio>;
+ interrupts = <119 IRQ_TYPE_LEVEL_LOW>;
+
+ marvell,cal-data = /bits/ 8 <
+ 0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02
+ 0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00
+ 0x00 0x00 0xf0 0x00>;
+ marvell,wakeup-pin = /bits/ 16 <0x0d>;
+ marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
+ };
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt
index f546b1f7dd6d..a68022a57c51 100644
--- a/Documentation/devicetree/bindings/net/btusb.txt
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -14,7 +14,7 @@ Required properties:
Also, vendors that use btusb may have device additional properties, e.g:
-Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
+Documentation/devicetree/bindings/net/bluetooth/marvell,sd8897-bt.yaml
Optional properties:
diff --git a/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml b/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml
index 61ef60d8f1c7..2c9d37975bed 100644
--- a/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml
+++ b/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml
@@ -109,6 +109,26 @@ properties:
maximum: 32
minItems: 1
+ pinctrl-0:
+ description: Default pinctrl state
+
+ pinctrl-1:
+ description: Can be "sleep" or "wakeup" pinctrl state
+
+ pinctrl-2:
+ description: Can be "sleep" or "wakeup" pinctrl state
+
+ pinctrl-names:
+ description:
+ When present should contain at least "default" describing the default pin
+ states. Other states are "sleep" which describes the pinstate when
+ sleeping and "wakeup" describing the pins if wakeup is enabled.
+ minItems: 1
+ items:
+ - const: default
+ - enum: [ sleep, wakeup ]
+ - const: wakeup
+
power-domains:
description:
Power domain provider node and an args specifier containing
@@ -125,6 +145,11 @@ properties:
minItems: 1
maxItems: 2
+ wakeup-source:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ description:
+ List of phandles to system idle states in which mcan can wakeup the system.
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml b/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml
index c155c9c6db39..2d13638ebc6a 100644
--- a/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml
+++ b/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml
@@ -49,6 +49,11 @@ properties:
Must be half or less of "clocks" frequency.
maximum: 20000000
+ gpio-controller: true
+
+ "#gpio-cells":
+ const: 2
+
required:
- compatible
- reg
diff --git a/Documentation/devicetree/bindings/net/can/microchip,mpfs-can.yaml b/Documentation/devicetree/bindings/net/can/microchip,mpfs-can.yaml
index 1219c5cb601f..519a11fbe972 100644
--- a/Documentation/devicetree/bindings/net/can/microchip,mpfs-can.yaml
+++ b/Documentation/devicetree/bindings/net/can/microchip,mpfs-can.yaml
@@ -32,11 +32,15 @@ properties:
- description: AHB peripheral clock
- description: CAN bus clock
+ resets:
+ maxItems: 1
+
required:
- compatible
- reg
- interrupts
- clocks
+ - resets
additionalProperties: false
@@ -46,6 +50,7 @@ examples:
compatible = "microchip,mpfs-can";
reg = <0x2010c000 0x1000>;
clocks = <&clkcfg 17>, <&clkcfg 37>;
+ resets = <&clkcfg 17>;
interrupt-parent = <&plic>;
interrupts = <56>;
};
diff --git a/Documentation/devicetree/bindings/net/cdns,macb.yaml b/Documentation/devicetree/bindings/net/cdns,macb.yaml
index 1029786a855c..cb14c35ba996 100644
--- a/Documentation/devicetree/bindings/net/cdns,macb.yaml
+++ b/Documentation/devicetree/bindings/net/cdns,macb.yaml
@@ -38,7 +38,10 @@ properties:
- cdns,sam9x60-macb # Microchip sam9x60 SoC
- microchip,mpfs-macb # Microchip PolarFire SoC
- const: cdns,macb # Generic
-
+ - items:
+ - const: microchip,pic64gx-macb # Microchip PIC64GX SoC
+ - const: microchip,mpfs-macb # Microchip PolarFire SoC
+ - const: cdns,macb # Generic
- items:
- enum:
- atmel,sama5d3-macb # 10/100Mbit IP on Atmel sama5d3 SoCs
@@ -47,18 +50,19 @@ properties:
- const: cdns,macb # Generic
- enum:
- - atmel,sama5d29-gem # GEM XL IP (10/100) on Atmel sama5d29 SoCs
- atmel,sama5d2-gem # GEM IP (10/100) on Atmel sama5d2 SoCs
+ - atmel,sama5d29-gem # GEM XL IP (10/100) on Atmel sama5d29 SoCs
- atmel,sama5d3-gem # Gigabit IP on Atmel sama5d3 SoCs
- atmel,sama5d4-gem # GEM IP (10/100) on Atmel sama5d4 SoCs
+ - cdns,emac # Generic
+ - cdns,gem # Generic
+ - cdns,macb # Generic
- cdns,np4-macb # NP4 SoC devices
- microchip,sama7g5-emac # Microchip SAMA7G5 ethernet interface
- microchip,sama7g5-gem # Microchip SAMA7G5 gigabit ethernet interface
+ - mobileye,eyeq5-gem # Mobileye EyeQ5 SoCs
- raspberrypi,rp1-gem # Raspberry Pi RP1 gigabit ethernet interface
- sifive,fu540-c000-gem # SiFive FU540-C000 SoC
- - cdns,emac # Generic
- - cdns,gem # Generic
- - cdns,macb # Generic
- items:
- enum:
@@ -183,6 +187,15 @@ allOf:
reg:
maxItems: 1
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: mobileye,eyeq5-gem
+ then:
+ required:
+ - phys
+
unevaluatedProperties: false
examples:
diff --git a/Documentation/devicetree/bindings/net/dsa/lantiq,gswip.yaml b/Documentation/devicetree/bindings/net/dsa/lantiq,gswip.yaml
index f3154b19af78..205b683849a5 100644
--- a/Documentation/devicetree/bindings/net/dsa/lantiq,gswip.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/lantiq,gswip.yaml
@@ -4,10 +4,14 @@
$id: http://devicetree.org/schemas/net/dsa/lantiq,gswip.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Lantiq GSWIP Ethernet switches
+title: Lantiq GSWIP and MaxLinear GSW1xx Ethernet switches
-allOf:
- - $ref: dsa.yaml#/$defs/ethernet-ports
+description:
+ Lantiq GSWIP and MaxLinear GSW1xx switches share the same hardware IP.
+ Lantiq switches are embedded in SoCs and accessed via memory-mapped I/O,
+ while MaxLinear switches are standalone ICs connected via MDIO.
+
+$ref: dsa.yaml#
maintainers:
- Hauke Mehrtens <hauke@hauke-m.de>
@@ -18,9 +22,14 @@ properties:
- lantiq,xrx200-gswip
- lantiq,xrx300-gswip
- lantiq,xrx330-gswip
+ - maxlinear,gsw120
+ - maxlinear,gsw125
+ - maxlinear,gsw140
+ - maxlinear,gsw141
+ - maxlinear,gsw145
reg:
- minItems: 3
+ minItems: 1
maxItems: 3
reg-names:
@@ -37,9 +46,6 @@ properties:
compatible:
const: lantiq,xrx200-mdio
- required:
- - compatible
-
gphy-fw:
type: object
properties:
@@ -91,10 +97,63 @@ properties:
additionalProperties: false
+patternProperties:
+ "^(ethernet-)?ports$":
+ type: object
+ patternProperties:
+ "^(ethernet-)?port@[0-6]$":
+ $ref: dsa-port.yaml#
+ unevaluatedProperties: false
+
+ properties:
+ maxlinear,rmii-refclk-out:
+ type: boolean
+ description:
+ Configure the RMII reference clock to be a clock output
+ rather than an input. Only applicable for RMII mode.
+ tx-internal-delay-ps:
+ enum: [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
+ description:
+ RGMII TX Clock Delay defined in pico seconds.
+ The delay lines adjust the MII clock vs. data timing.
+ If this property is not present the delay is determined by
+ the interface mode.
+ rx-internal-delay-ps:
+ enum: [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
+ description:
+ RGMII RX Clock Delay defined in pico seconds.
+ The delay lines adjust the MII clock vs. data timing.
+ If this property is not present the delay is determined by
+ the interface mode.
+
required:
- compatible
- reg
+allOf:
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - lantiq,xrx200-gswip
+ - lantiq,xrx300-gswip
+ - lantiq,xrx330-gswip
+ then:
+ properties:
+ reg:
+ minItems: 3
+ maxItems: 3
+ mdio:
+ required:
+ - compatible
+ else:
+ properties:
+ reg:
+ maxItems: 1
+ reg-names: false
+ gphy-fw: false
+
unevaluatedProperties: false
examples:
@@ -113,8 +172,10 @@ examples:
port@0 {
reg = <0>;
label = "lan3";
- phy-mode = "rgmii";
+ phy-mode = "rgmii-id";
phy-handle = <&phy0>;
+ tx-internal-delay-ps = <2000>;
+ rx-internal-delay-ps = <2000>;
};
port@1 {
@@ -200,3 +261,90 @@ examples:
};
};
};
+
+ - |
+ #include <dt-bindings/leds/common.h>
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switch@1f {
+ compatible = "maxlinear,gsw125";
+ reg = <0x1f>;
+
+ ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port@0 {
+ reg = <0>;
+ label = "lan0";
+ phy-handle = <&switchphy0>;
+ phy-mode = "internal";
+ };
+
+ port@1 {
+ reg = <1>;
+ label = "lan1";
+ phy-handle = <&switchphy1>;
+ phy-mode = "internal";
+ };
+
+ port@4 {
+ reg = <4>;
+ label = "wan";
+ phy-mode = "1000base-x";
+ managed = "in-band-status";
+ };
+
+ port@5 {
+ reg = <5>;
+ phy-mode = "rgmii-id";
+ tx-internal-delay-ps = <2000>;
+ rx-internal-delay-ps = <2000>;
+ ethernet = <&eth0>;
+
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+ };
+ };
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switchphy0: switchphy@0 {
+ reg = <0>;
+
+ leds {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ led@0 {
+ reg = <0>;
+ color = <LED_COLOR_ID_GREEN>;
+ function = LED_FUNCTION_LAN;
+ };
+ };
+ };
+
+ switchphy1: switchphy@1 {
+ reg = <1>;
+
+ leds {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ led@0 {
+ reg = <0>;
+ color = <LED_COLOR_ID_GREEN>;
+ function = LED_FUNCTION_LAN;
+ };
+ };
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/dsa/motorcomm,yt921x.yaml b/Documentation/devicetree/bindings/net/dsa/motorcomm,yt921x.yaml
new file mode 100644
index 000000000000..33a6552e46fc
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/motorcomm,yt921x.yaml
@@ -0,0 +1,167 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/dsa/motorcomm,yt921x.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Motorcomm YT921x Ethernet switch family
+
+maintainers:
+ - David Yang <mmyangfl@gmail.com>
+
+description: |
+ The Motorcomm YT921x series is a family of Ethernet switches with up to 8
+ internal GbE PHYs and up to 2 GMACs, including:
+
+ - YT9215S / YT9215RB / YT9215SC: 5 GbE PHYs (Port 0-4) + 2 GMACs (Port 8-9)
+ - YT9213NB: 2 GbE PHYs (Port 1/3) + 1 GMAC (Port 9)
+ - YT9214NB: 2 GbE PHYs (Port 1/3) + 2 GMACs (Port 8-9)
+ - YT9218N: 8 GbE PHYs (Port 0-7)
+ - YT9218MB: 8 GbE PHYs (Port 0-7) + 2 GMACs (Port 8-9)
+
+ Any port can be used as the CPU port.
+
+properties:
+ compatible:
+ const: motorcomm,yt9215
+
+ reg:
+ enum: [0x0, 0x1d]
+
+ reset-gpios:
+ maxItems: 1
+
+ mdio:
+ $ref: /schemas/net/mdio.yaml#
+ unevaluatedProperties: false
+ description:
+ Internal MDIO bus for the internal GbE PHYs. PHY 0-7 are used for Port
+ 0-7 respectively.
+
+ mdio-external:
+ $ref: /schemas/net/mdio.yaml#
+ unevaluatedProperties: false
+ description:
+ External MDIO bus to access external components. External PHYs for GMACs
+ (Port 8-9) are expected to be connected to the external MDIO bus in
+ vendor's reference design, but that is not a hard limitation from the
+ chip.
+
+required:
+ - compatible
+ - reg
+
+allOf:
+ - $ref: dsa.yaml#/$defs/ethernet-ports
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ switch@1d {
+ compatible = "motorcomm,yt9215";
+ /* default 0x1d, alternate 0x0 */
+ reg = <0x1d>;
+ reset-gpios = <&tlmm 39 GPIO_ACTIVE_LOW>;
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ sw_phy0: phy@0 {
+ reg = <0x0>;
+ };
+
+ sw_phy1: phy@1 {
+ reg = <0x1>;
+ };
+
+ sw_phy2: phy@2 {
+ reg = <0x2>;
+ };
+
+ sw_phy3: phy@3 {
+ reg = <0x3>;
+ };
+
+ sw_phy4: phy@4 {
+ reg = <0x4>;
+ };
+ };
+
+ mdio-external {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ phy1: phy@b {
+ reg = <0xb>;
+ };
+ };
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethernet-port@0 {
+ reg = <0>;
+ label = "lan1";
+ phy-mode = "internal";
+ phy-handle = <&sw_phy0>;
+ };
+
+ ethernet-port@1 {
+ reg = <1>;
+ label = "lan2";
+ phy-mode = "internal";
+ phy-handle = <&sw_phy1>;
+ };
+
+ ethernet-port@2 {
+ reg = <2>;
+ label = "lan3";
+ phy-mode = "internal";
+ phy-handle = <&sw_phy2>;
+ };
+
+ ethernet-port@3 {
+ reg = <3>;
+ label = "lan4";
+ phy-mode = "internal";
+ phy-handle = <&sw_phy3>;
+ };
+
+ ethernet-port@4 {
+ reg = <4>;
+ label = "lan5";
+ phy-mode = "internal";
+ phy-handle = <&sw_phy4>;
+ };
+
+ /* CPU port */
+ ethernet-port@8 {
+ reg = <8>;
+ phy-mode = "2500base-x";
+ ethernet = <&eth0>;
+
+ fixed-link {
+ speed = <2500>;
+ full-duplex;
+ };
+ };
+
+ /* if external phy is connected to a MAC */
+ ethernet-port@9 {
+ reg = <9>;
+ label = "wan";
+ phy-mode = "rgmii-id";
+ phy-handle = <&phy1>;
+ };
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml b/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
index e9dd914b0734..607b7fe8d28e 100644
--- a/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
@@ -41,6 +41,9 @@ properties:
therefore discouraged.
maxItems: 1
+ clocks:
+ maxItems: 1
+
spi-cpha: true
spi-cpol: true
diff --git a/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml b/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml
new file mode 100644
index 000000000000..91e8cd1db67b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/eswin,eic7700-eth.yaml
@@ -0,0 +1,129 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/eswin,eic7700-eth.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Eswin EIC7700 SOC Eth Controller
+
+maintainers:
+ - Shuang Liang <liangshuang@eswincomputing.com>
+ - Zhi Li <lizhi2@eswincomputing.com>
+ - Shangjuan Wei <weishangjuan@eswincomputing.com>
+
+description:
+ Platform glue layer implementation for STMMAC Ethernet driver.
+
+select:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - eswin,eic7700-qos-eth
+ required:
+ - compatible
+
+allOf:
+ - $ref: snps,dwmac.yaml#
+
+properties:
+ compatible:
+ items:
+ - const: eswin,eic7700-qos-eth
+ - const: snps,dwmac-5.20
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ interrupt-names:
+ const: macirq
+
+ clocks:
+ items:
+ - description: AXI clock
+ - description: Configuration clock
+ - description: GMAC main clock
+ - description: Tx clock
+
+ clock-names:
+ items:
+ - const: axi
+ - const: cfg
+ - const: stmmaceth
+ - const: tx
+
+ resets:
+ maxItems: 1
+
+ reset-names:
+ items:
+ - const: stmmaceth
+
+ rx-internal-delay-ps:
+ enum: [0, 200, 600, 1200, 1600, 1800, 2000, 2200, 2400]
+
+ tx-internal-delay-ps:
+ enum: [0, 200, 600, 1200, 1600, 1800, 2000, 2200, 2400]
+
+ eswin,hsp-sp-csr:
+ description:
+ HSP CSR is to control and get status of different high-speed peripherals
+ (such as Ethernet, USB, SATA, etc.) via register, which can tune
+ board-level's parameters of PHY, etc.
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ items:
+ - items:
+ - description: Phandle to HSP(High-Speed Peripheral) device
+ - description: Offset of phy control register for internal
+ or external clock selection
+ - description: Offset of AXI clock controller Low-Power request
+ register
+ - description: Offset of register controlling TX/RX clock delay
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - clock-names
+ - interrupts
+ - interrupt-names
+ - phy-mode
+ - resets
+ - reset-names
+ - rx-internal-delay-ps
+ - tx-internal-delay-ps
+ - eswin,hsp-sp-csr
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ ethernet@50400000 {
+ compatible = "eswin,eic7700-qos-eth", "snps,dwmac-5.20";
+ reg = <0x50400000 0x10000>;
+ clocks = <&d0_clock 186>, <&d0_clock 171>, <&d0_clock 40>,
+ <&d0_clock 193>;
+ clock-names = "axi", "cfg", "stmmaceth", "tx";
+ interrupt-parent = <&plic>;
+ interrupts = <61>;
+ interrupt-names = "macirq";
+ phy-mode = "rgmii-id";
+ phy-handle = <&phy0>;
+ resets = <&reset 95>;
+ reset-names = "stmmaceth";
+ rx-internal-delay-ps = <200>;
+ tx-internal-delay-ps = <200>;
+ eswin,hsp-sp-csr = <&hsp_sp_csr 0x100 0x108 0x118>;
+ snps,axi-config = <&stmmac_axi_setup>;
+ snps,aal;
+ snps,fixed-burst;
+ snps,tso;
+ stmmac_axi_setup: stmmac-axi-config {
+ snps,blen = <0 0 0 0 16 8 4>;
+ snps,rd_osr_lmt = <2>;
+ snps,wr_osr_lmt = <2>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
index 2ec2d9fda7e3..bb4c49fc5fd8 100644
--- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
@@ -35,9 +35,13 @@ properties:
description: PHYs that implement IEEE802.3 clause 45
- pattern: "^ethernet-phy-id[a-f0-9]{4}\\.[a-f0-9]{4}$"
description:
- If the PHY reports an incorrect ID (or none at all) then the
- compatible list may contain an entry with the correct PHY ID
- in the above form.
+ PHYs contain identification registers. These will be read to
+ identify the PHY. If the PHY reports an incorrect ID, or the
+ PHY requires a specific initialization sequence (like a
+ particular order of clocks, resets, power supplies), in
+ order to be able to read the ID registers, then the
+ compatible list must contain an entry with the correct PHY
+ ID in the above form.
The first group of digits is the 16 bit Phy Identifier 1
register, this is the chip vendor OUI bits 3:18. The
second group of digits is the Phy Identifier 2 register,
diff --git a/Documentation/devicetree/bindings/net/fsl,enetc.yaml b/Documentation/devicetree/bindings/net/fsl,enetc.yaml
index ca70f0050171..aac20ab72ace 100644
--- a/Documentation/devicetree/bindings/net/fsl,enetc.yaml
+++ b/Documentation/devicetree/bindings/net/fsl,enetc.yaml
@@ -27,6 +27,7 @@ properties:
- const: fsl,enetc
- enum:
- pci1131,e101
+ - pci1131,e110
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
deleted file mode 100644
index 957e5e5c2927..000000000000
--- a/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-Marvell 8897/8997 (sd8897/sd8997) bluetooth devices (SDIO or USB based)
-------
-The 8997 devices supports multiple interfaces. When used on SDIO interfaces,
-the btmrvl driver is used and when used on USB interface, the btusb driver is
-used.
-
-Required properties:
-
- - compatible : should be one of the following:
- * "marvell,sd8897-bt" (for SDIO)
- * "marvell,sd8997-bt" (for SDIO)
- * "usb1286,204e" (for USB)
-
-Optional properties:
-
- - marvell,cal-data: Calibration data downloaded to the device during
- initialization. This is an array of 28 values(u8).
- This is only applicable to SDIO devices.
-
- - marvell,wakeup-pin: It represents wakeup pin number of the bluetooth chip.
- firmware will use the pin to wakeup host system (u16).
- - marvell,wakeup-gap-ms: wakeup gap represents wakeup latency of the host
- platform. The value will be configured to firmware. This
- is needed to work chip's sleep feature as expected (u16).
- - interrupt-names: Used only for USB based devices (See below)
- - interrupts : specifies the interrupt pin number to the cpu. For SDIO, the
- driver will use the first interrupt specified in the interrupt
- array. For USB based devices, the driver will use the interrupt
- named "wakeup" from the interrupt-names and interrupt arrays.
- The driver will request an irq based on this interrupt number.
- During system suspend, the irq will be enabled so that the
- bluetooth chip can wakeup host platform under certain
- conditions. During system resume, the irq will be disabled
- to make sure unnecessary interrupt is not received.
-
-Example:
-
-IRQ pin 119 is used as system wakeup source interrupt.
-wakeup pin 13 and gap 100ms are configured so that firmware can wakeup host
-using this device side pin and wakeup latency.
-
-Example for SDIO device follows (calibration data is also available in
-below example).
-
-&mmc3 {
- vmmc-supply = <&wlan_en_reg>;
- bus-width = <4>;
- cap-power-off-card;
- keep-power-in-suspend;
-
- #address-cells = <1>;
- #size-cells = <0>;
- btmrvl: bluetooth@2 {
- compatible = "marvell,sd8897-bt";
- reg = <2>;
- interrupt-parent = <&pio>;
- interrupts = <119 IRQ_TYPE_LEVEL_LOW>;
-
- marvell,cal-data = /bits/ 8 <
- 0x37 0x01 0x1c 0x00 0xff 0xff 0xff 0xff 0x01 0x7f 0x04 0x02
- 0x00 0x00 0xba 0xce 0xc0 0xc6 0x2d 0x00 0x00 0x00 0x00 0x00
- 0x00 0x00 0xf0 0x00>;
- marvell,wakeup-pin = /bits/ 16 <0x0d>;
- marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
- };
-};
-
-Example for USB device:
-
-&usb_host1_ohci {
- #address-cells = <1>;
- #size-cells = <0>;
-
- mvl_bt1: bt@1 {
- compatible = "usb1286,204e";
- reg = <1>;
- interrupt-parent = <&gpio0>;
- interrupt-names = "wakeup";
- interrupts = <119 IRQ_TYPE_LEVEL_LOW>;
- marvell,wakeup-pin = /bits/ 16 <0x0d>;
- marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
- };
-};
diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml
index b45f67f92e80..cc346946291a 100644
--- a/Documentation/devicetree/bindings/net/mediatek,net.yaml
+++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml
@@ -112,7 +112,7 @@ properties:
mediatek,wed:
$ref: /schemas/types.yaml#/definitions/phandle-array
- minItems: 2
+ minItems: 1
maxItems: 2
items:
maxItems: 1
@@ -249,6 +249,9 @@ allOf:
minItems: 1
maxItems: 1
+ mediatek,wed:
+ minItems: 2
+
mediatek,wed-pcie: false
else:
properties:
@@ -338,12 +341,13 @@ allOf:
- const: netsys0
- const: netsys1
- mediatek,infracfg: false
-
mediatek,sgmiisys:
minItems: 2
maxItems: 2
+ mediatek,wed:
+ maxItems: 1
+
- if:
properties:
compatible:
@@ -385,6 +389,9 @@ allOf:
minItems: 2
maxItems: 2
+ mediatek,wed:
+ minItems: 2
+
- if:
properties:
compatible:
@@ -429,6 +436,19 @@ allOf:
- const: xgp2
- const: xgp3
+ mediatek,wed:
+ minItems: 2
+
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: ralink,rt5350-eth
+ then:
+ properties:
+ mediatek,wed:
+ minItems: 2
+
patternProperties:
"^mac@[0-2]$":
type: object
diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
deleted file mode 100644
index 0a3647fe331b..000000000000
--- a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-* Microsemi - vsc8531 Giga bit ethernet phy
-
-Optional properties:
-- vsc8531,vddmac : The vddmac in mV. Allowed values is listed
- in the first row of Table 1 (below).
- This property is only used in combination
- with the 'edge-slowdown' property.
- Default value is 3300.
-- vsc8531,edge-slowdown : % the edge should be slowed down relative to
- the fastest possible edge time.
- Edge rate sets the drive strength of the MAC
- interface output signals. Changing the
- drive strength will affect the edge rate of
- the output signal. The goal of this setting
- is to help reduce electrical emission (EMI)
- by being able to reprogram drive strength
- and in effect slow down the edge rate if
- desired.
- To adjust the edge-slowdown, the 'vddmac'
- must be specified. Table 1 lists the
- supported edge-slowdown values for a given
- 'vddmac'.
- Default value is 0%.
- Ref: Table:1 - Edge rate change (below).
-- vsc8531,led-[N]-mode : LED mode. Specify how the LED[N] should behave.
- N depends on the number of LEDs supported by a
- PHY.
- Allowed values are defined in
- "include/dt-bindings/net/mscc-phy-vsc8531.h".
- Default values are VSC8531_LINK_1000_ACTIVITY (1),
- VSC8531_LINK_100_ACTIVITY (2),
- VSC8531_LINK_ACTIVITY (0) and
- VSC8531_DUPLEX_COLLISION (8).
-- load-save-gpios : GPIO used for the load/save operation of the PTP
- hardware clock (PHC).
-
-
-Table: 1 - Edge rate change
-----------------------------------------------------------------|
-| Edge Rate Change (VDDMAC) |
-| |
-| 3300 mV 2500 mV 1800 mV 1500 mV |
-|---------------------------------------------------------------|
-| 0% 0% 0% 0% |
-| (Fastest) (recommended) (recommended) |
-|---------------------------------------------------------------|
-| 2% 3% 5% 6% |
-|---------------------------------------------------------------|
-| 4% 6% 9% 14% |
-|---------------------------------------------------------------|
-| 7% 10% 16% 21% |
-|(recommended) (recommended) |
-|---------------------------------------------------------------|
-| 10% 14% 23% 29% |
-|---------------------------------------------------------------|
-| 17% 23% 35% 42% |
-|---------------------------------------------------------------|
-| 29% 37% 52% 58% |
-|---------------------------------------------------------------|
-| 53% 63% 76% 77% |
-| (slowest) |
-|---------------------------------------------------------------|
-
-Example:
-
- vsc8531_0: ethernet-phy@0 {
- compatible = "ethernet-phy-id0007.0570";
- vsc8531,vddmac = <3300>;
- vsc8531,edge-slowdown = <7>;
- vsc8531,led-0-mode = <VSC8531_LINK_1000_ACTIVITY>;
- vsc8531,led-1-mode = <VSC8531_LINK_100_ACTIVITY>;
- load-save-gpios = <&gpio 10 GPIO_ACTIVE_HIGH>;
- };
diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.yaml b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.yaml
new file mode 100644
index 000000000000..0afbd0ff126f
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.yaml
@@ -0,0 +1,131 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/mscc-phy-vsc8531.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Microsemi VSC8531 Gigabit Ethernet PHY
+
+maintainers:
+ - Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
+
+description:
+ The VSC8531 is a Gigabit Ethernet PHY with configurable MAC interface
+ drive strength and LED modes.
+
+allOf:
+ - $ref: ethernet-phy.yaml#
+
+select:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - ethernet-phy-id0007.0570 # VSC8531
+ - ethernet-phy-id0007.0772 # VSC8541
+ required:
+ - compatible
+
+properties:
+ compatible:
+ items:
+ - enum:
+ - ethernet-phy-id0007.0570 # VSC8531
+ - ethernet-phy-id0007.0772 # VSC8541
+ - const: ethernet-phy-ieee802.3-c22
+
+ vsc8531,vddmac:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ The VDDMAC voltage in millivolts. This property is used in combination
+ with the edge-slowdown property to control the drive strength of the
+ MAC interface output signals.
+ enum: [3300, 2500, 1800, 1500]
+ default: 3300
+
+ vsc8531,edge-slowdown:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: >
+ Percentage by which the edge rate should be slowed down relative to
+ the fastest possible edge time. This setting helps reduce electromagnetic
+ interference (EMI) by adjusting the drive strength of the MAC interface
+ output signals. Valid values depend on the vddmac voltage setting
+ according to the edge rate change table in the datasheet.
+
+ - When vsc8531,vddmac = 3300 mV: allowed values are 0, 2, 4, 7, 10, 17, 29, and 53.
+ (Recommended: 7)
+ - When vsc8531,vddmac = 2500 mV: allowed values are 0, 3, 6, 10, 14, 23, 37, and 63.
+ (Recommended: 10)
+ - When vsc8531,vddmac = 1800 mV: allowed values are 0, 5, 9, 16, 23, 35, 52, and 76.
+ (Recommended: 0)
+ - When vsc8531,vddmac = 1500 mV: allowed values are 0, 6, 14, 21, 29, 42, 58, and 77.
+ (Recommended: 0)
+ enum: [0, 2, 3, 4, 5, 6, 7, 9, 10, 14, 16, 17, 21, 23, 29, 35, 37, 42, 52, 53, 58, 63, 76, 77]
+ default: 0
+
+ vsc8531,led-0-mode:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: LED[0] behavior mode. See include/dt-bindings/net/mscc-phy-vsc8531.h
+ for available modes.
+ minimum: 0
+ maximum: 15
+ default: 1
+
+ vsc8531,led-1-mode:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: LED[1] behavior mode. See include/dt-bindings/net/mscc-phy-vsc8531.h
+ for available modes.
+ minimum: 0
+ maximum: 15
+ default: 2
+
+ vsc8531,led-2-mode:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: LED[2] behavior mode. See include/dt-bindings/net/mscc-phy-vsc8531.h
+ for available modes.
+ minimum: 0
+ maximum: 15
+ default: 0
+
+ vsc8531,led-3-mode:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description: LED[3] behavior mode. See include/dt-bindings/net/mscc-phy-vsc8531.h
+ for available modes.
+ minimum: 0
+ maximum: 15
+ default: 8
+
+ load-save-gpios:
+ description: GPIO phandle used for the load/save operation of the PTP hardware
+ clock (PHC).
+ maxItems: 1
+
+dependencies:
+ vsc8531,edge-slowdown:
+ - vsc8531,vddmac
+
+required:
+ - compatible
+ - reg
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/gpio/gpio.h>
+ #include <dt-bindings/net/mscc-phy-vsc8531.h>
+
+ mdio {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethernet-phy@0 {
+ compatible = "ethernet-phy-id0007.0772", "ethernet-phy-ieee802.3-c22";
+ reg = <0>;
+ vsc8531,vddmac = <3300>;
+ vsc8531,edge-slowdown = <7>;
+ vsc8531,led-0-mode = <VSC8531_LINK_1000_ACTIVITY>;
+ vsc8531,led-1-mode = <VSC8531_LINK_100_ACTIVITY>;
+ load-save-gpios = <&gpio 10 GPIO_ACTIVE_HIGH>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml b/Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml
index 97389fd5dbbf..deea4fd73d76 100644
--- a/Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml
+++ b/Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml
@@ -21,6 +21,7 @@ maintainers:
properties:
compatible:
enum:
+ - nxp,imx94-netc-blk-ctrl
- nxp,imx95-netc-blk-ctrl
reg:
diff --git a/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml b/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml
index bb1ee3398655..0b3803f647b7 100644
--- a/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml
+++ b/Documentation/devicetree/bindings/net/pse-pd/ti,tps23881.yaml
@@ -16,6 +16,7 @@ properties:
compatible:
enum:
- ti,tps23881
+ - ti,tps23881b
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/net/qcom,ethqos.yaml b/Documentation/devicetree/bindings/net/qcom,ethqos.yaml
index e7ee0d9efed8..423959cb928d 100644
--- a/Documentation/devicetree/bindings/net/qcom,ethqos.yaml
+++ b/Documentation/devicetree/bindings/net/qcom,ethqos.yaml
@@ -73,6 +73,14 @@ properties:
dma-coherent: true
+ interconnects:
+ maxItems: 2
+
+ interconnect-names:
+ items:
+ - const: cpu-mac
+ - const: mac-mem
+
phys: true
phy-names:
diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
index 0ac7c4b47d6b..d17112527dab 100644
--- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml
@@ -24,6 +24,7 @@ select:
- rockchip,rk3366-gmac
- rockchip,rk3368-gmac
- rockchip,rk3399-gmac
+ - rockchip,rk3506-gmac
- rockchip,rk3528-gmac
- rockchip,rk3568-gmac
- rockchip,rk3576-gmac
@@ -50,6 +51,7 @@ properties:
- rockchip,rv1108-gmac
- items:
- enum:
+ - rockchip,rk3506-gmac
- rockchip,rk3528-gmac
- rockchip,rk3568-gmac
- rockchip,rk3576-gmac
@@ -148,6 +150,7 @@ allOf:
compatible:
contains:
enum:
+ - rockchip,rk3506-gmac
- rockchip,rk3528-gmac
then:
properties:
diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
index 658c004e6a5c..dd3c72e8363e 100644
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
@@ -86,10 +86,14 @@ properties:
- rockchip,rk3328-gmac
- rockchip,rk3366-gmac
- rockchip,rk3368-gmac
+ - rockchip,rk3399-gmac
+ - rockchip,rk3506-gmac
+ - rockchip,rk3528-gmac
+ - rockchip,rk3568-gmac
- rockchip,rk3576-gmac
- rockchip,rk3588-gmac
- - rockchip,rk3399-gmac
- rockchip,rv1108-gmac
+ - rockchip,rv1126-gmac
- snps,dwmac
- snps,dwmac-3.40a
- snps,dwmac-3.50a
diff --git a/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml b/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml
index ce21979a2d9a..e8d3814db0e9 100644
--- a/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/sophgo,sg2044-dwmac.yaml
@@ -70,6 +70,25 @@ required:
allOf:
- $ref: snps,dwmac.yaml#
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: sophgo,sg2042-dwmac
+ then:
+ properties:
+ phy-mode:
+ enum:
+ - rgmii-rxid
+ - rgmii-id
+ else:
+ properties:
+ phy-mode:
+ enum:
+ - rgmii
+ - rgmii-rxid
+ - rgmii-txid
+ - rgmii-id
unevaluatedProperties: false
diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml
index eabceb849537..ae6b97cdc44b 100644
--- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml
@@ -151,6 +151,12 @@ properties:
- ETSI
- JP
+ country:
+ $ref: /schemas/types.yaml#/definitions/string
+ pattern: '^[A-Z]{2}$'
+ description:
+ ISO 3166-1 alpha-2 country code for power limits
+
patternProperties:
"^txpower-[256]g$":
type: object
@@ -210,6 +216,66 @@ properties:
minItems: 13
maxItems: 13
+ paths-cck:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ minItems: 4
+ maxItems: 4
+ description:
+ 4 half-dBm backoff values (1 - 4 antennas, single spacial
+ stream)
+
+ paths-ofdm:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ minItems: 4
+ maxItems: 4
+ description:
+ 4 half-dBm backoff values (1 - 4 antennas, single spacial
+ stream)
+
+ paths-ofdm-bf:
+ $ref: /schemas/types.yaml#/definitions/uint8-array
+ minItems: 4
+ maxItems: 4
+ description:
+ 4 half-dBm backoff values for beamforming
+ (1 - 4 antennas, single spacial stream)
+
+ paths-ru:
+ $ref: /schemas/types.yaml#/definitions/uint8-matrix
+ description:
+ Sets of half-dBm backoff values for 802.11ax rates for
+ 1T1ss (aka 1 transmitting antenna with 1 spacial stream),
+ 2T1ss, 3T1ss, 4T1ss, 2T2ss, 3T2ss, 4T2ss, 3T3ss, 4T3ss
+ and 4T4ss.
+ Each set starts with the number of channel bandwidth or
+ resource unit settings for which the rate set applies,
+ followed by 10 power limit values. The order of the
+ channel resource unit settings is RU26, RU52, RU106,
+ RU242/SU20, RU484/SU40, RU996/SU80 and RU2x996/SU160.
+ minItems: 1
+ maxItems: 7
+ items:
+ minItems: 11
+ maxItems: 11
+
+ paths-ru-bf:
+ $ref: /schemas/types.yaml#/definitions/uint8-matrix
+ description:
+ Sets of half-dBm backoff (beamforming) values for 802.11ax
+ rates for 1T1ss (aka 1 transmitting antenna with 1 spacial
+ stream), 2T1ss, 3T1ss, 4T1ss, 2T2ss, 3T2ss, 4T2ss, 3T3ss,
+ 4T3ss and 4T4ss.
+ Each set starts with the number of channel bandwidth or
+ resource unit settings for which the rate set applies,
+ followed by 10 power limit values. The order of the
+ channel resource unit settings is RU26, RU52, RU106,
+ RU242/SU20, RU484/SU40, RU996/SU80 and RU2x996/SU160.
+ minItems: 1
+ maxItems: 7
+ items:
+ minItems: 11
+ maxItems: 11
+
txs-delta:
$ref: /schemas/types.yaml#/definitions/uint32-array
description:
diff --git a/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml b/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
index d2e578d6b83b..103e4aec2439 100644
--- a/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
+++ b/Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
@@ -14,6 +14,7 @@ properties:
oneOf:
- enum:
- fsl,imx8-ddr-pmu
+ - fsl,imx8dxl-db-pmu
- fsl,imx8m-ddr-pmu
- fsl,imx8mq-ddr-pmu
- fsl,imx8mm-ddr-pmu
@@ -28,7 +29,10 @@ properties:
- fsl,imx8mp-ddr-pmu
- const: fsl,imx8m-ddr-pmu
- items:
- - const: fsl,imx8dxl-ddr-pmu
+ - enum:
+ - fsl,imx8dxl-ddr-pmu
+ - fsl,imx8qm-ddr-pmu
+ - fsl,imx8qxp-ddr-pmu
- const: fsl,imx8-ddr-pmu
- items:
- enum:
@@ -43,6 +47,14 @@ properties:
interrupts:
maxItems: 1
+ clocks:
+ maxItems: 2
+
+ clock-names:
+ items:
+ - const: ipg
+ - const: cnt
+
required:
- compatible
- reg
@@ -50,6 +62,21 @@ required:
additionalProperties: false
+allOf:
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: fsl,imx8dxl-db-pmu
+ then:
+ required:
+ - clocks
+ - clock-names
+ else:
+ properties:
+ clocks: false
+ clock-names: false
+
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>
diff --git a/Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml b/Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml
index 19d47fd414bc..ce04d2eadec9 100644
--- a/Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/toshiba,visconti-pinctrl.yaml
@@ -50,18 +50,20 @@ patternProperties:
groups:
description:
Name of the pin group to use for the functions.
- $ref: /schemas/types.yaml#/definitions/string
- enum: [i2c0_grp, i2c1_grp, i2c2_grp, i2c3_grp, i2c4_grp,
- i2c5_grp, i2c6_grp, i2c7_grp, i2c8_grp,
- spi0_grp, spi0_cs0_grp, spi0_cs1_grp, spi0_cs2_grp,
- spi1_grp, spi2_grp, spi3_grp, spi4_grp, spi5_grp, spi6_grp,
- uart0_grp, uart1_grp, uart2_grp, uart3_grp,
- pwm0_gpio4_grp, pwm0_gpio8_grp, pwm0_gpio12_grp,
- pwm0_gpio16_grp, pwm1_gpio5_grp, pwm1_gpio9_grp,
- pwm1_gpio13_grp, pwm1_gpio17_grp, pwm2_gpio6_grp,
- pwm2_gpio10_grp, pwm2_gpio14_grp, pwm2_gpio18_grp,
- pwm3_gpio7_grp, pwm3_gpio11_grp, pwm3_gpio15_grp,
- pwm3_gpio19_grp, pcmif_out_grp, pcmif_in_grp]
+ items:
+ enum: [i2c0_grp, i2c1_grp, i2c2_grp, i2c3_grp, i2c4_grp,
+ i2c5_grp, i2c6_grp, i2c7_grp, i2c8_grp,
+ spi0_grp, spi0_cs0_grp, spi0_cs1_grp, spi0_cs2_grp,
+ spi1_grp, spi2_grp, spi3_grp, spi4_grp, spi5_grp, spi6_grp,
+ uart0_grp, uart1_grp, uart2_grp, uart3_grp,
+ pwm0_gpio4_grp, pwm0_gpio8_grp, pwm0_gpio12_grp,
+ pwm0_gpio16_grp, pwm1_gpio5_grp, pwm1_gpio9_grp,
+ pwm1_gpio13_grp, pwm1_gpio17_grp, pwm2_gpio6_grp,
+ pwm2_gpio10_grp, pwm2_gpio14_grp, pwm2_gpio18_grp,
+ pwm3_gpio7_grp, pwm3_gpio11_grp, pwm3_gpio15_grp,
+ pwm3_gpio19_grp, pcmif_out_grp, pcmif_in_grp]
+ minItems: 1
+ maxItems: 8
drive-strength:
enum: [2, 4, 6, 8, 16, 24, 32]
diff --git a/Documentation/devicetree/bindings/pinctrl/xlnx,versal-pinctrl.yaml b/Documentation/devicetree/bindings/pinctrl/xlnx,versal-pinctrl.yaml
index 55ece6a8be5e..81e2164ea98f 100644
--- a/Documentation/devicetree/bindings/pinctrl/xlnx,versal-pinctrl.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/xlnx,versal-pinctrl.yaml
@@ -74,6 +74,7 @@ patternProperties:
'^conf':
type: object
+ unevaluatedProperties: false
description:
Pinctrl node's client devices use subnodes for pin configurations,
which in turn use the standard properties below.
diff --git a/Documentation/devicetree/bindings/rng/microchip,pic32-rng.txt b/Documentation/devicetree/bindings/rng/microchip,pic32-rng.txt
deleted file mode 100644
index c6d1003befb7..000000000000
--- a/Documentation/devicetree/bindings/rng/microchip,pic32-rng.txt
+++ /dev/null
@@ -1,17 +0,0 @@
-* Microchip PIC32 Random Number Generator
-
-The PIC32 RNG provides a pseudo random number generator which can be seeded by
-another true random number generator.
-
-Required properties:
-- compatible : should be "microchip,pic32mzda-rng"
-- reg : Specifies base physical address and size of the registers.
-- clocks: clock phandle.
-
-Example:
-
- rng: rng@1f8e6000 {
- compatible = "microchip,pic32mzda-rng";
- reg = <0x1f8e6000 0x1000>;
- clocks = <&PBCLK5>;
- };
diff --git a/Documentation/devicetree/bindings/rng/microchip,pic32-rng.yaml b/Documentation/devicetree/bindings/rng/microchip,pic32-rng.yaml
new file mode 100644
index 000000000000..1f6f6fb81ddc
--- /dev/null
+++ b/Documentation/devicetree/bindings/rng/microchip,pic32-rng.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/rng/microchip,pic32-rng.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Microchip PIC32 Random Number Generator
+
+description: |
+ The PIC32 RNG provides a pseudo random number generator which can be seeded
+ by another true random number generator.
+
+maintainers:
+ - Joshua Henderson <joshua.henderson@microchip.com>
+
+properties:
+ compatible:
+ enum:
+ - microchip,pic32mzda-rng
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - clocks
+
+additionalProperties: false
+
+examples:
+ - |
+ rng: rng@1f8e6000 {
+ compatible = "microchip,pic32mzda-rng";
+ reg = <0x1f8e6000 0x1000>;
+ clocks = <&PBCLK5>;
+ };
diff --git a/Documentation/devicetree/bindings/thermal/fsl,imx91-tmu.yaml b/Documentation/devicetree/bindings/thermal/fsl,imx91-tmu.yaml
new file mode 100644
index 000000000000..7fd1a86d7287
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/fsl,imx91-tmu.yaml
@@ -0,0 +1,87 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/thermal/fsl,imx91-tmu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NXP i.MX91 Thermal
+
+maintainers:
+ - Pengfei Li <pengfei.li_1@nxp.com>
+
+description:
+ i.MX91 features a new temperature sensor. It includes programmable
+ temperature threshold comparators for both normal and privileged
+ accesses and allows a programmable measurement frequency for the
+ Periodic One-Shot Measurement mode. Additionally, it provides
+ status registers for indicating the end of measurement and threshold
+ violation events.
+
+properties:
+ compatible:
+ items:
+ - const: fsl,imx91-tmu
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ maxItems: 1
+
+ interrupts:
+ items:
+ - description: Comparator 1 irq
+ - description: Comparator 2 irq
+ - description: Data ready irq
+
+ interrupt-names:
+ items:
+ - const: thr1
+ - const: thr2
+ - const: ready
+
+ nvmem-cells:
+ items:
+ - description: Phandle to the trim control 1 provided by ocotp
+ - description: Phandle to the trim control 2 provided by ocotp
+
+ nvmem-cell-names:
+ items:
+ - const: trim1
+ - const: trim2
+
+ "#thermal-sensor-cells":
+ const: 0
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - interrupts
+ - interrupt-names
+
+allOf:
+ - $ref: thermal-sensor.yaml
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/clock/imx93-clock.h>
+
+ thermal-sensor@44482000 {
+ compatible = "fsl,imx91-tmu";
+ reg = <0x44482000 0x1000>;
+ #thermal-sensor-cells = <0>;
+ clocks = <&clk IMX93_CLK_TMC_GATE>;
+ interrupt-parent = <&gic>;
+ interrupts = <GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "thr1", "thr2", "ready";
+ nvmem-cells = <&tmu_trim1>, <&tmu_trim2>;
+ nvmem-cell-names = "trim1", "trim2";
+ };
+
+...
diff --git a/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml b/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
index 78e2f6573b96..3c5256b0cd9f 100644
--- a/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
+++ b/Documentation/devicetree/bindings/thermal/qcom-tsens.yaml
@@ -36,10 +36,15 @@ properties:
- qcom,msm8974-tsens
- const: qcom,tsens-v0_1
+ - description:
+ v1 of TSENS without RPM which requires to be explicitly reset
+ and enabled in the driver.
+ enum:
+ - qcom,ipq5018-tsens
+
- description: v1 of TSENS
items:
- enum:
- - qcom,ipq5018-tsens
- qcom,msm8937-tsens
- qcom,msm8956-tsens
- qcom,msm8976-tsens
@@ -50,11 +55,13 @@ properties:
items:
- enum:
- qcom,glymur-tsens
+ - qcom,kaanapali-tsens
- qcom,milos-tsens
- qcom,msm8953-tsens
- qcom,msm8996-tsens
- qcom,msm8998-tsens
- qcom,qcm2290-tsens
+ - qcom,qcs8300-tsens
- qcom,qcs615-tsens
- qcom,sa8255p-tsens
- qcom,sa8775p-tsens
diff --git a/Documentation/devicetree/bindings/thermal/renesas,r9a09g047-tsu.yaml b/Documentation/devicetree/bindings/thermal/renesas,r9a09g047-tsu.yaml
index 8d3f3c24f0f2..befdc8b7a082 100644
--- a/Documentation/devicetree/bindings/thermal/renesas,r9a09g047-tsu.yaml
+++ b/Documentation/devicetree/bindings/thermal/renesas,r9a09g047-tsu.yaml
@@ -16,7 +16,11 @@ description:
properties:
compatible:
- const: renesas,r9a09g047-tsu
+ oneOf:
+ - const: renesas,r9a09g047-tsu # RZ/G3E
+ - items:
+ - const: renesas,r9a09g057-tsu # RZ/V2H
+ - const: renesas,r9a09g047-tsu # RZ/G3E
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml b/Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml
new file mode 100644
index 000000000000..e08d3d2d306b
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/realtek,rtd1625-systimer.yaml
@@ -0,0 +1,47 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/realtek,rtd1625-systimer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek System Timer
+
+maintainers:
+ - Hao-Wen Ting <haowen.ting@realtek.com>
+
+description:
+ The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter operating
+ at a fixed 1MHz frequency. Thanks to its compare match interrupt capability,
+ the timer natively supports oneshot mode for tick broadcast functionality.
+
+properties:
+ compatible:
+ oneOf:
+ - const: realtek,rtd1625-systimer
+ - items:
+ - const: realtek,rtd1635-systimer
+ - const: realtek,rtd1625-systimer
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - interrupts
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ timer@89420 {
+ compatible = "realtek,rtd1635-systimer",
+ "realtek,rtd1625-systimer";
+ reg = <0x89420 0x18>;
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH>;
+ };
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index f1d1882009ba..beab81e462d2 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -20,7 +20,7 @@ patternProperties:
"^(keypad|m25p|max8952|max8997|max8998|mpmc),.*": true
"^(pciclass|pinctrl-single|#pinctrl-single|PowerPC),.*": true
"^(pl022|pxa-mmc|rcar_sound|rotary-encoder|s5m8767|sdhci),.*": true
- "^(simple-audio-card|st-plgpio|st-spics|ts),.*": true
+ "^(simple-audio-card|st-plgpio|st-spics|ts|vsc8531),.*": true
"^pool[0-3],.*": true
# Keep list in alphabetical order.
@@ -1705,6 +1705,8 @@ patternProperties:
description: Universal Scientific Industrial Co., Ltd.
"^usr,.*":
description: U.S. Robotics Corporation
+ "^ultrarisc,.*":
+ description: UltraRISC Technology Co., Ltd.
"^ultratronik,.*":
description: Ultratronik GmbH
"^utoo,.*":
diff --git a/Documentation/doc-guide/checktransupdate.rst b/Documentation/doc-guide/checktransupdate.rst
index dfaf9d373747..7b25375cc6d9 100644
--- a/Documentation/doc-guide/checktransupdate.rst
+++ b/Documentation/doc-guide/checktransupdate.rst
@@ -27,15 +27,15 @@ Usage
::
- ./scripts/checktransupdate.py --help
+ tools/docs/checktransupdate.py --help
Please refer to the output of argument parser for usage details.
Samples
-- ``./scripts/checktransupdate.py -l zh_CN``
+- ``tools/docs/checktransupdate.py -l zh_CN``
This will print all the files that need to be updated in the zh_CN locale.
-- ``./scripts/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
+- ``tools/docs/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
This will only print the status of the specified file.
Then the output is something like:
diff --git a/Documentation/doc-guide/contributing.rst b/Documentation/doc-guide/contributing.rst
index 662c7a840cd5..f8047e633113 100644
--- a/Documentation/doc-guide/contributing.rst
+++ b/Documentation/doc-guide/contributing.rst
@@ -152,7 +152,7 @@ generate links to that documentation. Adding ``kernel-doc`` directives to
the documentation to bring those comments in can help the community derive
the full value of the work that has gone into creating them.
-The ``scripts/find-unused-docs.sh`` tool can be used to find these
+The ``tools/docs/find-unused-docs.sh`` tool can be used to find these
overlooked comments.
Note that the most value comes from pulling in the documentation for
diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst
index af9697e60165..fd89a6d56ea9 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -405,6 +405,10 @@ Domain`_ references.
``%CONST``
Name of a constant. (No cross-referencing, just formatting.)
+ Examples::
+
+ %0 %NULL %-1 %-EFAULT %-EINVAL %-ENOMEM
+
````literal````
A literal block that should be handled as-is. The output will use a
``monospaced font``.
@@ -579,20 +583,23 @@ source.
How to use kernel-doc to generate man pages
-------------------------------------------
-If you just want to use kernel-doc to generate man pages you can do this
-from the kernel git tree::
+To generate man pages for all files that contain kernel-doc markups, run::
+
+ $ make mandocs
- $ scripts/kernel-doc -man \
- $(git grep -l '/\*\*' -- :^Documentation :^tools) \
- | scripts/split-man.pl /tmp/man
+Or calling ``script-build-wrapper`` directly::
-Some older versions of git do not support some of the variants of syntax for
-path exclusion. One of the following commands may work for those versions::
+ $ ./tools/docs/sphinx-build-wrapper mandocs
- $ scripts/kernel-doc -man \
- $(git grep -l '/\*\*' -- . ':!Documentation' ':!tools') \
- | scripts/split-man.pl /tmp/man
+The output will be at ``/man`` directory inside the output directory
+(by default: ``Documentation/output``).
+
+Optionally, it is possible to generate a partial set of man pages by
+using SPHINXDIRS:
+
+ $ make SPHINXDIRS=driver-api/media mandocs
+
+.. note::
- $ scripts/kernel-doc -man \
- $(git grep -l '/\*\*' -- . ":(exclude)Documentation" ":(exclude)tools") \
- | scripts/split-man.pl /tmp/man
+ When SPHINXDIRS={subdir} is used, it will only generate man pages for
+ the files explicitly inside a ``Documentation/{subdir}/.../*.rst`` file.
diff --git a/Documentation/doc-guide/parse-headers.rst b/Documentation/doc-guide/parse-headers.rst
index 204b025f1349..a7bb01ff04eb 100644
--- a/Documentation/doc-guide/parse-headers.rst
+++ b/Documentation/doc-guide/parse-headers.rst
@@ -5,173 +5,168 @@ Including uAPI header files
Sometimes, it is useful to include header files and C example codes in
order to describe the userspace API and to generate cross-references
between the code and the documentation. Adding cross-references for
-userspace API files has an additional vantage: Sphinx will generate warnings
+userspace API files has an additional advantage: Sphinx will generate warnings
if a symbol is not found at the documentation. That helps to keep the
uAPI documentation in sync with the Kernel changes.
-The :ref:`parse_headers.pl <parse_headers>` provide a way to generate such
+The :ref:`parse_headers.py <parse_headers>` provides a way to generate such
cross-references. It has to be called via Makefile, while building the
documentation. Please see ``Documentation/userspace-api/media/Makefile`` for an example
about how to use it inside the Kernel tree.
.. _parse_headers:
-parse_headers.pl
-^^^^^^^^^^^^^^^^
+tools/docs/parse_headers.py
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
NAME
****
-
-parse_headers.pl - parse a C file, in order to identify functions, structs,
+parse_headers.py - parse a C file, in order to identify functions, structs,
enums and defines and create cross-references to a Sphinx book.
+USAGE
+*****
+
+parse-headers.py [-h] [-d] [-t] ``FILE_IN`` ``FILE_OUT`` ``FILE_RULES``
SYNOPSIS
********
-
-\ **parse_headers.pl**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
-
-Where <options> can be: --debug, --help or --usage.
-
-
-OPTIONS
-*******
-
-
-
-\ **--debug**\
-
- Put the script in verbose mode, useful for debugging.
-
-
-
-\ **--usage**\
-
- Prints a brief help message and exits.
-
-
-
-\ **--help**\
-
- Prints a more detailed help message and exits.
-
-
-DESCRIPTION
-***********
-
-
-Convert a C header or source file (C_FILE), into a reStructuredText
+Converts a C header or source file ``FILE_IN`` into a ReStructured Text
included via ..parsed-literal block with cross-references for the
documentation files that describe the API. It accepts an optional
-EXCEPTIONS_FILE with describes what elements will be either ignored or
-be pointed to a non-default reference.
-
-The output is written at the (OUT_FILE).
+``FILE_RULES`` file to describe what elements will be either ignored or
+be pointed to a non-default reference type/name.
-It is capable of identifying defines, functions, structs, typedefs,
-enums and enum symbols and create cross-references for all of them.
-It is also capable of distinguish #define used for specifying a Linux
-ioctl.
+The output is written at ``FILE_OUT``.
-The EXCEPTIONS_FILE contain two types of statements: \ **ignore**\ or \ **replace**\ .
+It is capable of identifying ``define``, ``struct``, ``typedef``, ``enum``
+and enum ``symbol``, creating cross-references for all of them.
-The syntax for the ignore tag is:
+It is also capable of distinguishing ``#define`` used for specifying
+Linux-specific macros used to define ``ioctl``.
+The optional ``FILE_RULES`` contains a set of rules like::
-ignore \ **type**\ \ **name**\
+ ignore ioctl VIDIOC_ENUM_FMT
+ replace ioctl VIDIOC_DQBUF vidioc_qbuf
+ replace define V4L2_EVENT_MD_FL_HAVE_FRAME_SEQ :c:type:`v4l2_event_motion_det`
-The \ **ignore**\ means that it won't generate cross references for a
-\ **name**\ symbol of type \ **type**\ .
+POSITIONAL ARGUMENTS
+********************
-The syntax for the replace tag is:
+ ``FILE_IN``
+ Input C file
+ ``FILE_OUT``
+ Output RST file
-replace \ **type**\ \ **name**\ \ **new_value**\
+ ``FILE_RULES``
+ Exceptions file (optional)
-The \ **replace**\ means that it will generate cross references for a
-\ **name**\ symbol of type \ **type**\ , but, instead of using the default
-replacement rule, it will use \ **new_value**\ .
-
-For both statements, \ **type**\ can be either one of the following:
+OPTIONS
+*******
+ ``-h``, ``--help``
+ show a help message and exit
+ ``-d``, ``--debug``
+ Increase debug level. Can be used multiple times
+ ``-t``, ``--toc``
+ instead of a literal block, outputs a TOC table at the RST file
-\ **ioctl**\
- The ignore or replace statement will apply to ioctl definitions like:
+DESCRIPTION
+***********
- #define VIDIOC_DBG_S_REGISTER _IOW('V', 79, struct v4l2_dbg_register)
+Creates an enriched version of a Kernel header file with cross-links
+to each C data structure type, from ``FILE_IN``, formatting it with
+reStructuredText notation, either as-is or as a table of contents.
+It accepts an optional ``FILE_RULES`` which describes what elements will be
+either ignored or be pointed to a non-default reference, and optionally
+defines the C namespace to be used.
+It is meant to allow having more comprehensive documentation, where
+uAPI headers will create cross-reference links to the code.
-\ **define**\
+The output is written at the ``FILE_OUT``.
- The ignore or replace statement will apply to any other #define found
- at C_FILE.
+The ``FILE_RULES`` may contain contain three types of statements:
+**ignore**, **replace** and **namespace**.
+By default, it create rules for all symbols and defines, but it also
+allows parsing an exception file. Such file contains a set of rules
+using the syntax below:
+1. Ignore rules:
-\ **typedef**\
+ ignore *type* *symbol*
- The ignore or replace statement will apply to typedef statements at C_FILE.
+Removes the symbol from reference generation.
+2. Replace rules:
+ replace *type* *old_symbol* *new_reference*
-\ **struct**\
+ Replaces *old_symbol* with a *new_reference*.
+ The *new_reference* can be:
- The ignore or replace statement will apply to the name of struct statements
- at C_FILE.
+ - A simple symbol name;
+ - A full Sphinx reference.
+3. Namespace rules
+ namespace *namespace*
-\ **enum**\
+ Sets C *namespace* to be used during cross-reference generation. Can
+ be overridden by replace rules.
- The ignore or replace statement will apply to the name of enum statements
- at C_FILE.
+On ignore and replace rules, *type* can be:
+ - ioctl:
+ for defines of the form ``_IO*``, e.g., ioctl definitions
+ - define:
+ for other defines
-\ **symbol**\
+ - symbol:
+ for symbols defined within enums;
- The ignore or replace statement will apply to the name of enum value
- at C_FILE.
+ - typedef:
+ for typedefs;
- For replace statements, \ **new_value**\ will automatically use :c:type:
- references for \ **typedef**\ , \ **enum**\ and \ **struct**\ types. It will use :ref:
- for \ **ioctl**\ , \ **define**\ and \ **symbol**\ types. The type of reference can
- also be explicitly defined at the replace statement.
+ - enum:
+ for the name of a non-anonymous enum;
+ - struct:
+ for structs.
EXAMPLES
********
+- Ignore a define ``_VIDEODEV2_H`` at ``FILE_IN``::
-ignore define _VIDEODEV2_H
-
-
-Ignore a #define _VIDEODEV2_H at the C_FILE.
-
-ignore symbol PRIVATE
-
+ ignore define _VIDEODEV2_H
-On a struct like:
+- On an data structure like this enum::
-enum foo { BAR1, BAR2, PRIVATE };
+ enum foo { BAR1, BAR2, PRIVATE };
-It won't generate cross-references for \ **PRIVATE**\ .
+ It won't generate cross-references for ``PRIVATE``::
-replace symbol BAR1 :c:type:\`foo\`
-replace symbol BAR2 :c:type:\`foo\`
+ ignore symbol PRIVATE
+ At the same struct, instead of creating one cross reference per symbol,
+ make them all point to the ``enum foo`` C type::
-On a struct like:
+ replace symbol BAR1 :c:type:\`foo\`
+ replace symbol BAR2 :c:type:\`foo\`
-enum foo { BAR1, BAR2, PRIVATE };
-It will make the BAR1 and BAR2 enum symbols to cross reference the foo
-symbol at the C domain.
+- Use C namespace ``MC`` for all symbols at ``FILE_IN``::
+ namespace MC
BUGS
****
@@ -184,7 +179,7 @@ COPYRIGHT
*********
-Copyright (c) 2016 by Mauro Carvalho Chehab <mchehab+samsung@kernel.org>.
+Copyright (c) 2016, 2025 by Mauro Carvalho Chehab <mchehab+huawei@kernel.org>.
License GPLv2: GNU GPL version 2 <https://gnu.org/licenses/gpl.html>.
diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index 607589592bfb..51c370260f3b 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -106,7 +106,7 @@ There's a script that automatically checks for Sphinx dependencies. If it can
recognize your distribution, it will also give a hint about the install
command line options for your distro::
- $ ./scripts/sphinx-pre-install
+ $ ./tools/docs/sphinx-pre-install
Checking if the needed tools for Fedora release 26 (Twenty Six) are available
Warning: better to also install "texlive-luatex85".
You should run:
@@ -116,7 +116,7 @@ command line options for your distro::
. sphinx_2.4.4/bin/activate
pip install -r Documentation/sphinx/requirements.txt
- Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
+ Can't build as 1 mandatory dependency is missing at ./tools/docs/sphinx-pre-install line 468.
By default, it checks all the requirements for both html and PDF, including
the requirements for images, math expressions and LaTeX build, and assumes
@@ -149,7 +149,7 @@ a venv with it with, and install minimal requirements with::
A more comprehensive test can be done by using:
- scripts/test_doc_build.py
+ tools/docs/test_doc_build.py
Such script create one Python venv per supported version,
optionally building documentation for a range of Sphinx versions.
diff --git a/Documentation/driver-api/dpll.rst b/Documentation/driver-api/dpll.rst
index be1fc643b645..83118c728ed9 100644
--- a/Documentation/driver-api/dpll.rst
+++ b/Documentation/driver-api/dpll.rst
@@ -198,26 +198,28 @@ be requested with the same attribute with ``DPLL_CMD_DEVICE_SET`` command.
================================== ======================================
Device may also provide ability to adjust a signal phase on a pin.
-If pin phase adjustment is supported, minimal and maximal values that pin
-handle shall be provide to the user on ``DPLL_CMD_PIN_GET`` respond
-with ``DPLL_A_PIN_PHASE_ADJUST_MIN`` and ``DPLL_A_PIN_PHASE_ADJUST_MAX``
+If pin phase adjustment is supported, minimal and maximal values and
+granularity that pin handle shall be provided to the user on
+``DPLL_CMD_PIN_GET`` respond with ``DPLL_A_PIN_PHASE_ADJUST_MIN``,
+``DPLL_A_PIN_PHASE_ADJUST_MAX`` and ``DPLL_A_PIN_PHASE_ADJUST_GRAN``
attributes. Configured phase adjust value is provided with
``DPLL_A_PIN_PHASE_ADJUST`` attribute of a pin, and value change can be
requested with the same attribute with ``DPLL_CMD_PIN_SET`` command.
- =============================== ======================================
- ``DPLL_A_PIN_ID`` configured pin id
- ``DPLL_A_PIN_PHASE_ADJUST_MIN`` attr minimum value of phase adjustment
- ``DPLL_A_PIN_PHASE_ADJUST_MAX`` attr maximum value of phase adjustment
- ``DPLL_A_PIN_PHASE_ADJUST`` attr configured value of phase
- adjustment on parent dpll device
- ``DPLL_A_PIN_PARENT_DEVICE`` nested attribute for requesting
- configuration on given parent dpll
- device
- ``DPLL_A_PIN_PARENT_ID`` parent dpll device id
- ``DPLL_A_PIN_PHASE_OFFSET`` attr measured phase difference
- between a pin and parent dpll device
- =============================== ======================================
+ ================================ ==========================================
+ ``DPLL_A_PIN_ID`` configured pin id
+ ``DPLL_A_PIN_PHASE_ADJUST_GRAN`` attr granularity of phase adjustment value
+ ``DPLL_A_PIN_PHASE_ADJUST_MIN`` attr minimum value of phase adjustment
+ ``DPLL_A_PIN_PHASE_ADJUST_MAX`` attr maximum value of phase adjustment
+ ``DPLL_A_PIN_PHASE_ADJUST`` attr configured value of phase
+ adjustment on parent dpll device
+ ``DPLL_A_PIN_PARENT_DEVICE`` nested attribute for requesting
+ configuration on given parent dpll
+ device
+ ``DPLL_A_PIN_PARENT_ID`` parent dpll device id
+ ``DPLL_A_PIN_PHASE_OFFSET`` attr measured phase difference
+ between a pin and parent dpll device
+ ================================ ==========================================
All phase related values are provided in pico seconds, which represents
time difference between signals phase. The negative value means that
@@ -384,6 +386,8 @@ according to attribute purpose.
frequencies
``DPLL_A_PIN_ANY_FREQUENCY_MIN`` attr minimum value of frequency
``DPLL_A_PIN_ANY_FREQUENCY_MAX`` attr maximum value of frequency
+ ``DPLL_A_PIN_PHASE_ADJUST_GRAN`` attr granularity of phase
+ adjustment value
``DPLL_A_PIN_PHASE_ADJUST_MIN`` attr minimum value of phase
adjustment
``DPLL_A_PIN_PHASE_ADJUST_MAX`` attr maximum value of phase
diff --git a/Documentation/driver-api/parport-lowlevel.rst b/Documentation/driver-api/parport-lowlevel.rst
index 0633d70ffda7..a907e279f509 100644
--- a/Documentation/driver-api/parport-lowlevel.rst
+++ b/Documentation/driver-api/parport-lowlevel.rst
@@ -7,6 +7,7 @@ PARPORT interface documentation
Described here are the following functions:
Global functions::
+
parport_register_driver
parport_unregister_driver
parport_enumerate
@@ -34,6 +35,7 @@ Global functions::
Port functions (can be overridden by low-level drivers):
SPP::
+
port->ops->read_data
port->ops->write_data
port->ops->read_status
@@ -46,17 +48,20 @@ Port functions (can be overridden by low-level drivers):
port->ops->data_reverse
EPP::
+
port->ops->epp_write_data
port->ops->epp_read_data
port->ops->epp_write_addr
port->ops->epp_read_addr
ECP::
+
port->ops->ecp_write_data
port->ops->ecp_read_data
port->ops->ecp_write_addr
Other::
+
port->ops->nibble_read_data
port->ops->byte_read_data
port->ops->compat_write_data
diff --git a/Documentation/driver-api/pldmfw/index.rst b/Documentation/driver-api/pldmfw/index.rst
index fd871b83f34f..e59beca374c1 100644
--- a/Documentation/driver-api/pldmfw/index.rst
+++ b/Documentation/driver-api/pldmfw/index.rst
@@ -14,7 +14,6 @@ the PLDM for Firmware Update standard
file-format
driver-ops
-==================================
Overview of the ``pldmfw`` library
==================================
diff --git a/Documentation/driver-api/thermal/intel_dptf.rst b/Documentation/driver-api/thermal/intel_dptf.rst
index c51ac793dc06..916bf0f36a03 100644
--- a/Documentation/driver-api/thermal/intel_dptf.rst
+++ b/Documentation/driver-api/thermal/intel_dptf.rst
@@ -409,3 +409,26 @@ based on the processor generation.
Limit 1 from being exhausted.
4 – Unknown: Can't classify.
+
+ On processors starting from Panther Lake additional hints are provided.
+ The hardware analyzes workload residencies over an extended period to
+ determine whether the workload classification tends toward idle/battery
+ life states or sustained/performance states. Based on this long-term
+ analysis, it classifies:
+
+ Power Classification: If the workload exhibits more idle or battery life
+ residencies, it is classified as "power".
+
+ Performance Classification: If the workload exhibits more sustained or
+ performance residencies, it is classified as "performance".
+
+ This approach enables applications to ignore short-term workload
+ fluctuations and instead respond to longer-term power vs. performance
+ trends.
+
+ Residency thresholds for this classification are CPU generation-specific.
+ Classification is reported via bit 4 of the workload_type_index:
+
+ Bit 4 = 1: Power classification
+
+ Bit 4 = 0: Performance classification
diff --git a/Documentation/driver-api/usb/writing_musb_glue_layer.rst b/Documentation/driver-api/usb/writing_musb_glue_layer.rst
index 0bb96ecdf527..b748b9fb1965 100644
--- a/Documentation/driver-api/usb/writing_musb_glue_layer.rst
+++ b/Documentation/driver-api/usb/writing_musb_glue_layer.rst
@@ -709,7 +709,7 @@ Resources
USB Home Page: https://www.usb.org
-linux-usb Mailing List Archives: https://marc.info/?l=linux-usb
+linux-usb Mailing List Archives: https://lore.kernel.org/linux-usb
USB On-the-Go Basics:
https://www.maximintegrated.com/app-notes/index.mvp/id/1822
diff --git a/Documentation/features/list-arch.sh b/Documentation/features/list-arch.sh
deleted file mode 100755
index ac8ff7f6f859..000000000000
--- a/Documentation/features/list-arch.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-#
-# Small script that visualizes the kernel feature support status
-# of an architecture.
-#
-# (If no arguments are given then it will print the host architecture's status.)
-#
-
-ARCH=${1:-$(uname -m | sed 's/x86_64/x86/' | sed 's/i386/x86/' | sed 's/s390x/s390/')}
-
-$(dirname $0)/../../scripts/get_feat.pl list --arch $ARCH
diff --git a/Documentation/features/scripts/features-refresh.sh b/Documentation/features/scripts/features-refresh.sh
deleted file mode 100755
index c2288124e94a..000000000000
--- a/Documentation/features/scripts/features-refresh.sh
+++ /dev/null
@@ -1,98 +0,0 @@
-#
-# Small script that refreshes the kernel feature support status in place.
-#
-
-for F_FILE in Documentation/features/*/*/arch-support.txt; do
- F=$(grep "^# Kconfig:" "$F_FILE" | cut -c26-)
-
- #
- # Each feature F is identified by a pair (O, K), where 'O' can
- # be either the empty string (for 'nop') or "not" (the logical
- # negation operator '!'); other operators are not supported.
- #
- O=""
- K=$F
- if [[ "$F" == !* ]]; then
- O="not"
- K=$(echo $F | sed -e 's/^!//g')
- fi
-
- #
- # F := (O, K) is 'valid' iff there is a Kconfig file (for some
- # arch) which contains K.
- #
- # Notice that this definition entails an 'asymmetry' between
- # the case 'O = ""' and the case 'O = "not"'. E.g., F may be
- # _invalid_ if:
- #
- # [case 'O = ""']
- # 1) no arch provides support for F,
- # 2) K does not exist (e.g., it was renamed/mis-typed);
- #
- # [case 'O = "not"']
- # 3) all archs provide support for F,
- # 4) as in (2).
- #
- # The rationale for adopting this definition (and, thus, for
- # keeping the asymmetry) is:
- #
- # We want to be able to 'detect' (2) (or (4)).
- #
- # (1) and (3) may further warn the developers about the fact
- # that K can be removed.
- #
- F_VALID="false"
- for ARCH_DIR in arch/*/; do
- K_FILES=$(find $ARCH_DIR -name "Kconfig*")
- K_GREP=$(grep "$K" $K_FILES)
- if [ ! -z "$K_GREP" ]; then
- F_VALID="true"
- break
- fi
- done
- if [ "$F_VALID" = "false" ]; then
- printf "WARNING: '%s' is not a valid Kconfig\n" "$F"
- fi
-
- T_FILE="$F_FILE.tmp"
- grep "^#" $F_FILE > $T_FILE
- echo " -----------------------" >> $T_FILE
- echo " | arch |status|" >> $T_FILE
- echo " -----------------------" >> $T_FILE
- for ARCH_DIR in arch/*/; do
- ARCH=$(echo $ARCH_DIR | sed -e 's/^arch//g' | sed -e 's/\///g')
- K_FILES=$(find $ARCH_DIR -name "Kconfig*")
- K_GREP=$(grep "$K" $K_FILES)
- #
- # Arch support status values for (O, K) are updated according
- # to the following rules.
- #
- # - ("", K) is 'supported by a given arch', if there is a
- # Kconfig file for that arch which contains K;
- #
- # - ("not", K) is 'supported by a given arch', if there is
- # no Kconfig file for that arch which contains K;
- #
- # - otherwise: preserve the previous status value (if any),
- # default to 'not yet supported'.
- #
- # Notice that, according these rules, invalid features may be
- # updated/modified.
- #
- if [ "$O" = "" ] && [ ! -z "$K_GREP" ]; then
- printf " |%12s: | ok |\n" "$ARCH" >> $T_FILE
- elif [ "$O" = "not" ] && [ -z "$K_GREP" ]; then
- printf " |%12s: | ok |\n" "$ARCH" >> $T_FILE
- else
- S=$(grep -v "^#" "$F_FILE" | grep " $ARCH:")
- if [ ! -z "$S" ]; then
- echo "$S" >> $T_FILE
- else
- printf " |%12s: | TODO |\n" "$ARCH" \
- >> $T_FILE
- fi
- fi
- done
- echo " -----------------------" >> $T_FILE
- mv $T_FILE $F_FILE
-done
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index cfc6c1659931..55cd5c380e92 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -297,6 +297,8 @@ The ``i_flags`` field is a combination of these values:
- Inode has inline data (EXT4_INLINE_DATA_FL).
* - 0x20000000
- Create children with the same project ID (EXT4_PROJINHERIT_FL).
+ * - 0x40000000
+ - Use case-insensitive lookups for directory contents (EXT4_CASEFOLD_FL).
* - 0x80000000
- Reserved for ext4 library (EXT4_RESERVED_FL).
* -
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 1b240661bfa3..9a59cded9bd7 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -671,7 +671,9 @@ following:
* - 0x8000
- Data in inode (INCOMPAT_INLINE_DATA).
* - 0x10000
- - Encrypted inodes are present on the filesystem. (INCOMPAT_ENCRYPT).
+ - Encrypted inodes can be present. (INCOMPAT_ENCRYPT).
+ * - 0x20000
+ - Directories can be marked case-insensitive. (INCOMPAT_CASEFOLD).
.. _super_rocompat:
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 696a5844bfa3..70af896822e1 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -450,9 +450,7 @@ API, but the filenames mode still does.
- CONFIG_CRYPTO_HCTR2
- Recommended:
- arm64: CONFIG_CRYPTO_AES_ARM64_CE_BLK
- - arm64: CONFIG_CRYPTO_POLYVAL_ARM64_CE
- x86: CONFIG_CRYPTO_AES_NI_INTEL
- - x86: CONFIG_CRYPTO_POLYVAL_CLMUL_NI
- Adiantum
- Mandatory:
diff --git a/Documentation/filesystems/gfs2-glocks.rst b/Documentation/filesystems/gfs2/glocks.rst
index ce5ff08cbd59..ce5ff08cbd59 100644
--- a/Documentation/filesystems/gfs2-glocks.rst
+++ b/Documentation/filesystems/gfs2/glocks.rst
diff --git a/Documentation/filesystems/gfs2.rst b/Documentation/filesystems/gfs2/index.rst
index 1bc48a13430c..e5e195403561 100644
--- a/Documentation/filesystems/gfs2.rst
+++ b/Documentation/filesystems/gfs2/index.rst
@@ -4,6 +4,9 @@
Global File System 2
====================
+Overview
+========
+
GFS2 is a cluster file system. It allows a cluster of computers to
simultaneously use a block device that is shared between them (with FC,
iSCSI, NBD, etc). GFS2 reads and writes to the block device like a local
@@ -50,3 +53,12 @@ The following man pages are available from gfs2-utils:
gfs2_convert to convert a gfs filesystem to GFS2 in-place
mkfs.gfs2 to make a filesystem
============ =============================================
+
+Implementation Notes
+====================
+
+.. toctree::
+ :maxdepth: 1
+
+ glocks
+ uevents
diff --git a/Documentation/filesystems/gfs2-uevents.rst b/Documentation/filesystems/gfs2/uevents.rst
index f162a2c76c69..f162a2c76c69 100644
--- a/Documentation/filesystems/gfs2-uevents.rst
+++ b/Documentation/filesystems/gfs2/uevents.rst
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index af516e528ded..f4873197587d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,9 +89,7 @@ Documentation for filesystem implementations.
ext3
ext4/index
f2fs
- gfs2
- gfs2-uevents
- gfs2-glocks
+ gfs2/index
hfs
hfsplus
hpfs
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 387fd9cc72ca..da982ca7e413 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -135,6 +135,27 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
* ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
+``struct iomap_read_ops``
+--------------------------
+
+.. code-block:: c
+
+ struct iomap_read_ops {
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, size_t len);
+ void (*submit_read)(struct iomap_read_folio_ctx *ctx);
+ };
+
+iomap calls these functions:
+
+ - ``read_folio_range``: Called to read in the range. This must be provided
+ by the caller. If this succeeds, iomap_finish_folio_read() must be called
+ after the range is read in, regardless of whether the read succeeded or
+ failed.
+
+ - ``submit_read``: Submit any pending read requests. This function is
+ optional.
+
Internal per-Folio State
------------------------
@@ -182,6 +203,28 @@ The ``flags`` argument to ``->iomap_begin`` will be set to zero.
The pagecache takes whatever locks it needs before calling the
filesystem.
+Both ``iomap_readahead`` and ``iomap_read_folio`` pass in a ``struct
+iomap_read_folio_ctx``:
+
+.. code-block:: c
+
+ struct iomap_read_folio_ctx {
+ const struct iomap_read_ops *ops;
+ struct folio *cur_folio;
+ struct readahead_control *rac;
+ void *read_ctx;
+ };
+
+``iomap_readahead`` must set:
+ * ``ops->read_folio_range()`` and ``rac``
+
+``iomap_read_folio`` must set:
+ * ``ops->read_folio_range()`` and ``cur_folio``
+
+``ops->submit_read()`` and ``read_ctx`` are optional. ``read_ctx`` is used to
+pass in any custom data the caller needs accessible in the ops callbacks for
+fulfilling reads.
+
Buffered Writes
---------------
@@ -317,6 +360,9 @@ The fields are as follows:
delalloc reservations to avoid having delalloc reservations for
clean pagecache.
This function must be supplied by the filesystem.
+ If this succeeds, iomap_finish_folio_write() must be called once writeback
+ completes for the range, regardless of whether the writeback succeeded or
+ failed.
- ``writeback_submit``: Submit the previous built writeback context.
Block based file systems should use the iomap_ioend_writeback_submit
@@ -444,10 +490,6 @@ These ``struct kiocb`` flags are significant for direct I/O with iomap:
Only meaningful for asynchronous I/O, and only if the entire I/O can
be issued as a single ``struct bio``.
- * ``IOCB_DIO_CALLER_COMP``: Try to run I/O completion from the caller's
- process context.
- See ``linux/fs.h`` for more details.
-
Filesystems should call ``iomap_dio_rw`` from ``->read_iter`` and
``->write_iter``, and set ``FMODE_CAN_ODIRECT`` in the ``->open``
function for the file.
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 7233b04668fc..d33429294252 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -211,7 +211,7 @@ test and set for you.
e.g.::
inode = iget_locked(sb, ino);
- if (inode->i_state & I_NEW) {
+ if (inode_state_read_once(inode) & I_NEW) {
err = read_inode_from_disk(inode);
if (err < 0) {
iget_failed(inode);
@@ -1309,3 +1309,16 @@ a different length, use
vfs_parse_fs_qstr(fc, key, &QSTR_LEN(value, len))
instead.
+
+---
+
+**mandatory**
+
+vfs_mkdir() now returns a dentry - the one returned by ->mkdir(). If
+that dentry is different from the dentry passed in, including if it is
+an IS_ERR() dentry pointer, the original dentry is dput().
+
+When vfs_mkdir() returns an error, and so both dputs() the original
+dentry and doesn't provide a replacement, it also unlocks the parent.
+Consequently the return value from vfs_mkdir() can be passed to
+end_creating() and the parent will be unlocked precisely when necessary.
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.rst b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
index fa4f81099cb4..a9d271e171c3 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.rst
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
@@ -290,11 +290,11 @@ Why cpio rather than tar?
This decision was made back in December, 2001. The discussion started here:
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
+- https://lore.kernel.org/lkml/a03cke$640$1@cesium.transmeta.com/
And spawned a second thread (specifically on tar vs cpio), starting here:
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
+- https://lore.kernel.org/lkml/3C25A06D.7030408@zytor.com/
The quick and dirty summary version (which is no substitute for reading
the above threads) is:
@@ -310,7 +310,7 @@ the above threads) is:
either way about the archive format, and there are alternative tools,
such as:
- http://freecode.com/projects/afio
+ https://linux.die.net/man/1/afio
2) The cpio archive format chosen by the kernel is simpler and cleaner (and
thus easier to create and parse) than any of the (literally dozens of)
@@ -331,12 +331,12 @@ the above threads) is:
5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
supported on the kernel side"):
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
+ - https://lore.kernel.org/lkml/Pine.GSO.4.21.0112222109050.21702-100000@weyl.math.psu.edu/
explained his reasoning:
- - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
- - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
+ - https://lore.kernel.org/lkml/Pine.GSO.4.21.0112222240530.21702-100000@weyl.math.psu.edu/
+ - https://lore.kernel.org/lkml/Pine.GSO.4.21.0112230849550.23300-100000@weyl.math.psu.edu/
and, most importantly, designed and implemented the initramfs code.
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index b7f35b07876a..8c8ce678148a 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -17,17 +17,18 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
flag bits:
-=============================================== ================================
-RDT (Resource Director Technology) Allocation "rdt_a"
-CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
-CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
-CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
-MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
-MBA (Memory Bandwidth Allocation) "mba"
-SMBA (Slow Memory Bandwidth Allocation) ""
-BMEC (Bandwidth Monitoring Event Configuration) ""
-ABMC (Assignable Bandwidth Monitoring Counters) ""
-=============================================== ================================
+=============================================================== ================================
+RDT (Resource Director Technology) Allocation "rdt_a"
+CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
+CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
+CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
+MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
+MBA (Memory Bandwidth Allocation) "mba"
+SMBA (Slow Memory Bandwidth Allocation) ""
+BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
+SDCIAE (Smart Data Cache Injection Allocation Enforcement) ""
+=============================================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
resulted in the feature flags becoming hard to parse by humans. Adding a new
@@ -72,6 +73,11 @@ The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.
+Most of the files in the resource's subdirectory are read-only, and
+describe properties of the resource. Resources that support global
+configuration options also include writable files that can be used
+to modify those settings.
+
Each subdirectory contains the following files with respect to
allocation:
@@ -90,12 +96,19 @@ related to allocation:
must be set when writing a mask.
"shareable_bits":
- Bitmask of shareable resource with other executing
- entities (e.g. I/O). User can use this when
- setting up exclusive cache partitions. Note that
- some platforms support devices that have their
- own settings for cache use which can over-ride
- these bits.
+ Bitmask of shareable resource with other executing entities
+ (e.g. I/O). Applies to all instances of this resource. User
+ can use this when setting up exclusive cache partitions.
+ Note that some platforms support devices that have their
+ own settings for cache use which can over-ride these bits.
+
+ When "io_alloc" is enabled, a portion of each cache instance can
+ be configured for shared use between hardware and software.
+ "bit_usage" should be used to see which portions of each cache
+ instance is configured for hardware use via "io_alloc" feature
+ because every cache instance can have its "io_alloc" bitmask
+ configured independently via "io_alloc_cbm".
+
"bit_usage":
Annotated capacity bitmasks showing how all
instances of the resource are used. The legend is:
@@ -109,16 +122,16 @@ related to allocation:
"H":
Corresponding region is used by hardware only
but available for software use. If a resource
- has bits set in "shareable_bits" but not all
- of these bits appear in the resource groups'
- schematas then the bits appearing in
- "shareable_bits" but no resource group will
- be marked as "H".
+ has bits set in "shareable_bits" or "io_alloc_cbm"
+ but not all of these bits appear in the resource
+ groups' schemata then the bits appearing in
+ "shareable_bits" or "io_alloc_cbm" but no
+ resource group will be marked as "H".
"X":
Corresponding region is available for sharing and
- used by hardware and software. These are the
- bits that appear in "shareable_bits" as
- well as a resource group's allocation.
+ used by hardware and software. These are the bits
+ that appear in "shareable_bits" or "io_alloc_cbm"
+ as well as a resource group's allocation.
"S":
Corresponding region is used by software
and available for sharing.
@@ -136,6 +149,77 @@ related to allocation:
"1":
Non-contiguous 1s value in CBM is supported.
+"io_alloc":
+ "io_alloc" enables system software to configure the portion of
+ the cache allocated for I/O traffic. File may only exist if the
+ system supports this feature on some of its cache resources.
+
+ "disabled":
+ Resource supports "io_alloc" but the feature is disabled.
+ Portions of cache used for allocation of I/O traffic cannot
+ be configured.
+ "enabled":
+ Portions of cache used for allocation of I/O traffic
+ can be configured using "io_alloc_cbm".
+ "not supported":
+ Support not available for this resource.
+
+ The feature can be modified by writing to the interface, for example:
+
+ To enable::
+
+ # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
+
+ To disable::
+
+ # echo 0 > /sys/fs/resctrl/info/L3/io_alloc
+
+ The underlying implementation may reduce resources available to
+ general (CPU) cache allocation. See architecture specific notes
+ below. Depending on usage requirements the feature can be enabled
+ or disabled.
+
+ On AMD systems, io_alloc feature is supported by the L3 Smart
+ Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
+ io_alloc is the highest CLOSID supported by the resource. When
+ io_alloc is enabled, the highest CLOSID is dedicated to io_alloc and
+ no longer available for general (CPU) cache allocation. When CDP is
+ enabled, io_alloc routes I/O traffic using the highest CLOSID allocated
+ for the instruction cache (CDP_CODE), making this CLOSID no longer
+ available for general (CPU) cache allocation for both the CDP_CODE
+ and CDP_DATA resources.
+
+"io_alloc_cbm":
+ Capacity bitmasks that describe the portions of cache instances to
+ which I/O traffic from supported I/O devices are routed when "io_alloc"
+ is enabled.
+
+ CBMs are displayed in the following format:
+
+ <cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+ Example::
+
+ # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+ 0=ffff;1=ffff
+
+ CBMs can be configured by writing to the interface.
+
+ Example::
+
+ # echo 1=ff > /sys/fs/resctrl/info/L3/io_alloc_cbm
+ # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+ 0=ffff;1=00ff
+
+ # echo "0=ff;1=f" > /sys/fs/resctrl/info/L3/io_alloc_cbm
+ # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+ 0=00ff;1=000f
+
+ When CDP is enabled "io_alloc_cbm" associated with the CDP_DATA and CDP_CODE
+ resources may reflect the same values. For example, values read from and
+ written to /sys/fs/resctrl/info/L3DATA/io_alloc_cbm may be reflected by
+ /sys/fs/resctrl/info/L3CODE/io_alloc_cbm and vice versa.
+
Memory bandwidth(MB) subdirectory contains the following files
with respect to allocation:
diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 8cbcd3c26434..3d9233f403db 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -105,10 +105,8 @@ occur; this capability aids both strategies.
TLDR; Show Me the Code!
-----------------------
-Code is posted to the kernel.org git trees as follows:
-`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
-`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
-`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
+Kernel and userspace code has been fully merged as of October 2025.
+
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.
@@ -249,7 +247,7 @@ sharing and lock acquisition rules as the regular filesystem.
This means that scrub cannot take *any* shortcuts to save time, because doing
so could lead to concurrency problems.
In other words, online fsck is not a complete replacement for offline fsck, and
-a complete run of online fsck may take longer than online fsck.
+a complete run of online fsck may take longer than offline fsck.
However, both of these limitations are acceptable tradeoffs to satisfy the
different motivations of online fsck, which are to **minimize system downtime**
and to **increase predictability of operation**.
@@ -764,12 +762,8 @@ allow the online fsck developers to compare online fsck against offline fsck,
and they enable XFS developers to find deficiencies in the code base.
Proposed patchsets include
-`general fuzzer improvements
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_,
`fuzzing baselines
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_,
-and `improvements in fuzz testing comprehensiveness
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_.
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_.
Stress Testing
--------------
@@ -801,11 +795,6 @@ Success is defined by the ability to run all of these tests without observing
any unexpected filesystem shutdowns due to corrupted metadata, kernel hang
check warnings, or any other sort of mischief.
-Proposed patchsets include `general stress testing
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
-and the `evolution of existing per-function stress testing
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
-
4. User Interface
=================
@@ -886,10 +875,6 @@ apply as nice of a priority to IO and CPU scheduling as possible.
This measure was taken to minimize delays in the rest of the filesystem.
No such hardening has been performed for the cron job.
-Proposed patchset:
-`Enabling the xfs_scrub background service
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
-
Health Reporting
----------------
@@ -912,13 +897,6 @@ notifications and initiate a repair?
*Answer*: These questions remain unanswered, but should be a part of the
conversation with early adopters and potential downstream users of XFS.
-Proposed patchsets include
-`wiring up health reports to correction returns
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
-and
-`preservation of sickness info during memory reclaim
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.
-
5. Kernel Algorithms and Data Structures
========================================
@@ -1310,21 +1288,6 @@ Space allocation records are cross-referenced as follows:
are there the same number of reverse mapping records for each block as the
reference count record claims?
-Proposed patchsets are the series to find gaps in
-`refcount btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-refcount-gaps>`_,
-`inode btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-inobt-gaps>`_, and
-`rmap btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-rmapbt-gaps>`_ records;
-to find
-`mergeable records
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-mergeable-records>`_;
-and to
-`improve cross referencing with rmap
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-strengthen-rmap-checking>`_
-before starting a repair.
-
Checking Extended Attributes
````````````````````````````
@@ -1756,10 +1719,6 @@ For scrub, the drain works as follows:
To avoid polling in step 4, the drain provides a waitqueue for scrub threads to
be woken up whenever the intent count drops to zero.
-The proposed patchset is the
-`scrub intent drain series
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-drain-intents>`_.
-
.. _jump_labels:
Static Keys (aka Jump Label Patching)
@@ -2036,10 +1995,6 @@ The ``xfarray_store_anywhere`` function is used to insert a record in any
null record slot in the bag; and the ``xfarray_unset`` function removes a
record from the bag.
-The proposed patchset is the
-`big in-memory array
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=big-array>`_.
-
Iterating Array Elements
^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2172,10 +2127,6 @@ However, it should be noted that these repair functions only use blob storage
to cache a small number of entries before adding them to a temporary ondisk
file, which is why compaction is not required.
-The proposed patchset is at the start of the
-`extended attribute repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_ series.
-
.. _xfbtree:
In-Memory B+Trees
@@ -2214,11 +2165,6 @@ xfiles enables reuse of the entire btree library.
Btrees built atop an xfile are collectively known as ``xfbtrees``.
The next few sections describe how they actually work.
-The proposed patchset is the
-`in-memory btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=in-memory-btrees>`_
-series.
-
Using xfiles as a Buffer Cache Target
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2459,14 +2405,6 @@ This enables the log to release the old EFI to keep the log moving forwards.
EFIs have a role to play during the commit and reaping phases; please see the
next section and the section about :ref:`reaping<reaping>` for more details.
-Proposed patchsets are the
-`bitmap rework
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-bitmap-rework>`_
-and the
-`preparation for bulk loading btrees
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_.
-
-
Writing the New Tree
````````````````````
@@ -2623,11 +2561,6 @@ The number of records for the inode btree is the number of xfarray records,
but the record count for the free inode btree has to be computed as inode chunk
records are stored in the xfarray.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
Case Study: Rebuilding the Space Reference Counts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2716,11 +2649,6 @@ Reverse mappings are added to the bag using ``xfarray_store_anywhere`` and
removed via ``xfarray_unset``.
Bag members are examined through ``xfarray_iter`` loops.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
Case Study: Rebuilding File Fork Mapping Indices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2757,11 +2685,6 @@ EXTENTS format instead of BMBT, which may require a conversion.
Third, the incore extent map must be reloaded carefully to avoid disturbing
any delayed allocation extents.
-The proposed patchset is the
-`file mapping repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-file-mappings>`_
-series.
-
.. _reaping:
Reaping Old Metadata Blocks
@@ -2843,11 +2766,6 @@ blocks.
As stated earlier, online repair functions use very large transactions to
minimize the chances of this occurring.
-The proposed patchset is the
-`preparation for bulk loading btrees
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_
-series.
-
Case Study: Reaping After a Regular Btree Repair
````````````````````````````````````````````````
@@ -2943,11 +2861,6 @@ When the walk is complete, the bitmap disunion operation ``(ag_owner_bitmap &
btrees.
These blocks can then be reaped using the methods outlined above.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
.. _rmap_reap:
Case Study: Reaping After Repairing Reverse Mapping Btrees
@@ -2972,11 +2885,6 @@ methods outlined above.
The rest of the process of rebuildng the reverse mapping btree is discussed
in a separate :ref:`case study<rmap_repair>`.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
Case Study: Rebuilding the AGFL
```````````````````````````````
@@ -3024,11 +2932,6 @@ more complicated, because computing the correct value requires traversing the
forks, or if that fails, leaving the fields invalid and waiting for the fork
fsck functions to run.
-The proposed patchset is the
-`inode
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-inodes>`_
-repair series.
-
Quota Record Repairs
--------------------
@@ -3045,11 +2948,6 @@ checking are obviously bad limits and timer values.
Quota usage counters are checked, repaired, and discussed separately in the
section about :ref:`live quotacheck <quotacheck>`.
-The proposed patchset is the
-`quota
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quota>`_
-repair series.
-
.. _fscounters:
Freezing to Fix Summary Counters
@@ -3145,11 +3043,6 @@ long enough to check and correct the summary counters.
| This bug was fixed in Linux 5.17. |
+--------------------------------------------------------------------------+
-The proposed patchset is the
-`summary counter cleanup
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters>`_
-series.
-
Full Filesystem Scans
---------------------
@@ -3277,15 +3170,6 @@ Second, if the incore inode is stuck in some intermediate state, the scan
coordinator must release the AGI and push the main filesystem to get the inode
back into a loadable state.
-The proposed patches are the
-`inode scanner
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
-series.
-The first user of the new functionality is the
-`online quotacheck
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
-series.
-
Inode Management
````````````````
@@ -3381,12 +3265,6 @@ To capture these nuances, the online fsck code has a separate ``xchk_irele``
function to set or clear the ``DONTCACHE`` flag to get the required release
behavior.
-Proposed patchsets include fixing
-`scrub iget usage
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iget-fixes>`_ and
-`dir iget usage
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-dir-iget-fixes>`_.
-
.. _ilocking:
Locking Inodes
@@ -3443,11 +3321,6 @@ If the dotdot entry changes while the directory is unlocked, then a move or
rename operation must have changed the child's parentage, and the scan can
exit early.
-The proposed patchset is the
-`directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
-series.
-
.. _fshooks:
Filesystem Hooks
@@ -3594,11 +3467,6 @@ The inode scan APIs are pretty simple:
- ``xchk_iscan_teardown`` to finish the scan
-This functionality is also a part of the
-`inode scanner
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
-series.
-
.. _quotacheck:
Case Study: Quota Counter Checking
@@ -3686,11 +3554,6 @@ needing to hold any locks for a long duration.
If repairs are desired, the real and shadow dquots are locked and their
resource counts are set to the values in the shadow dquot.
-The proposed patchset is the
-`online quotacheck
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
-series.
-
.. _nlinks:
Case Study: File Link Count Checking
@@ -3744,11 +3607,6 @@ shadow information.
If no parents are found, the file must be :ref:`reparented <orphanage>` to the
orphanage to prevent the file from being lost forever.
-The proposed patchset is the
-`file link count repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-nlinks>`_
-series.
-
.. _rmap_repair:
Case Study: Rebuilding Reverse Mapping Records
@@ -3828,11 +3686,6 @@ scan for reverse mapping records.
12. Free the xfbtree now that it not needed.
-The proposed patchset is the
-`rmap repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rmap-btree>`_
-series.
-
Staging Repairs with Temporary Files on Disk
--------------------------------------------
@@ -3971,11 +3824,6 @@ Once a good copy of a data file has been constructed in a temporary file, it
must be conveyed to the file being repaired, which is the topic of the next
section.
-The proposed patches are in the
-`repair temporary files
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
-series.
-
Logged File Content Exchanges
-----------------------------
@@ -4025,11 +3873,6 @@ The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag
in the superblock protects these new log item records from being replayed on
old kernels.
-The proposed patchset is the
-`file contents exchange
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
-series.
-
+--------------------------------------------------------------------------+
| **Sidebar: Using Log-Incompatible Feature Flags** |
+--------------------------------------------------------------------------+
@@ -4323,11 +4166,6 @@ To repair the summary file, write the xfile contents into the temporary file
and use atomic mapping exchange to commit the new contents.
The temporary file is then reaped.
-The proposed patchset is the
-`realtime summary repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rtsummary>`_
-series.
-
Case Study: Salvaging Extended Attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4369,11 +4207,6 @@ Salvaging extended attributes is done as follows:
4. Reap the temporary file.
-The proposed patchset is the
-`extended attribute repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_
-series.
-
Fixing Directories
------------------
@@ -4448,11 +4281,6 @@ Unfortunately, the current dentry cache design doesn't provide a means to walk
every child dentry of a specific directory, which makes this a hard problem.
There is no known solution.
-The proposed patchset is the
-`directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
-series.
-
Parent Pointers
```````````````
@@ -4612,11 +4440,6 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
7. Reap the temporary directory.
-The proposed patchset is the
-`parent pointers directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
-series.
-
Case Study: Repairing Parent Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4662,11 +4485,6 @@ directory reconstruction:
8. Reap the temporary file.
-The proposed patchset is the
-`parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
-series.
-
Digression: Offline Checking of Parent Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4755,11 +4573,6 @@ connectivity checks:
4. Move on to examining link counts, as we do today.
-The proposed patchset is the
-`offline parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-fsck>`_
-series.
-
Rebuilding directories from parent pointers in offline repair would be very
challenging because xfs_repair currently uses two single-pass scans of the
filesystem during phases 3 and 4 to decide which files are corrupt enough to be
@@ -4903,12 +4716,6 @@ Repairing the directory tree works as follows:
6. If the subdirectory has zero paths, attach it to the lost and found.
-The proposed patches are in the
-`directory tree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree>`_
-series.
-
-
.. _orphanage:
The Orphanage
@@ -4973,11 +4780,6 @@ Orphaned files are adopted by the orphanage as follows:
7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all
resources.
-The proposed patches are in the
-`orphanage adoption
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-orphanage>`_
-series.
-
6. Userspace Algorithms and Data Structures
===========================================
@@ -5091,14 +4893,6 @@ first workqueue's workers until the backlog eases.
This doesn't completely solve the balancing problem, but reduces it enough to
move on to more pressing issues.
-The proposed patchsets are the scrub
-`performance tweaks
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-performance-tweaks>`_
-and the
-`inode scan rebalance
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-iscan-rebalance>`_
-series.
-
.. _scrubrepair:
Scheduling Repairs
@@ -5179,20 +4973,6 @@ immediately.
Corrupt file data blocks reported by phase 6 cannot be recovered by the
filesystem.
-The proposed patchsets are the
-`repair warning improvements
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-better-repair-warnings>`_,
-refactoring of the
-`repair data dependency
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-data-deps>`_
-and
-`object tracking
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-object-tracking>`_,
-and the
-`repair scheduling
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-scheduling>`_
-improvement series.
-
Checking Names for Confusable Unicode Sequences
-----------------------------------------------
@@ -5372,6 +5152,8 @@ The extra flexibility enables several new use cases:
This emulates an atomic device write in software, and can support arbitrary
scattered writes.
+(This functionality was merged into mainline as of 2025)
+
Vectorized Scrub
----------------
@@ -5393,13 +5175,7 @@ It is hoped that ``io_uring`` will pick up enough of this functionality that
online fsck can use that instead of adding a separate vectored scrub system
call to XFS.
-The relevant patchsets are the
-`kernel vectorized scrub
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub>`_
-and
-`userspace vectorized scrub
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub>`_
-series.
+(This functionality was merged into mainline as of 2025)
Quality of Service Targets for Scrub
------------------------------------
diff --git a/Documentation/input/event-codes.rst b/Documentation/input/event-codes.rst
index 1ead9bb8d9c6..4424cbff251f 100644
--- a/Documentation/input/event-codes.rst
+++ b/Documentation/input/event-codes.rst
@@ -400,19 +400,30 @@ can report through the rotational axes (absolute and/or relative rx, ry, rz).
All other axes retain their meaning. A device must not mix
regular directional axes and accelerometer axes on the same event node.
-INPUT_PROP_HAPTIC_TOUCHPAD
---------------------------
+INPUT_PROP_PRESSUREPAD
+----------------------
+
+The INPUT_PROP_PRESSUREPAD property indicates that the device provides
+simulated haptic feedback (e.g. a vibrator motor situated below the surface)
+instead of physical haptic feedback (e.g. a hinge). This property is only set
+if the device:
-The INPUT_PROP_HAPTIC_TOUCHPAD property indicates that device:
-- supports simple haptic auto and manual triggering
- can differentiate between at least 5 fingers
- uses correct resolution for the X/Y (units and value)
-- reports correct force per touch, and correct units for them (newtons or grams)
- follows the MT protocol type B
+If the simulated haptic feedback is controllable by userspace the device must:
+
+- support simple haptic auto and manual triggering, and
+- report correct force per touch, and correct units for them (newtons or grams), and
+- provide the EV_FF FF_HAPTIC force feedback effect.
+
Summing up, such devices follow the MS spec for input devices in
-Win8 and Win8.1, and in addition support the Simple haptic controller HID table,
-and report correct units for the pressure.
+Win8 and Win8.1, and in addition may support the Simple haptic controller HID
+table, and report correct units for the pressure.
+
+Where applicable, this property is set in addition to INPUT_PROP_BUTTONPAD, it
+does not replace that property.
Guidelines
==========
diff --git a/Documentation/kbuild/kbuild.rst b/Documentation/kbuild/kbuild.rst
index 3388a10f2dcc..82826b0332df 100644
--- a/Documentation/kbuild/kbuild.rst
+++ b/Documentation/kbuild/kbuild.rst
@@ -328,8 +328,14 @@ KBUILD_BUILD_TIMESTAMP
----------------------
Setting this to a date string overrides the timestamp used in the
UTS_VERSION definition (uname -v in the running kernel). The value has to
-be a string that can be passed to date -d. The default value
-is the output of the date command at one point during build.
+be a string that can be passed to date -d. E.g.::
+
+ $ KBUILD_BUILD_TIMESTAMP="Mon Oct 13 00:00:00 UTC 2025" make
+
+The default value is the output of the date command at one point during
+build. If provided, this timestamp will also be used for mtime fields
+within any initramfs archive. Initramfs mtimes are 32-bit, so dates before
+the 1970 Unix epoch, or after 2106-02-07 06:28:15 UTC will fail.
KBUILD_BUILD_USER, KBUILD_BUILD_HOST
------------------------------------
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 3fb7ea3ab22a..9899871d3d9a 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -220,13 +220,14 @@ Read path, three categories:
according to a passed marker. This is used to avoid lockless readers
starvation (too much retry loops) in case of a sharp spike in write
activity. First, a lockless read is tried (even marker passed). If
- that trial fails (odd sequence counter is returned, which is used as
- the next iteration marker), the lockless read is transformed to a
- full locking read and no retry loop is necessary::
+ that trial fails (sequence counter doesn't match), make the marker
+ odd for the next iteration, the lockless read is transformed to a
+ full locking read and no retry loop is necessary, for example::
/* marker; even initialization */
- int seq = 0;
+ int seq = 1;
do {
+ seq++; /* 2 on the 1st/lockless path, otherwise odd */
read_seqbegin_or_lock(&foo_seqlock, &seq);
/* ... [[read-side critical section]] ... */
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1d164e005776..61b7317bcf2e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2182,9 +2182,11 @@ set_current_state() may be wrapped by:
which therefore also imply a general memory barrier after setting the state.
The whole sequence above is available in various canned forms, all of which
-interpolate the memory barrier in the right place:
+interpolate the memory barrier in the right place, for example:
wait_event();
+ wait_event_cmd();
+ wait_event_exclusive_cmd();
wait_event_interruptible();
wait_event_interruptible_exclusive();
wait_event_interruptible_timeout();
@@ -2192,8 +2194,6 @@ interpolate the memory barrier in the right place:
wait_event_timeout();
wait_on_bit();
wait_on_bit_lock();
- wait_event_cmd();
- wait_event_exclusive_cmd();
Secondly, code that performs a wake up normally follows something like this:
diff --git a/Documentation/misc-devices/amd-sbi.rst b/Documentation/misc-devices/amd-sbi.rst
index 5648fc6ec572..07ceb44fbe5e 100644
--- a/Documentation/misc-devices/amd-sbi.rst
+++ b/Documentation/misc-devices/amd-sbi.rst
@@ -28,8 +28,10 @@ MCAMSR and register xfer commands.
Register sets is common across APML protocols. IOCTL is providing synchronization
among protocols as transactions may create race condition.
-$ ls -al /dev/sbrmi-3c
-crw------- 1 root root 10, 53 Jul 10 11:13 /dev/sbrmi-3c
+.. code-block:: bash
+
+ $ ls -al /dev/sbrmi-3c
+ crw------- 1 root root 10, 53 Jul 10 11:13 /dev/sbrmi-3c
apml_sbrmi driver registers hwmon sensors for monitoring power_cap_max,
current power consumption and managing power_cap.
diff --git a/Documentation/misc-devices/mrvl_cn10k_dpi.rst b/Documentation/misc-devices/mrvl_cn10k_dpi.rst
index a75e372723d8..fa9b8cd6806f 100644
--- a/Documentation/misc-devices/mrvl_cn10k_dpi.rst
+++ b/Documentation/misc-devices/mrvl_cn10k_dpi.rst
@@ -33,12 +33,12 @@ drivers/misc/mrvl_cn10k_dpi.c
Driver IOCTLs
=============
-:c:macro::`DPI_MPS_MRRS_CFG`
+:c:macro:`DPI_MPS_MRRS_CFG`
ioctl that sets max payload size & max read request size parameters of
a pem port to which DMA engines are wired.
-:c:macro::`DPI_ENGINE_CFG`
+:c:macro:`DPI_ENGINE_CFG`
ioctl that sets DMA engine's fifo sizes & max outstanding load request
thresholds.
diff --git a/Documentation/misc-devices/tps6594-pfsm.rst b/Documentation/misc-devices/tps6594-pfsm.rst
index 4ada37ccdcba..5f17a4fd9579 100644
--- a/Documentation/misc-devices/tps6594-pfsm.rst
+++ b/Documentation/misc-devices/tps6594-pfsm.rst
@@ -39,28 +39,28 @@ include/uapi/linux/tps6594_pfsm.h
Driver IOCTLs
=============
-:c:macro::`PMIC_GOTO_STANDBY`
+:c:macro:`PMIC_GOTO_STANDBY`
All device resources are powered down. The processor is off, and
no voltage domains are energized.
-:c:macro::`PMIC_GOTO_LP_STANDBY`
+:c:macro:`PMIC_GOTO_LP_STANDBY`
The digital and analog functions of the PMIC, which are not
required to be always-on, are turned off (low-power).
-:c:macro::`PMIC_UPDATE_PGM`
+:c:macro:`PMIC_UPDATE_PGM`
Triggers a firmware update.
-:c:macro::`PMIC_SET_ACTIVE_STATE`
+:c:macro:`PMIC_SET_ACTIVE_STATE`
One of the operational modes.
The PMICs are fully functional and supply power to all PDN loads.
All voltage domains are energized in both MCU and Main processor
sections.
-:c:macro::`PMIC_SET_MCU_ONLY_STATE`
+:c:macro:`PMIC_SET_MCU_ONLY_STATE`
One of the operational modes.
Only the power resources assigned to the MCU Safety Island are on.
-:c:macro::`PMIC_SET_RETENTION_STATE`
+:c:macro:`PMIC_SET_RETENTION_STATE`
One of the operational modes.
Depending on the triggers set, some DDR/GPIO voltage domains can
remain energized, while all other domains are off to minimize
diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
index 1db412e9b1a3..5f78d413e379 100644
--- a/Documentation/misc-devices/uacce.rst
+++ b/Documentation/misc-devices/uacce.rst
@@ -1,7 +1,10 @@
.. SPDX-License-Identifier: GPL-2.0
-Introduction of Uacce
----------------------
+Uacce (Unified/User-space-access-intended Accelerator Framework)
+================================================================
+
+Introduction
+------------
Uacce (Unified/User-space-access-intended Accelerator Framework) targets to
provide Shared Virtual Addressing (SVA) between accelerators and processes.
diff --git a/Documentation/mm/active_mm.rst b/Documentation/mm/active_mm.rst
index d096fc091e23..60d819d7d043 100644
--- a/Documentation/mm/active_mm.rst
+++ b/Documentation/mm/active_mm.rst
@@ -92,4 +92,4 @@ helpers, which abstract this config option.
and register state is separate, the alpha PALcode joins the two, and you
need to switch both together).
- (From http://marc.info/?l=linux-kernel&m=93337278602211&w=2)
+ (From https://lore.kernel.org/lkml/Pine.LNX.4.10.9907301410280.752-100000@penguin.transmeta.com/)
diff --git a/Documentation/netlink/genetlink-c.yaml b/Documentation/netlink/genetlink-c.yaml
index 5a234e9b5fa2..57f59fe23e3f 100644
--- a/Documentation/netlink/genetlink-c.yaml
+++ b/Documentation/netlink/genetlink-c.yaml
@@ -227,7 +227,7 @@ properties:
Optional format indicator that is intended only for choosing
the right formatting mechanism when displaying values of this
type.
- enum: [ hex, mac, fddi, ipv4, ipv6, uuid ]
+ enum: [ hex, mac, fddi, ipv4, ipv6, ipv4-or-v6, uuid ]
# Start genetlink-c
name-prefix:
type: string
diff --git a/Documentation/netlink/genetlink.yaml b/Documentation/netlink/genetlink.yaml
index 7b1ec153e834..b020a537d8ac 100644
--- a/Documentation/netlink/genetlink.yaml
+++ b/Documentation/netlink/genetlink.yaml
@@ -185,7 +185,7 @@ properties:
Optional format indicator that is intended only for choosing
the right formatting mechanism when displaying values of this
type.
- enum: [ hex, mac, fddi, ipv4, ipv6, uuid ]
+ enum: [ hex, mac, fddi, ipv4, ipv6, ipv4-or-v6, uuid ]
# Make sure name-prefix does not appear in subsets (subsets inherit naming)
dependencies:
diff --git a/Documentation/netlink/netlink-raw.yaml b/Documentation/netlink/netlink-raw.yaml
index 246fa07bccf6..0166a7e4afbb 100644
--- a/Documentation/netlink/netlink-raw.yaml
+++ b/Documentation/netlink/netlink-raw.yaml
@@ -157,7 +157,7 @@ properties:
Optional format indicator that is intended only for choosing
the right formatting mechanism when displaying values of this
type.
- enum: [ hex, mac, fddi, ipv4, ipv6, uuid ]
+ enum: [ hex, mac, fddi, ipv4, ipv6, ipv4-or-v6, uuid ]
struct:
description: Name of the nested struct type.
type: string
diff --git a/Documentation/netlink/specs/conntrack.yaml b/Documentation/netlink/specs/conntrack.yaml
index bef528633b17..db7cddcda50a 100644
--- a/Documentation/netlink/specs/conntrack.yaml
+++ b/Documentation/netlink/specs/conntrack.yaml
@@ -457,7 +457,7 @@ attribute-sets:
name: labels
type: binary
-
- name: labels mask
+ name: labels-mask
type: binary
-
name: synproxy
diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 3db59c965869..837112da6738 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -99,6 +99,8 @@ definitions:
name: legacy
-
name: switchdev
+ -
+ name: switchdev-inactive
-
type: enum
name: eswitch-inline-mode
@@ -857,6 +859,14 @@ attribute-sets:
name: health-reporter-burst-period
type: u64
doc: Time (in msec) for recoveries before starting the grace period.
+
+ # TODO: fill in the attributes in between
+
+ -
+ name: param-reset-default
+ type: flag
+ doc: Request restoring parameter to its default value.
+ value: 183
-
name: dl-dev-stats
subset-of: devlink
@@ -1791,6 +1801,7 @@ operations:
- param-type
# param-value-data is missing here as the type is variable
- param-value-cmode
+ - param-reset-default
-
name: region-get
diff --git a/Documentation/netlink/specs/dpll.yaml b/Documentation/netlink/specs/dpll.yaml
index 80728f6f9bc8..78d0724d7e12 100644
--- a/Documentation/netlink/specs/dpll.yaml
+++ b/Documentation/netlink/specs/dpll.yaml
@@ -440,6 +440,12 @@ attribute-sets:
doc: |
Capable pin provides list of pins that can be bound to create a
reference-sync pin pair.
+ -
+ name: phase-adjust-gran
+ type: u32
+ doc: |
+ Granularity of phase adjustment, in picoseconds. The value of
+ phase adjustment must be a multiple of this granularity.
-
name: pin-parent-device
@@ -616,6 +622,7 @@ operations:
- capabilities
- parent-device
- parent-pin
+ - phase-adjust-gran
- phase-adjust-min
- phase-adjust-max
- phase-adjust
diff --git a/Documentation/netlink/specs/em.yaml b/Documentation/netlink/specs/em.yaml
new file mode 100644
index 000000000000..9905ca482325
--- /dev/null
+++ b/Documentation/netlink/specs/em.yaml
@@ -0,0 +1,113 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+name: em
+
+doc: |
+ Energy model netlink interface to notify its changes.
+
+protocol: genetlink
+
+uapi-header: linux/energy_model.h
+
+attribute-sets:
+ -
+ name: pds
+ attributes:
+ -
+ name: pd
+ type: nest
+ nested-attributes: pd
+ multi-attr: true
+ -
+ name: pd
+ attributes:
+ -
+ name: pad
+ type: pad
+ -
+ name: pd-id
+ type: u32
+ -
+ name: flags
+ type: u64
+ -
+ name: cpus
+ type: string
+ -
+ name: pd-table
+ attributes:
+ -
+ name: pd-id
+ type: u32
+ -
+ name: ps
+ type: nest
+ nested-attributes: ps
+ multi-attr: true
+ -
+ name: ps
+ attributes:
+ -
+ name: pad
+ type: pad
+ -
+ name: performance
+ type: u64
+ -
+ name: frequency
+ type: u64
+ -
+ name: power
+ type: u64
+ -
+ name: cost
+ type: u64
+ -
+ name: flags
+ type: u64
+
+operations:
+ list:
+ -
+ name: get-pds
+ attribute-set: pds
+ doc: Get the list of information for all performance domains.
+ do:
+ reply:
+ attributes:
+ - pd
+ -
+ name: get-pd-table
+ attribute-set: pd-table
+ doc: Get the energy model table of a performance domain.
+ do:
+ request:
+ attributes:
+ - pd-id
+ reply:
+ attributes:
+ - pd-id
+ - ps
+ -
+ name: pd-created
+ doc: A performance domain is created.
+ notify: get-pd-table
+ mcgrp: event
+ -
+ name: pd-updated
+ doc: A performance domain is updated.
+ notify: get-pd-table
+ mcgrp: event
+ -
+ name: pd-deleted
+ doc: A performance domain is deleted.
+ attribute-set: pd-table
+ event:
+ attributes:
+ - pd-id
+ mcgrp: event
+
+mcast-groups:
+ list:
+ -
+ name: event
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 6a0fb1974513..0a2d2343f79a 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -1269,7 +1269,7 @@ attribute-sets:
-
name: hist
type: nest
- multi-attr: True
+ multi-attr: true
nested-attributes: fec-hist
-
name: fec
@@ -1823,6 +1823,73 @@ attribute-sets:
type: uint
enum: pse-event
doc: List of events reported by the PSE controller
+ -
+ name: mse-capabilities
+ doc: MSE capabilities attribute set
+ attr-cnt-name: --ethtool-a-mse-capabilities-cnt
+ attributes:
+ -
+ name: max-average-mse
+ type: uint
+ -
+ name: max-peak-mse
+ type: uint
+ -
+ name: refresh-rate-ps
+ type: uint
+ -
+ name: num-symbols
+ type: uint
+ -
+ name: mse-snapshot
+ doc: MSE snapshot attribute set
+ attr-cnt-name: --ethtool-a-mse-snapshot-cnt
+ attributes:
+ -
+ name: average-mse
+ type: uint
+ -
+ name: peak-mse
+ type: uint
+ -
+ name: worst-peak-mse
+ type: uint
+ -
+ name: mse
+ attr-cnt-name: --ethtool-a-mse-cnt
+ attributes:
+ -
+ name: header
+ type: nest
+ nested-attributes: header
+ -
+ name: capabilities
+ type: nest
+ nested-attributes: mse-capabilities
+ -
+ name: channel-a
+ type: nest
+ nested-attributes: mse-snapshot
+ -
+ name: channel-b
+ type: nest
+ nested-attributes: mse-snapshot
+ -
+ name: channel-c
+ type: nest
+ nested-attributes: mse-snapshot
+ -
+ name: channel-d
+ type: nest
+ nested-attributes: mse-snapshot
+ -
+ name: worst-channel
+ type: nest
+ nested-attributes: mse-snapshot
+ -
+ name: link
+ type: nest
+ nested-attributes: mse-snapshot
operations:
enum-model: directional
@@ -2756,6 +2823,25 @@ operations:
attributes:
- header
- context
+ -
+ name: mse-get
+ doc: Get PHY MSE measurement data and capabilities.
+ attribute-set: mse
+ do: &mse-get-op
+ request:
+ attributes:
+ - header
+ reply:
+ attributes:
+ - header
+ - capabilities
+ - channel-a
+ - channel-b
+ - channel-c
+ - channel-d
+ - worst-channel
+ - link
+ dump: *mse-get-op
mcast-groups:
list:
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index e00d3fa1c152..82bf5cb2617d 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -88,7 +88,7 @@ definitions:
-
name: napi-threaded
type: enum
- entries: [disabled, enabled]
+ entries: [disabled, enabled, busy-poll]
attribute-sets:
-
@@ -291,7 +291,8 @@ attribute-sets:
name: threaded
doc: Whether the NAPI is configured to operate in threaded polling
mode. If this is set to enabled then the NAPI context operates
- in threaded polling mode.
+ in threaded polling mode. If this is set to busy-poll, then the
+ threaded polling mode also busy polls.
type: u32
enum: napi-threaded
-
@@ -732,6 +733,29 @@ operations:
- rx-bytes
- tx-packets
- tx-bytes
+ - rx-alloc-fail
+ - rx-hw-drops
+ - rx-hw-drop-overruns
+ - rx-csum-complete
+ - rx-csum-unnecessary
+ - rx-csum-none
+ - rx-csum-bad
+ - rx-hw-gro-packets
+ - rx-hw-gro-bytes
+ - rx-hw-gro-wire-packets
+ - rx-hw-gro-wire-bytes
+ - rx-hw-drop-ratelimits
+ - tx-hw-drops
+ - tx-hw-drop-errors
+ - tx-csum-none
+ - tx-needs-csum
+ - tx-hw-gso-packets
+ - tx-hw-gso-bytes
+ - tx-hw-gso-wire-packets
+ - tx-hw-gso-wire-bytes
+ - tx-hw-drop-ratelimits
+ - tx-stop
+ - tx-wake
-
name: bind-rx
doc: Bind dmabuf to netdev
diff --git a/Documentation/netlink/specs/nftables.yaml b/Documentation/netlink/specs/nftables.yaml
index cce88819ba71..17ad707fa0d5 100644
--- a/Documentation/netlink/specs/nftables.yaml
+++ b/Documentation/netlink/specs/nftables.yaml
@@ -915,7 +915,7 @@ attribute-sets:
type: string
doc: Name of set to use
-
- name: set id
+ name: set-id
type: u32
byte-order: big-endian
doc: ID of set to use
diff --git a/Documentation/netlink/specs/psp.yaml b/Documentation/netlink/specs/psp.yaml
index 944429e5c9a8..f3a57782d2cf 100644
--- a/Documentation/netlink/specs/psp.yaml
+++ b/Documentation/netlink/specs/psp.yaml
@@ -76,6 +76,83 @@ attribute-sets:
name: spi
doc: Security Parameters Index (SPI) of the association.
type: u32
+ -
+ name: stats
+ attributes:
+ -
+ name: dev-id
+ doc: PSP device ID.
+ type: u32
+ checks:
+ min: 1
+ -
+ name: key-rotations
+ type: uint
+ doc: |
+ Number of key rotations during the lifetime of the device.
+ Kernel statistic.
+ -
+ name: stale-events
+ type: uint
+ doc: |
+ Number of times a socket's Rx got shut down due to using
+ a key which went stale (fully rotated out).
+ Kernel statistic.
+ -
+ name: rx-packets
+ type: uint
+ doc: |
+ Number of successfully processed and authenticated PSP packets.
+ Device statistic (from the PSP spec).
+ -
+ name: rx-bytes
+ type: uint
+ doc: |
+ Number of successfully authenticated PSP bytes received, counting from
+ the first byte after the IV through the last byte of payload.
+ The fixed initial portion of the PSP header (16 bytes)
+ and the PSP trailer/ICV (16 bytes) are not included in this count.
+ Device statistic (from the PSP spec).
+ -
+ name: rx-auth-fail
+ type: uint
+ doc: |
+ Number of received PSP packets with unsuccessful authentication.
+ Device statistic (from the PSP spec).
+ -
+ name: rx-error
+ type: uint
+ doc: |
+ Number of received PSP packets with length/framing errors.
+ Device statistic (from the PSP spec).
+ -
+ name: rx-bad
+ type: uint
+ doc: |
+ Number of received PSP packets with miscellaneous errors
+ (invalid master key indicated by SPI, unsupported version, etc.)
+ Device statistic (from the PSP spec).
+ -
+ name: tx-packets
+ type: uint
+ doc: |
+ Number of successfully processed PSP packets for transmission.
+ Device statistic (from the PSP spec).
+ -
+ name: tx-bytes
+ type: uint
+ doc: |
+ Number of successfully processed PSP bytes for transmit, counting from
+ the first byte after the IV through the last byte of payload.
+ The fixed initial portion of the PSP header (16 bytes)
+ and the PSP trailer/ICV (16 bytes) are not included in this count.
+ Device statistic (from the PSP spec).
+ -
+ name: tx-error
+ type: uint
+ doc: |
+ Number of PSP packets for transmission with errors.
+ Device statistic (from the PSP spec).
operations:
list:
@@ -177,6 +254,24 @@ operations:
pre: psp-assoc-device-get-locked
post: psp-device-unlock
+ -
+ name: get-stats
+ doc: Get device statistics.
+ attribute-set: stats
+ do:
+ request:
+ attributes:
+ - dev-id
+ reply: &stats-all
+ attributes:
+ - dev-id
+ - key-rotations
+ - stale-events
+ pre: psp-device-get-locked
+ post: psp-device-unlock
+ dump:
+ reply: *stats-all
+
mcast-groups:
list:
-
diff --git a/Documentation/netlink/specs/rt-addr.yaml b/Documentation/netlink/specs/rt-addr.yaml
index 3a582eac1629..163a106c41bb 100644
--- a/Documentation/netlink/specs/rt-addr.yaml
+++ b/Documentation/netlink/specs/rt-addr.yaml
@@ -86,17 +86,18 @@ attribute-sets:
-
name: address
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: local
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: label
type: string
-
name: broadcast
- type: binary
+ type: u32
+ byte-order: big-endian
display-hint: ipv4
-
name: anycast
diff --git a/Documentation/netlink/specs/rt-link.yaml b/Documentation/netlink/specs/rt-link.yaml
index 2a23e9699c0b..6beeb6ee5adf 100644
--- a/Documentation/netlink/specs/rt-link.yaml
+++ b/Documentation/netlink/specs/rt-link.yaml
@@ -1707,11 +1707,11 @@ attribute-sets:
-
name: local
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: remote
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: ttl
type: u8
@@ -1833,11 +1833,11 @@ attribute-sets:
-
name: local
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: remote
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: fwmark
type: u32
@@ -1868,7 +1868,8 @@ attribute-sets:
type: u32
-
name: remote
- type: binary
+ type: u32
+ byte-order: big-endian
display-hint: ipv4
-
name: ttl
@@ -1914,6 +1915,35 @@ attribute-sets:
type: binary
struct: ifla-geneve-port-range
-
+ name: linkinfo-hsr-attrs
+ name-prefix: ifla-hsr-
+ attributes:
+ -
+ name: slave1
+ type: u32
+ -
+ name: slave2
+ type: u32
+ -
+ name: multicast-spec
+ type: u8
+ -
+ name: supervision-addr
+ type: binary
+ display-hint: mac
+ -
+ name: seq-nr
+ type: u16
+ -
+ name: version
+ type: u8
+ -
+ name: protocol
+ type: u8
+ -
+ name: interlink
+ type: u32
+ -
name: linkinfo-iptun-attrs
name-prefix: ifla-iptun-
attributes:
@@ -1923,11 +1953,11 @@ attribute-sets:
-
name: local
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: remote
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: ttl
type: u8
@@ -1957,7 +1987,8 @@ attribute-sets:
display-hint: ipv6
-
name: 6rd-relay-prefix
- type: binary
+ type: u32
+ byte-order: big-endian
display-hint: ipv4
-
name: 6rd-prefixlen
@@ -2300,6 +2331,9 @@ sub-messages:
value: geneve
attribute-set: linkinfo-geneve-attrs
-
+ value: hsr
+ attribute-set: linkinfo-hsr-attrs
+ -
value: ipip
attribute-set: linkinfo-iptun-attrs
-
diff --git a/Documentation/netlink/specs/rt-neigh.yaml b/Documentation/netlink/specs/rt-neigh.yaml
index 2f568a6231c9..0f46ef313590 100644
--- a/Documentation/netlink/specs/rt-neigh.yaml
+++ b/Documentation/netlink/specs/rt-neigh.yaml
@@ -194,7 +194,7 @@ attribute-sets:
-
name: dst
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: lladdr
type: binary
diff --git a/Documentation/netlink/specs/rt-route.yaml b/Documentation/netlink/specs/rt-route.yaml
index 1ecb3fadc067..33195db96746 100644
--- a/Documentation/netlink/specs/rt-route.yaml
+++ b/Documentation/netlink/specs/rt-route.yaml
@@ -87,11 +87,11 @@ attribute-sets:
-
name: dst
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: src
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: iif
type: u32
@@ -101,14 +101,14 @@ attribute-sets:
-
name: gateway
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: priority
type: u32
-
name: prefsrc
type: binary
- display-hint: ipv4
+ display-hint: ipv4-or-v6
-
name: metrics
type: nest
diff --git a/Documentation/netlink/specs/rt-rule.yaml b/Documentation/netlink/specs/rt-rule.yaml
index bebee452a950..7f03a44ab036 100644
--- a/Documentation/netlink/specs/rt-rule.yaml
+++ b/Documentation/netlink/specs/rt-rule.yaml
@@ -96,10 +96,12 @@ attribute-sets:
attributes:
-
name: dst
- type: u32
+ type: binary
+ display-hint: ipv4-or-v6
-
name: src
- type: u32
+ type: binary
+ display-hint: ipv4-or-v6
-
name: iifname
type: string
diff --git a/Documentation/netlink/specs/wireguard.yaml b/Documentation/netlink/specs/wireguard.yaml
new file mode 100644
index 000000000000..30479fc6bb69
--- /dev/null
+++ b/Documentation/netlink/specs/wireguard.yaml
@@ -0,0 +1,298 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+---
+name: wireguard
+protocol: genetlink-legacy
+
+doc: |
+ **Netlink protocol to control WireGuard network devices.**
+
+ The below enums and macros are for interfacing with WireGuard, using generic
+ netlink, with family ``WG_GENL_NAME`` and version ``WG_GENL_VERSION``. It
+ defines two commands: get and set. Note that while they share many common
+ attributes, these two commands actually accept a slightly different set of
+ inputs and outputs. These differences are noted under the individual
+ attributes.
+c-family-name: wg-genl-name
+c-version-name: wg-genl-version
+max-by-define: true
+
+definitions:
+ -
+ name-prefix: wg-
+ name: key-len
+ type: const
+ value: 32
+ -
+ name: --kernel-timespec
+ type: struct
+ header: linux/time_types.h
+ members:
+ -
+ name: sec
+ type: u64
+ doc: Number of seconds, since UNIX epoch.
+ -
+ name: nsec
+ type: u64
+ doc: Number of nanoseconds, after the second began.
+ -
+ name: wgdevice-flags
+ name-prefix: wgdevice-f-
+ enum-name: wgdevice-flag
+ type: flags
+ entries:
+ - replace-peers
+ -
+ name: wgpeer-flags
+ name-prefix: wgpeer-f-
+ enum-name: wgpeer-flag
+ type: flags
+ entries:
+ - remove-me
+ - replace-allowedips
+ - update-only
+ -
+ name: wgallowedip-flags
+ name-prefix: wgallowedip-f-
+ enum-name: wgallowedip-flag
+ type: flags
+ entries:
+ - remove-me
+
+attribute-sets:
+ -
+ name: wgdevice
+ enum-name: wgdevice-attribute
+ name-prefix: wgdevice-a-
+ attr-cnt-name: --wgdevice-a-last
+ attributes:
+ -
+ name: unspec
+ type: unused
+ value: 0
+ -
+ name: ifindex
+ type: u32
+ -
+ name: ifname
+ type: string
+ checks:
+ max-len: 15
+ -
+ name: private-key
+ type: binary
+ doc: Set to all zeros to remove.
+ display-hint: hex
+ checks:
+ exact-len: wg-key-len
+ -
+ name: public-key
+ type: binary
+ display-hint: hex
+ checks:
+ exact-len: wg-key-len
+ -
+ name: flags
+ type: u32
+ doc: |
+ ``0`` or ``WGDEVICE_F_REPLACE_PEERS`` if all current peers should be
+ removed prior to adding the list below.
+ enum: wgdevice-flags
+ -
+ name: listen-port
+ type: u16
+ doc: Set as ``0`` to choose randomly.
+ -
+ name: fwmark
+ type: u32
+ doc: Set as ``0`` to disable.
+ -
+ name: peers
+ type: indexed-array
+ sub-type: nest
+ nested-attributes: wgpeer
+ doc: |
+ The index/type parameter is unused on ``SET_DEVICE`` operations and is
+ zero on ``GET_DEVICE`` operations.
+ -
+ name: wgpeer
+ enum-name: wgpeer-attribute
+ name-prefix: wgpeer-a-
+ attr-cnt-name: --wgpeer-a-last
+ attributes:
+ -
+ name: unspec
+ type: unused
+ value: 0
+ -
+ name: public-key
+ type: binary
+ display-hint: hex
+ checks:
+ exact-len: wg-key-len
+ -
+ name: preshared-key
+ type: binary
+ doc: Set as all zeros to remove.
+ display-hint: hex
+ checks:
+ exact-len: wg-key-len
+ -
+ name: flags
+ type: u32
+ doc: |
+ ``0`` and/or ``WGPEER_F_REMOVE_ME`` if the specified peer should not
+ exist at the end of the operation, rather than added/updated and/or
+ ``WGPEER_F_REPLACE_ALLOWEDIPS`` if all current allowed IPs of this
+ peer should be removed prior to adding the list below and/or
+ ``WGPEER_F_UPDATE_ONLY`` if the peer should only be set if it already
+ exists.
+ enum: wgpeer-flags
+ -
+ name: endpoint
+ type: binary
+ doc: struct sockaddr_in or struct sockaddr_in6
+ checks:
+ min-len: 16
+ -
+ name: persistent-keepalive-interval
+ type: u16
+ doc: Set as ``0`` to disable.
+ -
+ name: last-handshake-time
+ type: binary
+ struct: --kernel-timespec
+ checks:
+ exact-len: 16
+ -
+ name: rx-bytes
+ type: u64
+ -
+ name: tx-bytes
+ type: u64
+ -
+ name: allowedips
+ type: indexed-array
+ sub-type: nest
+ nested-attributes: wgallowedip
+ doc: |
+ The index/type parameter is unused on ``SET_DEVICE`` operations and is
+ zero on ``GET_DEVICE`` operations.
+ -
+ name: protocol-version
+ type: u32
+ doc: |
+ Should not be set or used at all by most users of this API, as the
+ most recent protocol will be used when this is unset. Otherwise,
+ must be set to ``1``.
+ -
+ name: wgallowedip
+ enum-name: wgallowedip-attribute
+ name-prefix: wgallowedip-a-
+ attr-cnt-name: --wgallowedip-a-last
+ attributes:
+ -
+ name: unspec
+ type: unused
+ value: 0
+ -
+ name: family
+ type: u16
+ doc: IP family, either ``AF_INET`` or ``AF_INET6``.
+ -
+ name: ipaddr
+ type: binary
+ doc: Either ``struct in_addr`` or ``struct in6_addr``.
+ display-hint: ipv4-or-v6
+ checks:
+ min-len: 4
+ -
+ name: cidr-mask
+ type: u8
+ -
+ name: flags
+ type: u32
+ doc: |
+ ``WGALLOWEDIP_F_REMOVE_ME`` if the specified IP should be removed;
+ otherwise, this IP will be added if it is not already present.
+ enum: wgallowedip-flags
+
+operations:
+ enum-name: wg-cmd
+ name-prefix: wg-cmd-
+ list:
+ -
+ name: get-device
+ value: 0
+ doc: |
+ Retrieve WireGuard device
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The command should be called with one but not both of:
+
+ - ``WGDEVICE_A_IFINDEX``
+ - ``WGDEVICE_A_IFNAME``
+
+ The kernel will then return several messages (``NLM_F_MULTI``). It is
+ possible that all of the allowed IPs of a single peer will not fit
+ within a single netlink message. In that case, the same peer will be
+ written in the following message, except it will only contain
+ ``WGPEER_A_PUBLIC_KEY`` and ``WGPEER_A_ALLOWEDIPS``. This may occur
+ several times in a row for the same peer. It is then up to the receiver
+ to coalesce adjacent peers. Likewise, it is possible that all peers will
+ not fit within a single message. So, subsequent peers will be sent in
+ following messages, except those will only contain ``WGDEVICE_A_IFNAME``
+ and ``WGDEVICE_A_PEERS``. It is then up to the receiver to coalesce
+ these messages to form the complete list of peers.
+
+ Since this is an ``NLA_F_DUMP`` command, the final message will always
+ be ``NLMSG_DONE``, even if an error occurs. However, this ``NLMSG_DONE``
+ message contains an integer error code. It is either zero or a negative
+ error code corresponding to the errno.
+ attribute-set: wgdevice
+ flags: [uns-admin-perm]
+
+ dump:
+ pre: wg-get-device-start
+ post: wg-get-device-done
+ request:
+ attributes:
+ - ifindex
+ - ifname
+ reply: &all-attrs
+ attributes:
+ - ifindex
+ - ifname
+ - private-key
+ - public-key
+ - flags
+ - listen-port
+ - fwmark
+ - peers
+ -
+ name: set-device
+ value: 1
+ doc: |
+ Set WireGuard device
+ ~~~~~~~~~~~~~~~~~~~~
+
+ This command should be called with a wgdevice set, containing one but
+ not both of ``WGDEVICE_A_IFINDEX`` and ``WGDEVICE_A_IFNAME``.
+
+ It is possible that the amount of configuration data exceeds that of the
+ maximum message length accepted by the kernel. In that case, several
+ messages should be sent one after another, with each successive one
+ filling in information not contained in the prior. Note that if
+ ``WGDEVICE_F_REPLACE_PEERS`` is specified in the first message, it
+ probably should not be specified in fragments that come after, so that
+ the list of peers is only cleared the first time but appended after.
+ Likewise for peers, if ``WGPEER_F_REPLACE_ALLOWEDIPS`` is specified in
+ the first message of a peer, it likely should not be specified in
+ subsequent fragments.
+
+ If an error occurs, ``NLMSG_ERROR`` will reply containing an errno.
+ attribute-set: wgdevice
+ flags: [uns-admin-perm]
+
+ do:
+ request: *all-attrs
diff --git a/Documentation/networking/6pack.rst b/Documentation/networking/6pack.rst
index bc5bf1f1a98f..66d5fd4fc821 100644
--- a/Documentation/networking/6pack.rst
+++ b/Documentation/networking/6pack.rst
@@ -94,7 +94,7 @@ kernels may lead to a compilation error because the interface to a kernel
function has been changed in the 2.1.8x kernels.
How to turn on 6pack support:
-=============================
+-----------------------------
- In the linux kernel configuration program, select the code maturity level
options menu and turn on the prompting for development drivers.
diff --git a/Documentation/networking/arcnet-hardware.rst b/Documentation/networking/arcnet-hardware.rst
index 3bf7f99cd7bb..20e5075d0d0e 100644
--- a/Documentation/networking/arcnet-hardware.rst
+++ b/Documentation/networking/arcnet-hardware.rst
@@ -4,18 +4,20 @@
ARCnet Hardware
===============
+:Author: Avery Pennarun <apenwarr@worldvisions.ca>
+
.. note::
- 1) This file is a supplement to arcnet.txt. Please read that for general
+ 1) This file is a supplement to arcnet.rst. Please read that for general
driver configuration help.
2) This file is no longer Linux-specific. It should probably be moved out
of the kernel sources. Ideas?
Because so many people (myself included) seem to have obtained ARCnet cards
without manuals, this file contains a quick introduction to ARCnet hardware,
-some cabling tips, and a listing of all jumper settings I can find. Please
-e-mail apenwarr@worldvisions.ca with any settings for your particular card,
-or any other information you have!
+some cabling tips, and a listing of all jumper settings I can find. If you
+have any settings for your particular card, and/or any other information you
+have, do not hesitate to :ref:`email to netdev <arcnet-netdev>`.
Introduction to ARCnet
@@ -72,11 +74,10 @@ level of encapsulation is defined by RFC1201, which I call "packet
splitting," that allows "virtual packets" to grow as large as 64K each,
although they are generally kept down to the Ethernet-style 1500 bytes.
-For more information on the advantages and disadvantages (mostly the
-advantages) of ARCnet networks, you might try the "ARCnet Trade Association"
-WWW page:
+For more information on ARCnet networks, visit the "ARCNET Resource Center"
+WWW page at:
- http://www.arcnet.com
+ https://www.arcnet.cc
Cabling ARCnet Networks
@@ -3226,9 +3227,6 @@ Settings for IRQ Selection (Lower Jumper Line)
Other Cards
===========
-I have no information on other models of ARCnet cards at the moment. Please
-send any and all info to:
-
- apenwarr@worldvisions.ca
+I have no information on other models of ARCnet cards at the moment.
Thanks.
diff --git a/Documentation/networking/arcnet.rst b/Documentation/networking/arcnet.rst
index 82fce606c0f0..cd43a18ad149 100644
--- a/Documentation/networking/arcnet.rst
+++ b/Documentation/networking/arcnet.rst
@@ -4,6 +4,8 @@
ARCnet
======
+:Author: Avery Pennarun <apenwarr@worldvisions.ca>
+
.. note::
See also arcnet-hardware.txt in this directory for jumper-setting
@@ -30,18 +32,7 @@ Come on, be a sport! Send me a success report!
(hey, that was even better than my original poem... this is getting bad!)
-
-.. warning::
-
- If you don't e-mail me about your success/failure soon, I may be forced to
- start SINGING. And we don't want that, do we?
-
- (You know, it might be argued that I'm pushing this point a little too much.
- If you think so, why not flame me in a quick little e-mail? Please also
- include the type of card(s) you're using, software, size of network, and
- whether it's working or not.)
-
- My e-mail address is: apenwarr@worldvisions.ca
+----
These are the ARCnet drivers for Linux.
@@ -59,23 +50,14 @@ ARCnet 2.10 ALPHA, Tomasz's all-new-and-improved RFC1051 support has been
included and seems to be working fine!
+.. _arcnet-netdev:
+
Where do I discuss these drivers?
---------------------------------
-Tomasz has been so kind as to set up a new and improved mailing list.
-Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR
-REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the
-list, mail to linux-arcnet@tichy.ch.uj.edu.pl.
-
-There are archives of the mailing list at:
-
- http://epistolary.org/mailman/listinfo.cgi/arcnet
-
-The people on linux-net@vger.kernel.org (now defunct, replaced by
-netdev@vger.kernel.org) have also been known to be very helpful, especially
-when we're talking about ALPHA Linux kernels that may or may not work right
-in the first place.
-
+ARCnet discussions take place on netdev. Simply send your email to
+netdev@vger.kernel.org and make sure to Cc: maintainer listed in
+"ARCNET NETWORK LAYER" heading of Documentation/process/maintainers.rst.
Other Drivers and Info
----------------------
@@ -523,17 +505,9 @@ can set up your network then:
It works: what now?
-------------------
-Send mail describing your setup, preferably including driver version, kernel
-version, ARCnet card model, CPU type, number of systems on your network, and
-list of software in use to me at the following address:
-
- apenwarr@worldvisions.ca
-
-I do send (sometimes automated) replies to all messages I receive. My email
-can be weird (and also usually gets forwarded all over the place along the
-way to me), so if you don't get a reply within a reasonable time, please
-resend.
-
+Send mail following :ref:`arcnet-netdev`. Describe your setup, preferably
+including driver version, kernel version, ARCnet card model, CPU type, number
+of systems on your network, and list of software in use.
It doesn't work: what now?
--------------------------
diff --git a/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst b/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst
index 6877a3260582..5aedbabb7382 100644
--- a/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst
+++ b/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst
@@ -28,6 +28,7 @@ these MAP frames and send them to appropriate PDN's.
================
a. MAP packet v1 (data / control)
+---------------------------------
MAP header fields are in big endian format.
@@ -54,6 +55,7 @@ Payload length includes the padding length but does not include MAP header
length.
b. Map packet v4 (data / control)
+---------------------------------
MAP header fields are in big endian format.
@@ -107,6 +109,7 @@ over which checksum is computed.
Checksum value, indicates the checksum computed.
c. MAP packet v5 (data / control)
+---------------------------------
MAP header fields are in big endian format.
@@ -134,6 +137,7 @@ Payload length includes the padding length but does not include MAP header
length.
d. Checksum offload header v5
+-----------------------------
Checksum offload header fields are in big endian format.
@@ -158,7 +162,10 @@ indicates that the calculated packet checksum is invalid.
Reserved bits must be zero when sent and ignored when received.
-e. MAP packet v1/v5 (command specific)::
+e. MAP packet v1/v5 (command specific)
+--------------------------------------
+
+Packet format::
Bit 0 1 2-7 8 - 15 16 - 31
Function Command Reserved Pad Multiplexer ID Payload length
@@ -181,6 +188,7 @@ Command types
= ==========================================
f. Aggregation
+--------------
Aggregation is multiple MAP packets (can be data or command) delivered to
rmnet in a single linear skb. rmnet will process the individual
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index 7cfcd183054f..bcc02355f828 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -47,6 +47,7 @@ Contents:
mellanox/mlx5/index
meta/fbnic
microsoft/netvsc
+ mucse/rnpgbe
neterion/s2io
netronome/nfp
pensando/ionic
diff --git a/Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst b/Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst
new file mode 100644
index 000000000000..d35cf8a46b6c
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================================
+Linux Base Driver for MUCSE(R) Gigabit PCI Express Adapters
+===========================================================
+
+Contents
+========
+
+- Identifying Your Adapter
+
+Identifying Your Adapter
+========================
+The driver is compatible with devices based on the following:
+
+ * MUCSE(R) Ethernet Controller N210 series
+ * MUCSE(R) Ethernet Controller N500 series
diff --git a/Documentation/networking/devlink/devlink-eswitch-attr.rst b/Documentation/networking/devlink/devlink-eswitch-attr.rst
index 08bb39ab1528..eafe09abc40c 100644
--- a/Documentation/networking/devlink/devlink-eswitch-attr.rst
+++ b/Documentation/networking/devlink/devlink-eswitch-attr.rst
@@ -39,6 +39,10 @@ The following is a list of E-Switch attributes.
rules.
* ``switchdev`` allows for more advanced offloading capabilities of
the E-Switch to hardware.
+ * ``switchdev_inactive`` switchdev mode but starts inactive, doesn't allow traffic
+ until explicitly activated. This mode is useful for orchestrators that
+ want to prepare the device in switchdev mode but only activate it when
+ all configurations are done.
* - ``inline-mode``
- enum
- Some HWs need the VF driver to put part of the packet
@@ -74,3 +78,12 @@ Example Usage
# enable encap-mode with legacy mode
$ devlink dev eswitch set pci/0000:08:00.0 mode legacy inline-mode none encap-mode basic
+
+ # start switchdev mode in inactive state
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev_inactive
+
+ # setup switchdev configurations, representors, FDB entries, etc..
+ ...
+
+ # activate switchdev mode to allow traffic
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 0a9c20d70122..ea17756dcda6 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -41,6 +41,16 @@ In order for ``driverinit`` parameters to take effect, the driver must
support reloading via the ``devlink-reload`` command. This command will
request a reload of the device driver.
+Default parameter values
+=========================
+
+Drivers may optionally export default values for parameters of cmode
+``runtime`` and ``permanent``. For ``driverinit`` parameters, the last
+value set by the driver will be used as the default value. Drivers can
+also support resetting params with cmode ``runtime`` and ``permanent``
+to their default values. Resetting ``driverinit`` params is supported
+by devlink core without additional driver support needed.
+
.. _devlink_params_generic:
Generic configuration parameters
@@ -151,3 +161,7 @@ own name.
* - ``num_doorbells``
- u32
- Controls the number of doorbells used by the device.
+ * - ``max_mac_per_vf``
+ - u32
+ - Controls the maximum number of MAC address filters that can be assigned
+ to a Virtual Function (VF).
diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
index d3cb5bb5197e..51c887f0dc83 100644
--- a/Documentation/networking/devlink/i40e.rst
+++ b/Documentation/networking/devlink/i40e.rst
@@ -7,6 +7,40 @@ i40e devlink support
This document describes the devlink features implemented by the ``i40e``
device driver.
+Parameters
+==========
+
+.. list-table:: Generic parameters implemented
+ :widths: 5 5 90
+
+ * - Name
+ - Mode
+ - Notes
+ * - ``max_mac_per_vf``
+ - runtime
+ - Controls the maximum number of MAC addresses a VF can use
+ on i40e devices.
+
+ By default (``0``), the driver enforces its internally calculated per-VF
+ MAC filter limit, which is based on the number of allocated VFS.
+
+ If set to a non-zero value, this parameter acts as a strict cap:
+ the driver will use the user-provided value instead of its internal
+ calculation.
+
+ **Important notes:**
+
+ - This value **must be set before enabling SR-IOV**.
+ Attempting to change it while SR-IOV is enabled will return an error.
+ - MAC filters are a **shared hardware resource** across all VFs.
+ Setting a high value may cause other VFs to be starved of filters.
+ - This value is a **Administrative policy**. The hardware may return
+ errors when its absolute limit is reached, regardless of the value
+ set here.
+
+ The default value is ``0`` (internal calculation is used).
+
+
Info versions
=============
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 0c58e5c729d9..35b12a2bfeba 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -99,5 +99,6 @@ parameters, info versions, and other features it supports.
prestera
qed
sfc
+ stmmac
ti-cpsw-switch
zl3073x
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 0e5f9c76e514..4bba4d780a4a 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -218,6 +218,20 @@ parameters.
* ``balanced`` : Merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance
* ``aggressive`` : Merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads
+ * - ``swp_l4_csum_mode``
+ - string
+ - permanent
+ - Configure how the L4 checksum is calculated by the device when using
+ Software Parser (SWP) hints for header locations.
+
+ * ``default`` : Use the device's default checksum calculation
+ mode. The driver will discover during init whether or
+ full_csum or l4_only is in use. Setting this value explicitly
+ from userspace is not allowed, but some firmware versions may
+ return this value on param read.
+ * ``full_csum`` : Calculate full checksum including the pseudo-header
+ * ``l4_only`` : Calculate L4-only checksum, excluding the pseudo-header
+
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
Info versions
diff --git a/Documentation/networking/devlink/stmmac.rst b/Documentation/networking/devlink/stmmac.rst
new file mode 100644
index 000000000000..47e3ff10bc08
--- /dev/null
+++ b/Documentation/networking/devlink/stmmac.rst
@@ -0,0 +1,40 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+stmmac (synopsys dwmac) devlink support
+=======================================
+
+This document describes the devlink features implemented by the ``stmmac``
+device driver.
+
+Parameters
+==========
+
+The ``stmmac`` driver implements the following driver-specific parameters.
+
+.. list-table:: Driver-specific parameters implemented
+ :widths: 5 5 5 85
+
+ * - Name
+ - Type
+ - Mode
+ - Description
+ * - ``phc_coarse_adj``
+ - Boolean
+ - runtime
+ - Enable the Coarse timestamping mode, as defined in the DWMAC TRM.
+ A detailed explanation of this timestamping mode can be found in the
+ Socfpga Functionnal Description [1].
+
+ In Coarse mode, the ptp clock is expected to be fed by a high-precision
+ clock that is externally adjusted, and the subsecond increment used for
+ timestamping is set to 1/ptp_clock_rate.
+
+ In Fine mode (i.e. Coarse mode == false), the ptp clock frequency is
+ continuously adjusted, but the subsecond increment is set to
+ 2/ptp_clock_rate.
+
+ Coarse mode is suitable for PTP Grand Master operation. If unsure, leave
+ the parameter to False.
+
+ [1] https://www.intel.com/content/www/us/en/docs/programmable/683126/21-2/functional-description-of-the-emac.html
diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index 7b2e69cd7ef0..5c79740a533b 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -1104,12 +1104,11 @@ health of the network and for discovery of other nodes.
In Linux, both HSR and PRP are implemented in the hsr driver, which
instantiates a virtual, stackable network interface with two member ports.
The driver only implements the basic roles of DANH (Doubly Attached Node
-implementing HSR) and DANP (Doubly Attached Node implementing PRP); the roles
-of RedBox and QuadBox are not implemented (therefore, bridging a hsr network
-interface with a physical switch port does not produce the expected result).
+implementing HSR), DANP (Doubly Attached Node implementing PRP) and RedBox
+(allows non-HSR devices to connect to the ring via Interlink ports).
-A driver which is able of offloading certain functions of a DANP or DANH should
-declare the corresponding netdev features as indicated by the documentation at
+A driver which is able of offloading certain functions should declare the
+corresponding netdev features as indicated by the documentation at
``Documentation/networking/netdev-features.rst``. Additionally, the following
methods must be implemented:
@@ -1120,6 +1119,14 @@ methods must be implemented:
- ``port_hsr_leave``: function invoked when a given switch port leaves a
DANP/DANH and returns to normal operation as a standalone port.
+Note that the ``NETIF_F_HW_HSR_DUP`` feature relies on transmission towards
+multiple ports, which is generally available whenever the tagging protocol uses
+the ``dsa_xmit_port_mask()`` helper function. If the helper is used, the HSR
+offload feature should also be set. The ``dsa_port_simple_hsr_join()`` and
+``dsa_port_simple_hsr_leave()`` methods can be used as generic implementations
+of ``port_hsr_join`` and ``port_hsr_leave``, if this is the only supported
+offload feature.
+
TODO
====
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index b270886c5f5d..af56c304cef4 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -242,6 +242,7 @@ Userspace to kernel:
``ETHTOOL_MSG_RSS_SET`` set RSS settings
``ETHTOOL_MSG_RSS_CREATE_ACT`` create an additional RSS context
``ETHTOOL_MSG_RSS_DELETE_ACT`` delete an additional RSS context
+ ``ETHTOOL_MSG_MSE_GET`` get MSE diagnostic data
===================================== =================================
Kernel to userspace:
@@ -299,6 +300,7 @@ Kernel to userspace:
``ETHTOOL_MSG_RSS_CREATE_ACT_REPLY`` create an additional RSS context
``ETHTOOL_MSG_RSS_CREATE_NTF`` additional RSS context created
``ETHTOOL_MSG_RSS_DELETE_NTF`` additional RSS context deleted
+ ``ETHTOOL_MSG_MSE_GET_REPLY`` MSE diagnostic data
======================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -2458,6 +2460,68 @@ Kernel response contents:
For a description of each attribute, see ``TSCONFIG_GET``.
+MSE_GET
+=======
+
+Retrieves detailed Mean Square Error (MSE) diagnostic information from the PHY.
+
+Request Contents:
+
+ ==================================== ====== ============================
+ ``ETHTOOL_A_MSE_HEADER`` nested request header
+ ==================================== ====== ============================
+
+Kernel Response Contents:
+
+ ==================================== ====== ================================
+ ``ETHTOOL_A_MSE_HEADER`` nested reply header
+ ``ETHTOOL_A_MSE_CAPABILITIES`` nested capability/scale info for MSE
+ measurements
+ ``ETHTOOL_A_MSE_CHANNEL_A`` nested snapshot for Channel A
+ ``ETHTOOL_A_MSE_CHANNEL_B`` nested snapshot for Channel B
+ ``ETHTOOL_A_MSE_CHANNEL_C`` nested snapshot for Channel C
+ ``ETHTOOL_A_MSE_CHANNEL_D`` nested snapshot for Channel D
+ ``ETHTOOL_A_MSE_WORST_CHANNEL`` nested snapshot for worst channel
+ ``ETHTOOL_A_MSE_LINK`` nested snapshot for link-wide aggregate
+ ==================================== ====== ================================
+
+MSE Capabilities
+----------------
+
+This nested attribute reports the capability / scaling properties used to
+interpret snapshot values.
+
+ ============================================== ====== =========================
+ ``ETHTOOL_A_MSE_CAPABILITIES_MAX_AVERAGE_MSE`` uint max avg_mse scale
+ ``ETHTOOL_A_MSE_CAPABILITIES_MAX_PEAK_MSE`` uint max peak_mse scale
+ ``ETHTOOL_A_MSE_CAPABILITIES_REFRESH_RATE_PS`` uint sample rate (picoseconds)
+ ``ETHTOOL_A_MSE_CAPABILITIES_NUM_SYMBOLS`` uint symbols per HW sample
+ ============================================== ====== =========================
+
+The max-average/peak fields are included only if the corresponding metric
+is supported by the PHY. Their absence indicates that the metric is not
+available.
+
+See ``struct phy_mse_capability`` kernel documentation in
+``include/linux/phy.h``.
+
+MSE Snapshot
+------------
+
+Each per-channel nest contains an atomic snapshot of MSE values for that
+selector (channel A/B/C/D, worst channel, or link).
+
+ ========================================== ====== ===================
+ ``ETHTOOL_A_MSE_SNAPSHOT_AVERAGE_MSE`` uint average MSE value
+ ``ETHTOOL_A_MSE_SNAPSHOT_PEAK_MSE`` uint current peak MSE
+ ``ETHTOOL_A_MSE_SNAPSHOT_WORST_PEAK_MSE`` uint worst-case peak MSE
+ ========================================== ====== ===================
+
+Within each channel nest, only the metrics supported by the PHY will be present.
+
+See ``struct phy_mse_snapshot`` kernel documentation in
+``include/linux/phy.h``.
+
Request translation
===================
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index c775cababc8c..75db2251649b 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -131,10 +131,7 @@ Contents:
vxlan
x25
x25-iface
- xfrm_device
- xfrm_proc
- xfrm_sync
- xfrm_sysctl
+ xfrm/index
xdp-rx-metadata
xsk-tx-metadata
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index a06cb99d66dc..bc9a01606daf 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -673,6 +673,16 @@ tcp_moderate_rcvbuf - BOOLEAN
Default: 1 (enabled)
+tcp_rcvbuf_low_rtt - INTEGER
+ rcvbuf autotuning can over estimate final socket rcvbuf, which
+ can lead to cache trashing for high throughput flows.
+
+ For small RTT flows (below tcp_rcvbuf_low_rtt usecs), we can relax
+ rcvbuf growth: Few additional ms to reach the final (and smaller)
+ rcvbuf is a good tradeoff.
+
+ Default : 1000 (1 ms)
+
tcp_mtu_probing - INTEGER
Controls TCP Packetization-Layer Path MTU Discovery. Takes three
values:
@@ -854,9 +864,18 @@ tcp_sack - BOOLEAN
Default: 1 (enabled)
+tcp_comp_sack_rtt_percent - INTEGER
+ Percentage of SRTT used for the compressed SACK feature.
+ See tcp_comp_sack_nr, tcp_comp_sack_delay_ns, tcp_comp_sack_slack_ns.
+
+ Possible values : 1 - 1000
+
+ Default : 33 %
+
tcp_comp_sack_delay_ns - LONG INTEGER
- TCP tries to reduce number of SACK sent, using a timer
- based on 5% of SRTT, capped by this sysctl, in nano seconds.
+ TCP tries to reduce number of SACK sent, using a timer based
+ on tcp_comp_sack_rtt_percent of SRTT, capped by this sysctl
+ in nano seconds.
The default is 1ms, based on TSO autosizing period.
Default : 1,000,000 ns (1 ms)
@@ -866,8 +885,9 @@ tcp_comp_sack_slack_ns - LONG INTEGER
timer used by SACK compression. This gives extra time
for small RTT flows, and reduces system overhead by allowing
opportunistic reduction of timer interrupts.
+ Too big values might reduce goodput.
- Default : 100,000 ns (100 us)
+ Default : 10,000 ns (10 us)
tcp_comp_sack_nr - INTEGER
Max number of SACK that can be compressed.
@@ -1796,6 +1816,23 @@ icmp_errors_use_inbound_ifaddr - BOOLEAN
Default: 0 (disabled)
+icmp_errors_extension_mask - UNSIGNED INTEGER
+ Bitmask of ICMP extensions to append to ICMPv4 error messages
+ ("Destination Unreachable", "Time Exceeded" and "Parameter Problem").
+ The original datagram is trimmed / padded to 128 bytes in order to be
+ compatible with applications that do not comply with RFC 4884.
+
+ Possible extensions are:
+
+ ==== ==============================================================
+ 0x01 Incoming IP interface information according to RFC 5837.
+ Extension will include the index, IPv4 address (if present),
+ name and MTU of the IP interface that received the datagram
+ which elicited the ICMP error.
+ ==== ==============================================================
+
+ Default: 0x00 (no extensions)
+
igmp_max_memberships - INTEGER
Change the maximum number of multicast groups we can subscribe to.
Default: 20
@@ -3262,6 +3299,23 @@ error_anycast_as_unicast - BOOLEAN
Default: 0 (disabled)
+errors_extension_mask - UNSIGNED INTEGER
+ Bitmask of ICMP extensions to append to ICMPv6 error messages
+ ("Destination Unreachable" and "Time Exceeded"). The original datagram
+ is trimmed / padded to 128 bytes in order to be compatible with
+ applications that do not comply with RFC 4884.
+
+ Possible extensions are:
+
+ ==== ==============================================================
+ 0x01 Incoming IP interface information according to RFC 5837.
+ Extension will include the index, IPv6 address (if present),
+ name and MTU of the IP interface that received the datagram
+ which elicited the ICMP error.
+ ==== ==============================================================
+
+ Default: 0x00 (no extensions)
+
xfrm6_gc_thresh - INTEGER
(Obsolete since linux-4.14)
The threshold at which we will start garbage collecting for IPv6
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index 7dd60366f4ff..4e008efebb35 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -263,7 +263,9 @@ are not well known).
Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
selected sockets or using the global ``net.core.busy_poll`` and
``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
-also exists.
+also exists. Threaded polling of NAPI also has a mode to busy poll for
+packets (:ref:`threaded busy polling<threaded_busy_poll>`) using the NAPI
+processing kthread.
epoll-based busy polling
------------------------
@@ -426,6 +428,52 @@ Therefore, setting ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` is
the recommended usage, because otherwise setting ``irq-suspend-timeout``
might not have any discernible effect.
+.. _threaded_busy_poll:
+
+Threaded NAPI busy polling
+--------------------------
+
+Threaded NAPI busy polling extends threaded NAPI and adds support to do
+continuous busy polling of the NAPI. This can be useful for forwarding or
+AF_XDP applications.
+
+Threaded NAPI busy polling can be enabled on per NIC queue basis using Netlink.
+
+For example, using the following script:
+
+.. code-block:: bash
+
+ $ ynl --family netdev --do napi-set \
+ --json='{"id": 66, "threaded": "busy-poll"}'
+
+The kernel will create a kthread that busy polls on this NAPI.
+
+The user may elect to set the CPU affinity of this kthread to an unused CPU
+core to improve how often the NAPI is polled at the expense of wasted CPU
+cycles. Note that this will keep the CPU core busy with 100% usage.
+
+Once threaded busy polling is enabled for a NAPI, PID of the kthread can be
+retrieved using Netlink so the affinity of the kthread can be set up.
+
+For example, the following script can be used to fetch the PID:
+
+.. code-block:: bash
+
+ $ ynl --family netdev --do napi-get --json='{"id": 66}'
+
+This will output something like following, the pid `258` is the PID of the
+kthread that is polling this NAPI.
+
+.. code-block:: bash
+
+ $ {'defer-hard-irqs': 0,
+ 'gro-flush-timeout': 0,
+ 'id': 66,
+ 'ifindex': 2,
+ 'irq-suspend-timeout': 0,
+ 'pid': 258,
+ 'threaded': 'busy-poll'}
+
.. _threaded:
Threaded NAPI
diff --git a/Documentation/networking/net_cachelines/inet_connection_sock.rst b/Documentation/networking/net_cachelines/inet_connection_sock.rst
index 8fae85ebb773..cc2000f55c29 100644
--- a/Documentation/networking/net_cachelines/inet_connection_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_connection_sock.rst
@@ -12,8 +12,8 @@ struct inet_sock icsk_inet read_mostly r
struct request_sock_queue icsk_accept_queue
struct inet_bind_bucket icsk_bind_hash read_mostly tcp_set_state
struct inet_bind2_bucket icsk_bind2_hash read_mostly tcp_set_state,inet_put_port
-struct timer_list icsk_retransmit_timer read_write inet_csk_reset_xmit_timer,tcp_connect
struct timer_list icsk_delack_timer read_mostly inet_csk_reset_xmit_timer,tcp_connect
+struct timer_list icsk_keepalive_timer
u32 icsk_rto read_write tcp_cwnd_validate,tcp_schedule_loss_probe,tcp_connect_init,tcp_connect,tcp_write_xmit,tcp_push_one
u32 icsk_rto_min
u32 icsk_rto_max read_mostly tcp_reset_xmit_timer
diff --git a/Documentation/networking/net_cachelines/inet_sock.rst b/Documentation/networking/net_cachelines/inet_sock.rst
index b11bf48fa2b3..4c72a28a7012 100644
--- a/Documentation/networking/net_cachelines/inet_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_sock.rst
@@ -5,42 +5,43 @@
inet_sock struct fast path usage breakdown
==========================================
-======================= ===================== =================== =================== ======================================================================================================
-Type Name fastpath_tx_access fastpath_rx_access comment
-======================= ===================== =================== =================== ======================================================================================================
-struct sock sk read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data
-struct ipv6_pinfo* pinet6
-be16 inet_sport read_mostly __tcp_transmit_skb
-be32 inet_daddr read_mostly ip_select_ident_segs
-be32 inet_rcv_saddr
-be16 inet_dport read_mostly __tcp_transmit_skb
-u16 inet_num
-be32 inet_saddr
-s16 uc_ttl read_mostly __ip_queue_xmit/ip_select_ttl
-u16 cmsg_flags
-struct ip_options_rcu* inet_opt read_mostly __ip_queue_xmit
-u16 inet_id read_mostly ip_select_ident_segs
-u8 tos read_mostly ip_queue_xmit
-u8 min_ttl
-u8 mc_ttl
-u8 pmtudisc
-u8:1 recverr
-u8:1 is_icsk
-u8:1 freebind
-u8:1 hdrincl
-u8:1 mc_loop
-u8:1 transparent
-u8:1 mc_all
-u8:1 nodefrag
-u8:1 bind_address_no_port
-u8:1 recverr_rfc4884
-u8:1 defer_connect read_mostly tcp_sendmsg_fastopen
-u8 rcv_tos
-u8 convert_csum
-int uc_index
-int mc_index
-be32 mc_addr
-struct ip_mc_socklist* mc_list
-struct inet_cork_full cork read_mostly __tcp_transmit_skb
-struct local_port_range
-======================= ===================== =================== =================== ======================================================================================================
+======================== ===================== =================== =================== ======================================================================================================
+Type Name fastpath_tx_access fastpath_rx_access comment
+======================== ===================== =================== =================== ======================================================================================================
+struct sock sk read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data
+struct ipv6_pinfo* pinet6
+struct ipv6_fl_socklist* ipv6_fl_list read_mostly tcp_v6_connect,__ip6_datagram_connect,udpv6_sendmsg,rawv6_sendmsg
+be16 inet_sport read_mostly __tcp_transmit_skb
+be32 inet_daddr read_mostly ip_select_ident_segs
+be32 inet_rcv_saddr
+be16 inet_dport read_mostly __tcp_transmit_skb
+u16 inet_num
+be32 inet_saddr
+s16 uc_ttl read_mostly __ip_queue_xmit/ip_select_ttl
+u16 cmsg_flags
+struct ip_options_rcu* inet_opt read_mostly __ip_queue_xmit
+u16 inet_id read_mostly ip_select_ident_segs
+u8 tos read_mostly ip_queue_xmit
+u8 min_ttl
+u8 mc_ttl
+u8 pmtudisc
+u8:1 recverr
+u8:1 is_icsk
+u8:1 freebind
+u8:1 hdrincl
+u8:1 mc_loop
+u8:1 transparent
+u8:1 mc_all
+u8:1 nodefrag
+u8:1 bind_address_no_port
+u8:1 recverr_rfc4884
+u8:1 defer_connect read_mostly tcp_sendmsg_fastopen
+u8 rcv_tos
+u8 convert_csum
+int uc_index
+int mc_index
+be32 mc_addr
+struct ip_mc_socklist* mc_list
+struct inet_cork_full cork read_mostly __tcp_transmit_skb
+struct local_port_range
+======================== ===================== =================== =================== ======================================================================================================
diff --git a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
index 6e7b20afd2d4..beaf1880a19b 100644
--- a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
+++ b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
@@ -102,7 +102,8 @@ u8 sysctl_tcp_app_win
u8 sysctl_tcp_frto tcp_enter_loss
u8 sysctl_tcp_nometrics_save TCP_LAST_ACK/tcp_update_metrics
u8 sysctl_tcp_no_ssthresh_metrics_save TCP_LAST_ACK/tcp_(update/init)_metrics
-u8 sysctl_tcp_moderate_rcvbuf read_mostly read_mostly tcp_tso_should_defer(tx);tcp_rcv_space_adjust(rx)
+u8 sysctl_tcp_moderate_rcvbuf read_mostly tcp_rcvbuf_grow()
+u32 sysctl_tcp_rcvbuf_low_rtt read_mostly tcp_rcvbuf_grow()
u8 sysctl_tcp_tso_win_divisor read_mostly tcp_tso_should_defer(tcp_write_xmit)
u8 sysctl_tcp_workaround_signed_windows tcp_select_window
int sysctl_tcp_limit_output_bytes read_mostly tcp_small_queue_check(tcp_write_xmit)
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index 2555e75e5cc1..4ab5d7b05cf1 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -88,7 +88,7 @@ for example:
nc -u -l -p <port>' / 'nc -u -l <port>
- or::
+ or::
netcat -u -l -p <port>' / 'netcat -u -l <port>
diff --git a/Documentation/networking/nfc.rst b/Documentation/networking/nfc.rst
index 9aab3a88c9b2..401735006143 100644
--- a/Documentation/networking/nfc.rst
+++ b/Documentation/networking/nfc.rst
@@ -71,7 +71,8 @@ Userspace interface
The userspace interface is divided in control operations and low-level data
exchange operation.
-CONTROL OPERATIONS:
+Control operations
+------------------
Generic netlink is used to implement the interface to the control operations.
The operations are composed by commands and events, all listed below:
@@ -100,7 +101,8 @@ relevant information such as the supported NFC protocols.
All polling operations requested through one netlink socket are stopped when
it's closed.
-LOW-LEVEL DATA EXCHANGE:
+Low-level data exchange
+-----------------------
The userspace must use PF_NFC sockets to perform any data communication with
targets. All NFC sockets use AF_NFC::
diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index a874d007f2db..904a910f198e 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -71,3 +71,43 @@ smcr_max_conns_per_lgr - INTEGER
acceptable value ranges from 16 to 255. Only for SMC-R v2.1 and later.
Default: 255
+
+smcr_max_send_wr - INTEGER
+ So-called work request buffers are SMCR link (and RDMA queue pair) level
+ resources necessary for performing RDMA operations. Since up to 255
+ connections can share a link group and thus also a link and the number
+ of the work request buffers is decided when the link is allocated,
+ depending on the workload it can be a bottleneck in a sense that threads
+ have to wait for work request buffers to become available. Before the
+ introduction of this control the maximal number of work request buffers
+ available on the send path used to be hard coded to 16. With this control
+ it becomes configurable. The acceptable range is between 2 and 2048.
+
+ Please be aware that all the buffers need to be allocated as a physically
+ continuous array in which each element is a single buffer and has the size
+ of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep retrying
+ with half of the buffer count until it is ether successful or (unlikely)
+ we dip below the old hard coded value which is 16 where we give up much
+ like before having this control.
+
+ Default: 16
+
+smcr_max_recv_wr - INTEGER
+ So-called work request buffers are SMCR link (and RDMA queue pair) level
+ resources necessary for performing RDMA operations. Since up to 255
+ connections can share a link group and thus also a link and the number
+ of the work request buffers is decided when the link is allocated,
+ depending on the workload it can be a bottleneck in a sense that threads
+ have to wait for work request buffers to become available. Before the
+ introduction of this control the maximal number of work request buffers
+ available on the receive path used to be hard coded to 16. With this control
+ it becomes configurable. The acceptable range is between 2 and 2048.
+
+ Please be aware that all the buffers need to be allocated as a physically
+ continuous array in which each element is a single buffer and has the size
+ of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep retrying
+ with half of the buffer count until it is ether successful or (unlikely)
+ we dip below the old hard coded value which is 16 where we give up much
+ like before having this control.
+
+ Default: 48
diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
index 518284e287b0..66b0ef941457 100644
--- a/Documentation/networking/statistics.rst
+++ b/Documentation/networking/statistics.rst
@@ -184,9 +184,11 @@ Protocol-related statistics can be requested in get commands by setting
the `ETHTOOL_FLAG_STATS` flag in `ETHTOOL_A_HEADER_FLAGS`. Currently
statistics are supported in the following commands:
- - `ETHTOOL_MSG_PAUSE_GET`
- `ETHTOOL_MSG_FEC_GET`
+ - `ETHTOOL_MSG_LINKSTATE_GET`
- `ETHTOOL_MSG_MM_GET`
+ - `ETHTOOL_MSG_PAUSE_GET`
+ - `ETHTOOL_MSG_TSINFO_GET`
debugfs
-------
diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 36cc7afc2527..980c442d7161 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -280,6 +280,26 @@ If the record decrypted turns out to had been padded or is not a data
record it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.
+TLS_TX_MAX_PAYLOAD_LEN
+~~~~~~~~~~~~~~~~~~~~~~
+
+Specifies the maximum size of the plaintext payload for transmitted TLS records.
+
+When this option is set, the kernel enforces the specified limit on all outgoing
+TLS records. No plaintext fragment will exceed this size. This option can be used
+to implement the TLS Record Size Limit extension [1].
+
+* For TLS 1.2, the value corresponds directly to the record size limit.
+* For TLS 1.3, the value should be set to record_size_limit - 1, since
+ the record size limit includes one additional byte for the ContentType
+ field.
+
+The valid range for this option is 64 to 16384 bytes for TLS 1.2, and 63 to
+16384 bytes for TLS 1.3. The lower minimum for TLS 1.3 accounts for the
+extra byte used by the ContentType field.
+
+[1] https://datatracker.ietf.org/doc/html/rfc8449
+
Statistics
==========
diff --git a/Documentation/networking/xfrm/index.rst b/Documentation/networking/xfrm/index.rst
new file mode 100644
index 000000000000..7d866da836fe
--- /dev/null
+++ b/Documentation/networking/xfrm/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+XFRM Framework
+==============
+
+.. toctree::
+ :maxdepth: 2
+
+ xfrm_device
+ xfrm_proc
+ xfrm_sync
+ xfrm_sysctl
diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm/xfrm_device.rst
index 122204da0fff..b0d85a5f57d1 100644
--- a/Documentation/networking/xfrm_device.rst
+++ b/Documentation/networking/xfrm/xfrm_device.rst
@@ -20,11 +20,15 @@ can radically increase throughput and decrease CPU utilization. The XFRM
Device interface allows NIC drivers to offer to the stack access to the
hardware offload.
-Right now, there are two types of hardware offload that kernel supports.
+Right now, there are two types of hardware offload that kernel supports:
+
* IPsec crypto offload:
+
* NIC performs encrypt/decrypt
* Kernel does everything else
+
* IPsec packet offload:
+
* NIC performs encrypt/decrypt
* NIC does encapsulation
* Kernel and NIC have SA and policy in-sync
@@ -34,7 +38,7 @@ Right now, there are two types of hardware offload that kernel supports.
Userland access to the offload is typically through a system such as
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting. An example command might look something
-like this for crypto offload:
+like this for crypto offload::
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
reqid 0x07 replay-window 32 \
@@ -42,7 +46,7 @@ like this for crypto offload:
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
offload dev eth4 dir in
-and for packet offload
+and for packet offload::
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
reqid 0x07 replay-window 32 \
@@ -153,26 +157,26 @@ the packet's skb. At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().
- find and hold the SA that was used to the Rx skb::
+1. Find and hold the SA that was used to the Rx skb::
- get spi, protocol, and destination IP from packet headers
+ /* get spi, protocol, and destination IP from packet headers */
xs = find xs from (spi, protocol, dest_IP)
xfrm_state_hold(xs);
- store the state information into the skb::
+2. Store the state information into the skb::
sp = secpath_set(skb);
if (!sp) return;
sp->xvec[sp->len++] = xs;
sp->olen++;
- indicate the success and/or error status of the offload::
+3. Indicate the success and/or error status of the offload::
xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
xo->status = crypto_status;
- hand the packet to napi_gro_receive() as usual
+4. Hand the packet to napi_gro_receive() as usual.
In ESN mode, xdo_dev_state_advance_esn() is called from
xfrm_replay_advance_esn() for RX, and xfrm_replay_overflow_offload_esn for TX.
diff --git a/Documentation/networking/xfrm_proc.rst b/Documentation/networking/xfrm/xfrm_proc.rst
index 973d1571acac..973d1571acac 100644
--- a/Documentation/networking/xfrm_proc.rst
+++ b/Documentation/networking/xfrm/xfrm_proc.rst
diff --git a/Documentation/networking/xfrm_sync.rst b/Documentation/networking/xfrm/xfrm_sync.rst
index 6246503ceab2..dfc2ec0df380 100644
--- a/Documentation/networking/xfrm_sync.rst
+++ b/Documentation/networking/xfrm/xfrm_sync.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
-====
-XFRM
-====
+=========
+XFRM sync
+=========
The sync patches work is based on initial patches from
Krisztian <hidden@balabit.hu> and others and additional patches
@@ -36,7 +36,7 @@ is not driven by packet arrival.
- the replay sequence for both inbound and outbound
1) Message Structure
-----------------------
+--------------------
nlmsghdr:aevent_id:optional-TLVs.
@@ -83,31 +83,31 @@ when going from kernel to user space)
A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS
to get notified of these events.
-2) TLVS reflect the different parameters:
------------------------------------------
+2) TLVS reflect the different parameters
+----------------------------------------
a) byte value (XFRMA_LTIME_VAL)
-This TLV carries the running/current counter for byte lifetime since
-last event.
+ This TLV carries the running/current counter for byte lifetime since
+ last event.
-b)replay value (XFRMA_REPLAY_VAL)
+b) replay value (XFRMA_REPLAY_VAL)
-This TLV carries the running/current counter for replay sequence since
-last event.
+ This TLV carries the running/current counter for replay sequence since
+ last event.
-c)replay threshold (XFRMA_REPLAY_THRESH)
+c) replay threshold (XFRMA_REPLAY_THRESH)
-This TLV carries the threshold being used by the kernel to trigger events
-when the replay sequence is exceeded.
+ This TLV carries the threshold being used by the kernel to trigger events
+ when the replay sequence is exceeded.
d) expiry timer (XFRMA_ETIMER_THRESH)
-This is a timer value in milliseconds which is used as the nagle
-value to rate limit the events.
+ This is a timer value in milliseconds which is used as the nagle
+ value to rate limit the events.
-3) Default configurations for the parameters:
----------------------------------------------
+3) Default configurations for the parameters
+--------------------------------------------
By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
@@ -121,12 +121,14 @@ in case they are not specified.
the two sysctls/proc entries are:
a) /proc/sys/net/core/sysctl_xfrm_aevent_etime
-used to provide default values for the XFRMA_ETIMER_THRESH in incremental
-units of time of 100ms. The default is 10 (1 second)
+
+ Used to provide default values for the XFRMA_ETIMER_THRESH in incremental
+ units of time of 100ms. The default is 10 (1 second)
b) /proc/sys/net/core/sysctl_xfrm_aevent_rseqth
-used to provide default values for XFRMA_REPLAY_THRESH parameter
-in incremental packet count. The default is two packets.
+
+ Used to provide default values for XFRMA_REPLAY_THRESH parameter
+ in incremental packet count. The default is two packets.
4) Message types
----------------
@@ -134,50 +136,51 @@ in incremental packet count. The default is two packets.
a) XFRM_MSG_GETAE issued by user-->kernel.
XFRM_MSG_GETAE does not carry any TLVs.
-The response is a XFRM_MSG_NEWAE which is formatted based on what
-XFRM_MSG_GETAE queried for.
+ The response is a XFRM_MSG_NEWAE which is formatted based on what
+ XFRM_MSG_GETAE queried for.
+
+ The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-* if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
-* if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
+ * if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
+ * if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
b) XFRM_MSG_NEWAE is issued by either user space to configure
or kernel to announce events or respond to a XFRM_MSG_GETAE.
-i) user --> kernel to configure a specific SA.
+ i) user --> kernel to configure a specific SA.
-any of the values or threshold parameters can be updated by passing the
-appropriate TLV.
+ any of the values or threshold parameters can be updated by passing the
+ appropriate TLV.
-A response is issued back to the sender in user space to indicate success
-or failure.
+ A response is issued back to the sender in user space to indicate success
+ or failure.
-In the case of success, additionally an event with
-XFRM_MSG_NEWAE is also issued to any listeners as described in iii).
+ In the case of success, additionally an event with
+ XFRM_MSG_NEWAE is also issued to any listeners as described in iii).
-ii) kernel->user direction as a response to XFRM_MSG_GETAE
+ ii) kernel->user direction as a response to XFRM_MSG_GETAE
-The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
+ The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-The threshold TLVs will be included if explicitly requested in
-the XFRM_MSG_GETAE message.
+ The threshold TLVs will be included if explicitly requested in
+ the XFRM_MSG_GETAE message.
-iii) kernel->user to report as event if someone sets any values or
- thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
- In such a case XFRM_AE_CU flag is set to inform the user that
- the change happened as a result of an update.
- The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
+ iii) kernel->user to report as event if someone sets any values or
+ thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
+ In such a case XFRM_AE_CU flag is set to inform the user that
+ the change happened as a result of an update.
+ The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-iv) kernel->user to report event when replay threshold or a timeout
- is exceeded.
+ iv) kernel->user to report event when replay threshold or a timeout
+ is exceeded.
In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout
happened) is set to inform the user what happened.
Note the two flags are mutually exclusive.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-Exceptions to threshold settings
---------------------------------
+5) Exceptions to threshold settings
+-----------------------------------
If you have an SA that is getting hit by traffic in bursts such that
there is a period where the timer threshold expires with no packets
diff --git a/Documentation/networking/xfrm_sysctl.rst b/Documentation/networking/xfrm/xfrm_sysctl.rst
index 47b9bbdd0179..7d0c4b17c0bd 100644
--- a/Documentation/networking/xfrm_sysctl.rst
+++ b/Documentation/networking/xfrm/xfrm_sysctl.rst
@@ -4,8 +4,8 @@
XFRM Syscall
============
-/proc/sys/net/core/xfrm_* Variables:
-====================================
+/proc/sys/net/core/xfrm_* Variables
+===================================
xfrm_acq_expires - INTEGER
default 30 - hard timeout in seconds for acquire requests
diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst
index a0f5244fb427..ea70633d9ce6 100644
--- a/Documentation/power/index.rst
+++ b/Documentation/power/index.rst
@@ -19,6 +19,7 @@ Power Management
power_supply_class
runtime_pm
s2ram
+ shutdown-debugging
suspend-and-cpuhotplug
suspend-and-interrupts
swsusp-and-swap-files
diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
index 5019c79c7710..4c008e2202f0 100644
--- a/Documentation/power/pm_qos_interface.rst
+++ b/Documentation/power/pm_qos_interface.rst
@@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
From user space:
-The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
+The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
+the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system wakeup
latency QoS.
Only processes can register a PM QoS request. To provide for automatic
@@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
parameter requests as follows.
To register the default PM QoS target for the CPU latency QoS, the process must
-open /dev/cpu_dma_latency.
+open /dev/cpu_dma_latency. To register a CPU system wakeup QoS limit, the
+process must open /dev/cpu_wakeup_latency.
As long as the device node is held open that process has a registered
request on the parameter.
To change the requested target value, the process needs to write an s32 value to
the open device node. Alternatively, it can write a hex string for the value
-using the 10 char long format e.g. "0x12345678". This translates to a
-cpu_latency_qos_update_request() call.
+using the 10 char long format e.g. "0x12345678".
To remove the user mode request for a target value simply close the device
node.
diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst
index c8dbdb8595e5..8246df3cecd7 100644
--- a/Documentation/power/runtime_pm.rst
+++ b/Documentation/power/runtime_pm.rst
@@ -480,16 +480,6 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
`bool pm_runtime_status_suspended(struct device *dev);`
- return true if the device's runtime PM status is 'suspended'
- `void pm_runtime_allow(struct device *dev);`
- - set the power.runtime_auto flag for the device and decrease its usage
- counter (used by the /sys/devices/.../power/control interface to
- effectively allow the device to be power managed at run time)
-
- `void pm_runtime_forbid(struct device *dev);`
- - unset the power.runtime_auto flag for the device and increase its usage
- counter (used by the /sys/devices/.../power/control interface to
- effectively prevent the device from being power managed at run time)
-
`void pm_runtime_no_callbacks(struct device *dev);`
- set the power.no_callbacks flag for the device and remove the runtime
PM attributes from /sys/devices/.../power (or prevent them from being
diff --git a/Documentation/power/shutdown-debugging.rst b/Documentation/power/shutdown-debugging.rst
new file mode 100644
index 000000000000..c510122e0bbc
--- /dev/null
+++ b/Documentation/power/shutdown-debugging.rst
@@ -0,0 +1,53 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Debugging Kernel Shutdown Hangs with pstore
++++++++++++++++++++++++++++++++++++++++++++
+
+Overview
+========
+If the system hangs while shutting down, the kernel logs may need to be
+retrieved to debug the issue.
+
+On systems that have a UART available, it is best to configure the kernel to use
+this UART for kernel console output.
+
+If a UART isn't available, the ``pstore`` subsystem provides a mechanism to
+persist this data across a system reset, allowing it to be retrieved on the next
+boot.
+
+Kernel Configuration
+====================
+To enable ``pstore`` and enable saving kernel ring buffer logs, set the
+following kernel configuration options:
+
+* ``CONFIG_PSTORE=y``
+* ``CONFIG_PSTORE_CONSOLE=y``
+
+Additionally, enable a backend to store the data. Depending upon your platform
+some potential options include:
+
+* ``CONFIG_EFI_VARS_PSTORE=y``
+* ``CONFIG_PSTORE_RAM=y``
+* ``CONFIG_CHROMEOS_PSTORE=y``
+* ``CONFIG_PSTORE_BLK=y``
+
+Kernel Command-line Parameters
+==============================
+Add these parameters to your kernel command line:
+
+* ``printk.always_kmsg_dump=Y``
+ * Forces the kernel to dump the entire message buffer to pstore during
+ shutdown
+* ``efi_pstore.pstore_disable=N``
+ * For EFI-based systems, ensures the EFI backend is active
+
+Userspace Interaction and Log Retrieval
+=======================================
+On the next boot after a hang, pstore logs will be available in the pstore
+filesystem (``/sys/fs/pstore``) and can be retrieved by userspace.
+
+On systemd systems, the ``systemd-pstore`` service will help do the following:
+
+#. Locate pstore data in ``/sys/fs/pstore``
+#. Read and save it to ``/var/lib/systemd/pstore``
+#. Clear pstore data for the next event
diff --git a/Documentation/process/2.Process.rst b/Documentation/process/2.Process.rst
index ef3b116492df..7bd41838a546 100644
--- a/Documentation/process/2.Process.rst
+++ b/Documentation/process/2.Process.rst
@@ -13,24 +13,19 @@ how the process works is required in order to be an effective part of it.
The big picture
---------------
-The kernel developers use a loosely time-based release process, with a new
-major kernel release happening every two or three months. The recent
-release history looks like this:
-
- ====== =================
- 5.0 March 3, 2019
- 5.1 May 5, 2019
- 5.2 July 7, 2019
- 5.3 September 15, 2019
- 5.4 November 24, 2019
- 5.5 January 6, 2020
- ====== =================
-
-Every 5.x release is a major kernel release with new features, internal
-API changes, and more. A typical release can contain about 13,000
-changesets with changes to several hundred thousand lines of code. 5.x is
-the leading edge of Linux kernel development; the kernel uses a
-rolling development model which is continually integrating major changes.
+The Linux kernel uses a loosely time-based, rolling release development
+model. A new major kernel release (which we will call, as an example, 9.x)
+[1]_ happens every two or three months, which comes with new features,
+internal API changes, and more. A typical release can contain about 13,000
+changesets with changes to several hundred thousand lines of code. Recent
+releases, along with their dates, can be found at `Wikipedia
+<https://en.wikipedia.org/wiki/Linux_kernel_version_history>`_.
+
+.. [1] Strictly speaking, the Linux kernel does not use semantic versioning
+ number scheme, but rather the 9.x pair identifies major release
+ version as a whole number. For each release, x is incremented,
+ but 9 is incremented only if x is deemed large enough (e.g.
+ Linux 5.0 is released following Linux 4.20).
A relatively straightforward discipline is followed with regard to the
merging of patches for each release. At the beginning of each development
@@ -48,9 +43,9 @@ detail later on).
The merge window lasts for approximately two weeks. At the end of this
time, Linus Torvalds will declare that the window is closed and release the
-first of the "rc" kernels. For the kernel which is destined to be 5.6,
+first of the "rc" kernels. For the kernel which is destined to be 9.x,
for example, the release which happens at the end of the merge window will
-be called 5.6-rc1. The -rc1 release is the signal that the time to
+be called 9.x-rc1. The -rc1 release is the signal that the time to
merge new features has passed, and that the time to stabilize the next
kernel has begun.
@@ -99,13 +94,15 @@ release is made. In the real world, this kind of perfection is hard to
achieve; there are just too many variables in a project of this size.
There comes a point where delaying the final release just makes the problem
worse; the pile of changes waiting for the next merge window will grow
-larger, creating even more regressions the next time around. So most 5.x
-kernels go out with a handful of known regressions though, hopefully, none
-of them are serious.
+larger, creating even more regressions the next time around. So most kernels
+go out with a handful of known regressions, though, hopefully, none of them
+are serious.
Once a stable release is made, its ongoing maintenance is passed off to the
-"stable team," currently Greg Kroah-Hartman. The stable team will release
-occasional updates to the stable release using the 5.x.y numbering scheme.
+"stable team," currently consists of Greg Kroah-Hartman and Sasha Levin. The
+stable team will release occasional updates to the stable release using the
+9.x.y numbering scheme.
+
To be considered for an update release, a patch must (1) fix a significant
bug, and (2) already be merged into the mainline for the next development
kernel. Kernels will typically receive stable updates for a little more
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index d1a8e5465ed9..2969ca378dbb 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -76,7 +76,7 @@ Don't use commas to avoid using braces:
if (condition)
do_this(), do_that();
-Always uses braces for multiple statements:
+Always use braces for multiple statements:
.. code-block:: c
diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 910e8fc9e3c8..9a509f1a6873 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -592,8 +592,9 @@ Both Tested-by and Reviewed-by tags, once received on mailing list from tester
or reviewer, should be added by author to the applicable patches when sending
next versions. However if the patch has changed substantially in following
version, these tags might not be applicable anymore and thus should be removed.
-Usually removal of someone's Tested-by or Reviewed-by tags should be mentioned
-in the patch changelog (after the '---' separator).
+Usually removal of someone's Acked-by, Tested-by or Reviewed-by tags should be
+mentioned in the patch changelog with an explanation (after the '---'
+separator).
A Suggested-by: tag indicates that the patch idea is suggested by the person
named and ensures credit to the person for the idea: if we diligently credit
diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst
index 155f7107329a..152289f0bed2 100644
--- a/Documentation/rust/quick-start.rst
+++ b/Documentation/rust/quick-start.rst
@@ -39,8 +39,8 @@ of the box, e.g.::
Debian
******
-Debian Testing and Debian Unstable (Sid), outside of the freeze period, provide
-recent Rust releases and thus they should generally work out of the box, e.g.::
+Debian 13 (Trixie), as well as Testing and Debian Unstable (Sid) provide recent
+Rust releases and thus they should generally work out of the box, e.g.::
apt install rustc rust-src bindgen rustfmt rust-clippy
diff --git a/Documentation/security/keys/trusted-encrypted.rst b/Documentation/security/keys/trusted-encrypted.rst
index f4d7e162d5e4..eae6a36b1c9a 100644
--- a/Documentation/security/keys/trusted-encrypted.rst
+++ b/Documentation/security/keys/trusted-encrypted.rst
@@ -10,6 +10,37 @@ of a Trust Source for greater security, while Encrypted Keys can be used on any
system. All user level blobs, are displayed and loaded in hex ASCII for
convenience, and are integrity verified.
+Trusted Keys as Protected key
+=============================
+It is the secure way of keeping the keys in the kernel key-ring as Trusted-Key,
+such that:
+
+- Key-blob, an encrypted key-data, created to be stored, loaded and seen by
+ userspace.
+- Key-data, the plain-key text in the system memory, to be used by
+ kernel space only.
+
+Though key-data is not accessible to the user-space in plain-text, but it is in
+plain-text in system memory, when used in kernel space. Even though kernel-space
+attracts small surface attack, but with compromised kernel or side-channel
+attack accessing the system memory can lead to a chance of the key getting
+compromised/leaked.
+
+In order to protect the key in kernel space, the concept of "protected-keys" is
+introduced which will act as an added layer of protection. The key-data of the
+protected keys is encrypted with Key-Encryption-Key(KEK), and decrypted inside
+the trust source boundary. The plain-key text never available out-side in the
+system memory. Thus, any crypto operation that is to be executed using the
+protected key, can only be done by the trust source, which generated the
+key blob.
+
+Hence, if the protected-key is leaked or compromised, it is of no use to the
+hacker.
+
+Trusted keys as protected keys, with trust source having the capability of
+generating:
+
+- Key-Blob, to be loaded, stored and seen by user-space.
Trust Source
============
@@ -252,7 +283,7 @@ in bytes. Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
Trusted Keys usage: CAAM
------------------------
-Usage::
+Trusted Keys Usage::
keyctl add trusted name "new keylen" ring
keyctl add trusted name "load hex_blob" ring
@@ -262,6 +293,21 @@ Usage::
CAAM-specific format. The key length for new keys is always in bytes.
Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
+Trusted Keys as Protected Keys Usage::
+
+ keyctl add trusted name "new keylen pk [options]" ring
+ keyctl add trusted name "load hex_blob [options]" ring
+ keyctl print keyid
+
+ where, 'pk' is used to direct trust source to generate protected key.
+
+ options:
+ key_enc_algo = For CAAM, supported enc algo are ECB(2), CCM(1).
+
+"keyctl print" returns an ASCII hex copy of the sealed key, which is in a
+CAAM-specific format. The key length for new keys is always in bytes.
+Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
+
Trusted Keys usage: DCP
-----------------------
@@ -343,6 +389,46 @@ Load a trusted key from the saved blob::
f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
e4a8aea2b607ec96931e6f4d4fe563ba
+Create and save a trusted key as protected key named "kmk" of length 32 bytes.
+
+::
+
+ $ keyctl add trusted kmk "new 32 pk key_enc_algo=1" @u
+ 440502848
+
+ $ keyctl show
+ Session Keyring
+ -3 --alswrv 500 500 keyring: _ses
+ 97833714 --alswrv 500 -1 \_ keyring: _uid.500
+ 440502848 --alswrv 500 500 \_ trusted: kmk
+
+ $ keyctl print 440502848
+ 0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915
+ 3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b
+ 27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722
+ a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec
+ d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d
+ dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0
+ f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
+ e4a8aea2b607ec96931e6f4d4fe563ba
+
+ $ keyctl pipe 440502848 > kmk.blob
+
+Load a trusted key from the saved blob::
+
+ $ keyctl add trusted kmk "load `cat kmk.blob` key_enc_algo=1" @u
+ 268728824
+
+ $ keyctl print 268728824
+ 0101000000000000000001005d01b7e3f4a6be5709930f3b70a743cbb42e0cc95e18e915
+ 3f60da455bbf1144ad12e4f92b452f966929f6105fd29ca28e4d4d5a031d068478bacb0b
+ 27351119f822911b0a11ba3d3498ba6a32e50dac7f32894dd890eb9ad578e4e292c83722
+ a52e56a097e6a68b3f56f7a52ece0cdccba1eb62cad7d817f6dc58898b3ac15f36026fec
+ d568bd4a706cb60bb37be6d8f1240661199d640b66fb0fe3b079f97f450b9ef9c22c6d5d
+ dd379f0facd1cd020281dfa3c70ba21a3fa6fc2471dc6d13ecf8298b946f65345faa5ef0
+ f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
+ e4a8aea2b607ec96931e6f4d4fe563ba
+
Reseal (TPM specific) a trusted key under new PCR values::
$ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`"
diff --git a/Documentation/sound/codecs/cs35l56.rst b/Documentation/sound/codecs/cs35l56.rst
index 57d1964453e1..d5363b08f515 100644
--- a/Documentation/sound/codecs/cs35l56.rst
+++ b/Documentation/sound/codecs/cs35l56.rst
@@ -105,10 +105,10 @@ In this example the SSID is 10280c63.
The format of the firmware file names is:
-SoundWire (except CS35L56 Rev B0):
+SoundWire:
cs35lxx-b0-dsp1-misc-SSID[-spkidX]-l?u?
-SoundWire CS35L56 Rev B0:
+SoundWire CS35L56 Rev B0 firmware released before kernel version 6.16:
cs35lxx-b0-dsp1-misc-SSID[-spkidX]-ampN
Non-SoundWire (HDA and I2S):
@@ -127,9 +127,8 @@ Where:
* spkidX is an optional part, used for laptops that have firmware
configurations for different makes and models of internal speakers.
-The CS35L56 Rev B0 continues to use the old filename scheme because a
-large number of firmware files have already been published with these
-names.
+Early firmware for CS35L56 Rev B0 used the ALSA prefix (ampN) as the
+filename qualifier. Support for the l?u? qualifier was added in kernel 6.16.
Sound Open Firmware and ALSA topology files
-------------------------------------------
diff --git a/Documentation/sphinx/kernel_abi.py b/Documentation/sphinx/kernel_abi.py
index 4c4375201b9e..5667f207d175 100644
--- a/Documentation/sphinx/kernel_abi.py
+++ b/Documentation/sphinx/kernel_abi.py
@@ -14,7 +14,7 @@
:license: GPL Version 2, June 1991 see Linux/COPYING for details.
The ``kernel-abi`` (:py:class:`KernelCmd`) directive calls the
- scripts/get_abi.py script to parse the Kernel ABI files.
+ AbiParser class to parse the Kernel ABI files.
Overview of directive's argument and options.
@@ -43,9 +43,9 @@ from sphinx.util.docutils import switch_source_input
from sphinx.util import logging
srctree = os.path.abspath(os.environ["srctree"])
-sys.path.insert(0, os.path.join(srctree, "scripts/lib/abi"))
+sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
-from abi_parser import AbiParser
+from abi.abi_parser import AbiParser
__version__ = "1.0"
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py
index aaac76892ceb..bdc0fef5c87f 100644
--- a/Documentation/sphinx/kernel_feat.py
+++ b/Documentation/sphinx/kernel_feat.py
@@ -13,7 +13,7 @@
:license: GPL Version 2, June 1991 see Linux/COPYING for details.
The ``kernel-feat`` (:py:class:`KernelFeat`) directive calls the
- scripts/get_feat.pl script to parse the Kernel ABI files.
+ tools/docs/get_feat.pl script to parse the Kernel ABI files.
Overview of directive's argument and options.
@@ -34,7 +34,6 @@
import codecs
import os
import re
-import subprocess
import sys
from docutils import nodes, statemachine
@@ -42,6 +41,11 @@ from docutils.statemachine import ViewList
from docutils.parsers.rst import directives, Directive
from sphinx.util.docutils import switch_source_input
+srctree = os.path.abspath(os.environ["srctree"])
+sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
+
+from feat.parse_features import ParseFeature # pylint: disable=C0413
+
def ErrorString(exc): # Shamelessly stolen from docutils
return f'{exc.__class__.__name}: {exc}'
@@ -84,18 +88,16 @@ class KernelFeat(Directive):
srctree = os.path.abspath(os.environ["srctree"])
- args = [
- os.path.join(srctree, 'scripts/get_feat.pl'),
- 'rest',
- '--enable-fname',
- '--dir',
- os.path.join(srctree, 'Documentation', self.arguments[0]),
- ]
+ feature_dir = os.path.join(srctree, 'Documentation', self.arguments[0])
- if len(self.arguments) > 1:
- args.extend(['--arch', self.arguments[1]])
+ feat = ParseFeature(feature_dir, False, True)
+ feat.parse()
- lines = subprocess.check_output(args, cwd=os.path.dirname(doc.current_source)).decode('utf-8')
+ if len(self.arguments) > 1:
+ arch = self.arguments[1]
+ lines = feat.output_arch_table(arch)
+ else:
+ lines = feat.output_matrix()
line_regex = re.compile(r"^\.\. FILE (\S+)$")
diff --git a/Documentation/sphinx/kernel_include.py b/Documentation/sphinx/kernel_include.py
index 75e139287d50..626762ff6af3 100755
--- a/Documentation/sphinx/kernel_include.py
+++ b/Documentation/sphinx/kernel_include.py
@@ -97,9 +97,9 @@ from docutils.parsers.rst.directives.body import CodeBlock, NumberLines
from sphinx.util import logging
srctree = os.path.abspath(os.environ["srctree"])
-sys.path.insert(0, os.path.join(srctree, "tools/docs/lib"))
+sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
-from parse_data_structs import ParseDataStructs
+from kdoc.parse_data_structs import ParseDataStructs
__version__ = "1.0"
logger = logging.getLogger(__name__)
diff --git a/Documentation/sphinx/kerneldoc-preamble.sty b/Documentation/sphinx/kerneldoc-preamble.sty
index 5d68395539fe..16d9ff46fdf6 100644
--- a/Documentation/sphinx/kerneldoc-preamble.sty
+++ b/Documentation/sphinx/kerneldoc-preamble.sty
@@ -220,7 +220,7 @@
If you want them, please install non-variable ``Noto Sans CJK''
font families along with the texlive-xecjk package by following
instructions from
- \sphinxcode{./scripts/sphinx-pre-install}.
+ \sphinxcode{./tools/docs/sphinx-pre-install}.
Having optional non-variable ``Noto Serif CJK'' font families will
improve the looks of those translations.
\end{sphinxadmonition}}
diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py
index 2586b4d4e494..d8cdf068ef35 100644
--- a/Documentation/sphinx/kerneldoc.py
+++ b/Documentation/sphinx/kerneldoc.py
@@ -42,10 +42,10 @@ from sphinx.util import logging
from pprint import pformat
srctree = os.path.abspath(os.environ["srctree"])
-sys.path.insert(0, os.path.join(srctree, "scripts/lib/kdoc"))
+sys.path.insert(0, os.path.join(srctree, "tools/lib/python"))
-from kdoc_files import KernelFiles
-from kdoc_output import RestFormat
+from kdoc.kdoc_files import KernelFiles
+from kdoc.kdoc_output import RestFormat
__version__ = '1.0'
kfiles = None
diff --git a/Documentation/sphinx/load_config.py b/Documentation/sphinx/load_config.py
deleted file mode 100644
index 1afb0c97f06b..000000000000
--- a/Documentation/sphinx/load_config.py
+++ /dev/null
@@ -1,60 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-# SPDX-License-Identifier: GPL-2.0
-# pylint: disable=R0903, C0330, R0914, R0912, E0401
-
-import os
-import sys
-from sphinx.util.osutil import fs_encoding
-
-# ------------------------------------------------------------------------------
-def loadConfig(namespace):
-# ------------------------------------------------------------------------------
-
- """Load an additional configuration file into *namespace*.
-
- The name of the configuration file is taken from the environment
- ``SPHINX_CONF``. The external configuration file extends (or overwrites) the
- configuration values from the origin ``conf.py``. With this you are able to
- maintain *build themes*. """
-
- config_file = os.environ.get("SPHINX_CONF", None)
- if (config_file is not None
- and os.path.normpath(namespace["__file__"]) != os.path.normpath(config_file) ):
- config_file = os.path.abspath(config_file)
-
- # Let's avoid one conf.py file just due to latex_documents
- start = config_file.find('Documentation/')
- if start >= 0:
- start = config_file.find('/', start + 1)
-
- end = config_file.rfind('/')
- if start >= 0 and end > 0:
- dir = config_file[start + 1:end]
-
- print("source directory: %s" % dir)
- new_latex_docs = []
- latex_documents = namespace['latex_documents']
-
- for l in latex_documents:
- if l[0].find(dir + '/') == 0:
- has = True
- fn = l[0][len(dir) + 1:]
- new_latex_docs.append((fn, l[1], l[2], l[3], l[4]))
- break
-
- namespace['latex_documents'] = new_latex_docs
-
- # If there is an extra conf.py file, load it
- if os.path.isfile(config_file):
- sys.stdout.write("load additional sphinx-config: %s\n" % config_file)
- config = namespace.copy()
- config['__file__'] = config_file
- with open(config_file, 'rb') as f:
- code = compile(f.read(), fs_encoding, 'exec')
- exec(code, config)
- del config['__file__']
- namespace.update(config)
- else:
- config = namespace.copy()
- config['tags'].add("subproject")
- namespace.update(config)
diff --git a/Documentation/sphinx/parallel-wrapper.sh b/Documentation/sphinx/parallel-wrapper.sh
deleted file mode 100644
index e54c44ce117d..000000000000
--- a/Documentation/sphinx/parallel-wrapper.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/bin/sh
-# SPDX-License-Identifier: GPL-2.0+
-#
-# Figure out if we should follow a specific parallelism from the make
-# environment (as exported by scripts/jobserver-exec), or fall back to
-# the "auto" parallelism when "-jN" is not specified at the top-level
-# "make" invocation.
-
-sphinx="$1"
-shift || true
-
-parallel="$PARALLELISM"
-if [ -z "$parallel" ] ; then
- # If no parallelism is specified at the top-level make, then
- # fall back to the expected "-jauto" mode that the "htmldocs"
- # target has had.
- auto=$(perl -e 'open IN,"'"$sphinx"' --version 2>&1 |";
- while (<IN>) {
- if (m/([\d\.]+)/) {
- print "auto" if ($1 >= "1.7")
- }
- }
- close IN')
- if [ -n "$auto" ] ; then
- parallel="$auto"
- fi
-fi
-# Only if some parallelism has been determined do we add the -jN option.
-if [ -n "$parallel" ] ; then
- parallel="-j$parallel"
-fi
-
-exec "$sphinx" $parallel "$@"
diff --git a/Documentation/tools/rtla/common_appendix.rst b/Documentation/tools/rtla/common_appendix.txt
index 53cae7537537..53cae7537537 100644
--- a/Documentation/tools/rtla/common_appendix.rst
+++ b/Documentation/tools/rtla/common_appendix.txt
diff --git a/Documentation/tools/rtla/common_hist_options.rst b/Documentation/tools/rtla/common_hist_options.txt
index df53ff835bfb..df53ff835bfb 100644
--- a/Documentation/tools/rtla/common_hist_options.rst
+++ b/Documentation/tools/rtla/common_hist_options.txt
diff --git a/Documentation/tools/rtla/common_options.rst b/Documentation/tools/rtla/common_options.txt
index 77ef35d3f831..1c4f3e663cf5 100644
--- a/Documentation/tools/rtla/common_options.rst
+++ b/Documentation/tools/rtla/common_options.txt
@@ -1,11 +1,15 @@
**-c**, **--cpus** *cpu-list*
- Set the osnoise tracer to run the sample threads in the cpu-list.
+ Set the |tool| tracer to run the sample threads in the cpu-list.
+
+ By default, the |tool| tracer runs the sample threads on all CPUs.
**-H**, **--house-keeping** *cpu-list*
Run rtla control threads only on the given cpu-list.
+ If omitted, rtla will attempt to auto-migrate its main thread to any CPU that is not running any workload threads.
+
**-d**, **--duration** *time[s|m|h|d]*
Set the duration of the session.
@@ -35,17 +39,21 @@
**-P**, **--priority** *o:prio|r:prio|f:prio|d:runtime:period*
- Set scheduling parameters to the osnoise tracer threads, the format to set the priority are:
+ Set scheduling parameters to the |tool| tracer threads, the format to set the priority are:
- *o:prio* - use SCHED_OTHER with *prio*;
- *r:prio* - use SCHED_RR with *prio*;
- *f:prio* - use SCHED_FIFO with *prio*;
- *d:runtime[us|ms|s]:period[us|ms|s]* - use SCHED_DEADLINE with *runtime* and *period* in nanoseconds.
+ If not set, tracer threads keep their default priority. For rtla user threads, it is set to SCHED_FIFO with priority 95. For kernel threads, see *osnoise* and *timerlat* tracer documentation for the running kernel version.
+
**-C**, **--cgroup**\[*=cgroup*]
Set a *cgroup* to the tracer's threads. If the **-C** option is passed without arguments, the tracer's thread will inherit **rtla**'s *cgroup*. Otherwise, the threads will be placed on the *cgroup* passed to the option.
+ If not set, the behavior differs between workload types. User workloads created by rtla will inherit rtla's cgroup. Kernel workloads are assigned the root cgroup.
+
**--warm-up** *s*
After starting the workload, let it run for *s* seconds before starting collecting the data, allowing the system to warm-up. Statistical data generated during warm-up is discarded.
@@ -53,6 +61,8 @@
**--trace-buffer-size** *kB*
Set the per-cpu trace buffer size in kB for the tracing output.
+ If not set, the default tracefs buffer size is used.
+
**--on-threshold** *action*
Defines an action to be executed when tracing is stopped on a latency threshold
@@ -67,7 +77,7 @@
- *trace[,file=<filename>]*
Saves trace output, optionally taking a filename. Alternative to -t/--trace.
- Note that nlike -t/--trace, specifying this multiple times will result in
+ Note that unlike -t/--trace, specifying this multiple times will result in
the trace being saved multiple times.
- *signal,num=<sig>,pid=<pid>*
diff --git a/Documentation/tools/rtla/common_osnoise_description.rst b/Documentation/tools/rtla/common_osnoise_description.txt
index d5d61615b967..d5d61615b967 100644
--- a/Documentation/tools/rtla/common_osnoise_description.rst
+++ b/Documentation/tools/rtla/common_osnoise_description.txt
diff --git a/Documentation/tools/rtla/common_osnoise_options.rst b/Documentation/tools/rtla/common_osnoise_options.txt
index bd3c4f499193..bd3c4f499193 100644
--- a/Documentation/tools/rtla/common_osnoise_options.rst
+++ b/Documentation/tools/rtla/common_osnoise_options.txt
diff --git a/Documentation/tools/rtla/common_timerlat_aa.rst b/Documentation/tools/rtla/common_timerlat_aa.txt
index 077029e6b289..077029e6b289 100644
--- a/Documentation/tools/rtla/common_timerlat_aa.rst
+++ b/Documentation/tools/rtla/common_timerlat_aa.txt
diff --git a/Documentation/tools/rtla/common_timerlat_description.rst b/Documentation/tools/rtla/common_timerlat_description.txt
index 49fcae3ffdec..49fcae3ffdec 100644
--- a/Documentation/tools/rtla/common_timerlat_description.rst
+++ b/Documentation/tools/rtla/common_timerlat_description.txt
diff --git a/Documentation/tools/rtla/common_timerlat_options.rst b/Documentation/tools/rtla/common_timerlat_options.txt
index 1f5d024b53aa..33070b264cae 100644
--- a/Documentation/tools/rtla/common_timerlat_options.rst
+++ b/Documentation/tools/rtla/common_timerlat_options.txt
@@ -13,7 +13,7 @@
Set the automatic trace mode. This mode sets some commonly used options
while debugging the system. It is equivalent to use **-T** *us* **-s** *us*
**-t**. By default, *timerlat* tracer uses FIFO:95 for *timerlat* threads,
- thus equilavent to **-P** *f:95*.
+ thus equivalent to **-P** *f:95*.
**-p**, **--period** *us*
@@ -56,7 +56,7 @@
**-u**, **--user-threads**
Set timerlat to run without a workload, and then dispatches user-space workloads
- to wait on the timerlat_fd. Once the workload is awakes, it goes to sleep again
+ to wait on the timerlat_fd. Once the workload is awakened, it goes to sleep again
adding so the measurement for the kernel-to-user and user-to-kernel to the tracer
output. **--user-threads** will be used unless the user specify **-k**.
diff --git a/Documentation/tools/rtla/common_top_options.rst b/Documentation/tools/rtla/common_top_options.txt
index f48878938f84..f48878938f84 100644
--- a/Documentation/tools/rtla/common_top_options.rst
+++ b/Documentation/tools/rtla/common_top_options.txt
diff --git a/Documentation/tools/rtla/rtla-hwnoise.rst b/Documentation/tools/rtla/rtla-hwnoise.rst
index 3a7163c02ac8..26512b15fe7b 100644
--- a/Documentation/tools/rtla/rtla-hwnoise.rst
+++ b/Documentation/tools/rtla/rtla-hwnoise.rst
@@ -29,11 +29,11 @@ collection of the tracer output.
OPTIONS
=======
-.. include:: common_osnoise_options.rst
+.. include:: common_osnoise_options.txt
-.. include:: common_top_options.rst
+.. include:: common_top_options.txt
-.. include:: common_options.rst
+.. include:: common_options.txt
EXAMPLE
=======
@@ -106,4 +106,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla-osnoise-hist.rst b/Documentation/tools/rtla/rtla-osnoise-hist.rst
index 1fc60ef26106..007521c865d9 100644
--- a/Documentation/tools/rtla/rtla-osnoise-hist.rst
+++ b/Documentation/tools/rtla/rtla-osnoise-hist.rst
@@ -15,7 +15,7 @@ SYNOPSIS
DESCRIPTION
===========
-.. include:: common_osnoise_description.rst
+.. include:: common_osnoise_description.txt
The **rtla osnoise hist** tool collects all **osnoise:sample_threshold**
occurrence in a histogram, displaying the results in a user-friendly way.
@@ -24,11 +24,11 @@ collection of the tracer output.
OPTIONS
=======
-.. include:: common_osnoise_options.rst
+.. include:: common_osnoise_options.txt
-.. include:: common_hist_options.rst
+.. include:: common_hist_options.txt
-.. include:: common_options.rst
+.. include:: common_options.txt
EXAMPLE
=======
@@ -65,4 +65,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla-osnoise-top.rst b/Documentation/tools/rtla/rtla-osnoise-top.rst
index b1cbd7bcd4ae..6ccadae38945 100644
--- a/Documentation/tools/rtla/rtla-osnoise-top.rst
+++ b/Documentation/tools/rtla/rtla-osnoise-top.rst
@@ -15,7 +15,7 @@ SYNOPSIS
DESCRIPTION
===========
-.. include:: common_osnoise_description.rst
+.. include:: common_osnoise_description.txt
**rtla osnoise top** collects the periodic summary from the *osnoise* tracer,
including the counters of the occurrence of the interference source,
@@ -26,11 +26,11 @@ collection of the tracer output.
OPTIONS
=======
-.. include:: common_osnoise_options.rst
+.. include:: common_osnoise_options.txt
-.. include:: common_top_options.rst
+.. include:: common_top_options.txt
-.. include:: common_options.rst
+.. include:: common_options.txt
EXAMPLE
=======
@@ -60,4 +60,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla-osnoise.rst b/Documentation/tools/rtla/rtla-osnoise.rst
index c129b206ce34..540d2bf6c152 100644
--- a/Documentation/tools/rtla/rtla-osnoise.rst
+++ b/Documentation/tools/rtla/rtla-osnoise.rst
@@ -14,7 +14,7 @@ SYNOPSIS
DESCRIPTION
===========
-.. include:: common_osnoise_description.rst
+.. include:: common_osnoise_description.txt
The *osnoise* tracer outputs information in two ways. It periodically prints
a summary of the noise of the operating system, including the counters of
@@ -56,4 +56,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla-timerlat-hist.rst b/Documentation/tools/rtla/rtla-timerlat-hist.rst
index 4923a362129b..f56fe546411b 100644
--- a/Documentation/tools/rtla/rtla-timerlat-hist.rst
+++ b/Documentation/tools/rtla/rtla-timerlat-hist.rst
@@ -16,7 +16,7 @@ SYNOPSIS
DESCRIPTION
===========
-.. include:: common_timerlat_description.rst
+.. include:: common_timerlat_description.txt
The **rtla timerlat hist** displays a histogram of each tracer event
occurrence. This tool uses the periodic information, and the
@@ -25,13 +25,13 @@ occurrence. This tool uses the periodic information, and the
OPTIONS
=======
-.. include:: common_timerlat_options.rst
+.. include:: common_timerlat_options.txt
-.. include:: common_hist_options.rst
+.. include:: common_hist_options.txt
-.. include:: common_options.rst
+.. include:: common_options.txt
-.. include:: common_timerlat_aa.rst
+.. include:: common_timerlat_aa.txt
EXAMPLE
=======
@@ -110,4 +110,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla-timerlat-top.rst b/Documentation/tools/rtla/rtla-timerlat-top.rst
index 50968cdd2095..72d85e36c193 100644
--- a/Documentation/tools/rtla/rtla-timerlat-top.rst
+++ b/Documentation/tools/rtla/rtla-timerlat-top.rst
@@ -16,23 +16,23 @@ SYNOPSIS
DESCRIPTION
===========
-.. include:: common_timerlat_description.rst
+.. include:: common_timerlat_description.txt
The **rtla timerlat top** displays a summary of the periodic output
from the *timerlat* tracer. It also provides information for each
operating system noise via the **osnoise:** tracepoints that can be
-seem with the option **-T**.
+seen with the option **-T**.
OPTIONS
=======
-.. include:: common_timerlat_options.rst
+.. include:: common_timerlat_options.txt
-.. include:: common_top_options.rst
+.. include:: common_top_options.txt
-.. include:: common_options.rst
+.. include:: common_options.txt
-.. include:: common_timerlat_aa.rst
+.. include:: common_timerlat_aa.txt
**--aa-only** *us*
@@ -133,4 +133,4 @@ AUTHOR
------
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla-timerlat.rst b/Documentation/tools/rtla/rtla-timerlat.rst
index 20e2d259467f..ce9f57e038c3 100644
--- a/Documentation/tools/rtla/rtla-timerlat.rst
+++ b/Documentation/tools/rtla/rtla-timerlat.rst
@@ -14,7 +14,7 @@ SYNOPSIS
DESCRIPTION
===========
-.. include:: common_timerlat_description.rst
+.. include:: common_timerlat_description.txt
The **rtla timerlat top** mode displays a summary of the periodic output
from the *timerlat* tracer. The **rtla timerlat hist** mode displays
@@ -51,4 +51,4 @@ AUTHOR
======
Written by Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/tools/rtla/rtla.rst b/Documentation/tools/rtla/rtla.rst
index fc0d233efcd5..2a5fb7004ad4 100644
--- a/Documentation/tools/rtla/rtla.rst
+++ b/Documentation/tools/rtla/rtla.rst
@@ -45,4 +45,4 @@ AUTHOR
======
Daniel Bristot de Oliveira <bristot@kernel.org>
-.. include:: common_appendix.rst
+.. include:: common_appendix.txt
diff --git a/Documentation/trace/timerlat-tracer.rst b/Documentation/trace/timerlat-tracer.rst
index 53a56823e903..68d429d454a5 100644
--- a/Documentation/trace/timerlat-tracer.rst
+++ b/Documentation/trace/timerlat-tracer.rst
@@ -43,12 +43,12 @@ It is possible to follow the trace by reading the trace file::
<...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
-The tracer creates a per-cpu kernel thread with real-time priority that
-prints two lines at every activation. The first is the *timer latency*
-observed at the *hardirq* context before the activation of the thread.
-The second is the *timer latency* observed by the thread. The ACTIVATION
-ID field serves to relate the *irq* execution to its respective *thread*
-execution.
+The tracer creates a per-cpu kernel thread with real-time priority
+SCHED_FIFO:95 that prints two lines at every activation. The first is
+the *timer latency* observed at the *hardirq* context before the activation
+of the thread. The second is the *timer latency* observed by the thread.
+The ACTIVATION ID field serves to relate the *irq* execution to its
+respective *thread* execution.
The *irq*/*thread* splitting is important to clarify in which context
the unexpected high value is coming from. The *irq* context can be
diff --git a/Documentation/translations/it_IT/doc-guide/parse-headers.rst b/Documentation/translations/it_IT/doc-guide/parse-headers.rst
index 026a23e49767..b0caa40fe1e9 100644
--- a/Documentation/translations/it_IT/doc-guide/parse-headers.rst
+++ b/Documentation/translations/it_IT/doc-guide/parse-headers.rst
@@ -13,28 +13,28 @@ dello spazio utente ha ulteriori vantaggi: Sphinx genererà dei messaggi
d'avviso se un simbolo non viene trovato nella documentazione. Questo permette
di mantenere allineate la documentazione della uAPI (API spazio utente)
con le modifiche del kernel.
-Il programma :ref:`parse_headers.pl <it_parse_headers>` genera questi riferimenti.
+Il programma :ref:`parse_headers.py <it_parse_headers>` genera questi riferimenti.
Esso dev'essere invocato attraverso un Makefile, mentre si genera la
documentazione. Per avere un esempio su come utilizzarlo all'interno del kernel
consultate ``Documentation/userspace-api/media/Makefile``.
.. _it_parse_headers:
-parse_headers.pl
+parse_headers.py
^^^^^^^^^^^^^^^^
NOME
****
-parse_headers.pl - analizza i file C al fine di identificare funzioni,
+parse_headers.py - analizza i file C al fine di identificare funzioni,
strutture, enumerati e definizioni, e creare riferimenti per Sphinx
SINTASSI
********
-\ **parse_headers.pl**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
+\ **parse_headers.py**\ [<options>] <C_FILE> <OUT_FILE> [<EXCEPTIONS_FILE>]
Dove <options> può essere: --debug, --usage o --help.
diff --git a/Documentation/translations/it_IT/doc-guide/sphinx.rst b/Documentation/translations/it_IT/doc-guide/sphinx.rst
index 1f513bc33618..a5c5d935febf 100644
--- a/Documentation/translations/it_IT/doc-guide/sphinx.rst
+++ b/Documentation/translations/it_IT/doc-guide/sphinx.rst
@@ -109,7 +109,7 @@ Sphinx. Se lo script riesce a riconoscere la vostra distribuzione, allora
sarà in grado di darvi dei suggerimenti su come procedere per completare
l'installazione::
- $ ./scripts/sphinx-pre-install
+ $ ./tools/docs/sphinx-pre-install
Checking if the needed tools for Fedora release 26 (Twenty Six) are available
Warning: better to also install "texlive-luatex85".
You should run:
@@ -119,7 +119,7 @@ l'installazione::
. sphinx_2.4.4/bin/activate
pip install -r Documentation/sphinx/requirements.txt
- Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
+ Can't build as 1 mandatory dependency is missing at ./tools/docs/sphinx-pre-install line 468.
L'impostazione predefinita prevede il controllo dei requisiti per la generazione
di documenti html e PDF, includendo anche il supporto per le immagini, le
diff --git a/Documentation/translations/ja_JP/SubmittingPatches b/Documentation/translations/ja_JP/SubmittingPatches
index 5334db471744..b950347b5993 100644
--- a/Documentation/translations/ja_JP/SubmittingPatches
+++ b/Documentation/translations/ja_JP/SubmittingPatches
@@ -132,6 +132,25 @@ http://savannah.nongnu.org/projects/quilt
platform_set_drvdata(), but left the variable "dev" unused,
delete it.
+特定のコミットで導入された不具合を修正する場合(例えば ``git bisect`` で原因となった
+コミットを特定したときなど)は、コミットの SHA-1 の先頭12文字と1行の要約を添えた
+「Fixes:」タグを付けてください。この行は75文字を超えても構いませんが、途中で
+改行せず、必ず1行で記述してください。
+例:
+ Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
+
+以下の git の設定を使うと、git log や git show で上記形式を出力するための
+専用の出力形式を追加できます::
+
+ [core]
+ abbrev = 12
+ [pretty]
+ fixes = Fixes: %h (\"%s\")
+
+使用例::
+
+ $ git log -1 --pretty=fixes 54a4f0239f2e
+ Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")
3) パッチの分割
@@ -409,7 +428,7 @@ Acked-by: が必ずしもパッチ全体の承認を示しているわけでは
このタグはパッチに関心があると思われる人達がそのパッチの議論に含まれていたこと
を明文化します。
-14) Reported-by:, Tested-by:, Reviewed-by: および Suggested-by: の利用
+14) Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: および Fixes: の利用
他の誰かによって報告された問題を修正するパッチであれば、問題報告者という寄与を
クレジットするために、Reported-by: タグを追加することを検討してください。
@@ -465,6 +484,13 @@ Suggested-by: タグは、パッチのアイデアがその人からの提案に
クレジットしていけば、望むらくはその人たちが将来別の機会に再度力を貸す気に
なってくれるかもしれません。
+Fixes: タグは、そのパッチが以前のコミットにあった問題を修正することを示します。
+これは、バグがどこで発生したかを特定しやすくし、バグ修正のレビューに役立ちます。
+また、このタグはstableカーネルチームが、あなたの修正をどのstableカーネル
+バージョンに適用すべきか判断する手助けにもなります。パッチによって修正された
+バグを示すには、この方法が推奨されます。前述の、「2) パッチに対する説明」の
+セクションを参照してください。
+
15) 標準的なパッチのフォーマット
標準的なパッチのサブジェクトは以下のとおりです。
diff --git a/Documentation/translations/zh_CN/admin-guide/README.rst b/Documentation/translations/zh_CN/admin-guide/README.rst
index 82e628b77efd..7c2ffe7e87c7 100644
--- a/Documentation/translations/zh_CN/admin-guide/README.rst
+++ b/Documentation/translations/zh_CN/admin-guide/README.rst
@@ -288,4 +288,4 @@ Documentation/translations/zh_CN/admin-guide/bug-hunting.rst 。
更多用GDB调试内核的信息,请参阅:
Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
-和 Documentation/dev-tools/kgdb.rst 。
+和 Documentation/process/debugging/kgdb.rst 。
diff --git a/Documentation/translations/zh_CN/block/blk-mq.rst b/Documentation/translations/zh_CN/block/blk-mq.rst
new file mode 100644
index 000000000000..ccc08f76ff97
--- /dev/null
+++ b/Documentation/translations/zh_CN/block/blk-mq.rst
@@ -0,0 +1,130 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/block/blk-mq.rst
+
+:翻译:
+
+ 柯子杰 kezijie <kezijie@leap-io-kernel.com>
+
+:校译:
+
+
+
+================================================
+多队列块设备 I/O 排队机制 (blk-mq)
+================================================
+
+多队列块设备 I/O 排队机制提供了一组 API,使高速存储设备能够同时在多个队列中
+处理并发的 I/O 请求并将其提交到块设备,从而实现极高的每秒输入/输出操作次数
+(IOPS),充分发挥现代存储设备的并行能力。
+
+介绍
+====
+
+背景
+----
+
+磁盘从 Linux 内核开发初期就已成为事实上的标准。块 I/O 子系统的目标是尽可能
+为此类设备提供最佳性能,因为它们在进行随机访问时代价极高,性能瓶颈主要在机械
+运动部件上,其速度远低于存储栈中其他任何层。其中一个软件优化例子是根据硬盘磁
+头当前的位置重新排序读/写请求。
+
+然而,随着固态硬盘和非易失性存储的发展,它们没有机械部件,也不存在随机访问代
+码,并能够进行高速并行访问,存储栈的瓶颈从存储设备转移到了操作系统。为了充分
+利用这些设备设计中的并行性,引入了多队列机制。
+
+原来的设计只有一个队列来存储块设备 I/O 请求,并且只使用一个锁。由于缓存中的
+脏数据和多处理器共享单锁的瓶颈,这种设计在 SMP 系统中扩展性不佳。当不同进程
+(或同一进程在不同 CPU 上)同时执行块设备 I/O 时,该单队列模型还会出现严重
+的拥塞问题。为了解决这些问题,blk-mq API 引入了多个队列,每个队列在本地 CPU
+上拥有独立的入口点,从而消除了对全局锁的需求。关于其具体工作机制的更深入说明,
+请参见下一节( `工作原理`_ )。
+
+工作原理
+--------
+
+当用户空间执行对块设备的 I/O(例如读写文件)时,blk-mq 便会介入:它将存储和
+管理发送到块设备的 I/O 请求,充当用户空间(文件系统,如果存在的话)与块设备驱
+动之间的中间层。
+
+blk-mq 由两组队列组成:软件暂存队列和硬件派发队列。当请求到达块层时,它会尝
+试最短路径:直接发送到硬件队列。然而,有两种情况下可能不会这样做:如果该层有
+IO 调度器或者是希望合并请求。在这两种情况下,请求将被发送到软件队列。
+
+随后,在软件队列中的请求被处理后,请求会被放置到硬件队列。硬件队列是第二阶段
+的队列,硬件可以直接访问并处理这些请求。然而,如果硬件没有足够的资源来接受更
+多请求,blk-mq 会将请求放置在临时队列中,待硬件资源充足时再发送。
+
+软件暂存队列
+~~~~~~~~~~~~
+
+在这些请求未直接发送到驱动时,块设备 I/O 子系统会将请求添加到软件暂存队列中
+(由 struct blk_mq_ctx 表示)。一个请求可能包含一个或多个 BIO。它们通过 struct bio
+数据结构到达块层。块层随后会基于这些 BIO 构建新的结构体 struct request,用于
+与设备驱动通信。每个队列都有自己的锁,队列数量由每个 CPU 和每个 node 为基础
+来决定。
+
+暂存队列可用于合并相邻扇区的请求。例如,对扇区3-6、6-7、7-9的请求可以合并
+为对扇区3-9的一个请求。即便 SSD 或 NVM 的随机访问和顺序访问响应时间相同,
+合并顺序访问的请求仍可减少单独请求的数量。这种合并请求的技术称为 plugging。
+
+此外,I/O 调度器还可以对请求进行重新排序以确保系统资源的公平性(例如防止某
+个应用出现“饥饿”现象)或是提高 I/O 性能。
+
+I/O 调度器
+^^^^^^^^^^
+
+块层实现了多种调度器,每种调度器都遵循一定启发式规则以提高 I/O 性能。它们是
+“可插拔”的(plug and play),可在运行时通过 sysfs 选择。你可以在这里阅读更
+多关于 Linux IO 调度器知识 `here
+<https://www.kernel.org/doc/html/latest/block/index.html>`_。调度只发
+生在同一队列内的请求之间,因此无法合并不同队列的请求,否则会造成缓存冲突并需
+要为每个队列加锁。调度后,请求即可发送到硬件。可能选择的调度器之一是 NONE 调
+度器,这是最直接的调度器:它只将请求放到进程所在的软件队列,不进行重新排序。
+当设备开始处理硬件队列中的请求时(运行硬件队列),映射到该硬件队列的软件队列
+会按映射顺序依次清空。
+
+硬件派发队列
+~~~~~~~~~~~~~
+
+硬件队列(由 struct blk_mq_hw_ctx 表示)是设备驱动用来映射设备提交队列
+(或设备 DMA 环缓存)的结构体,它是块层提交路径在底层设备驱动接管请求之前的
+最后一个阶段。运行此队列时,块层会从相关软件队列中取出请求,并尝试派发到硬件。
+
+如果请求无法直接发送到硬件,它们会被加入到请求的链表(``hctx->dispatch``) 中。
+随后,当块层下次运行该队列时,会优先发送位于 ``dispatch`` 链表中的请求,
+以确保那些最早准备好发送的请求能够得到公平调度。硬件队列的数量取决于硬件及
+其设备驱动所支持的硬件上下文数,但不会超过系统的CPU核心数。在这个阶段不
+会发生重新排序,每个软件队列都有一组硬件队列来用于提交请求。
+
+.. note::
+
+ 块层和设备协议都不保证请求完成顺序。此问题需由更高层处理,例如文件系统。
+
+基于标识的完成机制
+~~~~~~~~~~~~~~~~~~~
+
+为了指示哪一个请求已经完成,每个请求都会被分配一个整数标识,该标识的取值范围
+是从0到分发队列的大小。这个标识由块层生成,并在之后由设备驱动使用,从而避
+免了为每个请求再单独创建冗余的标识符。当请求在驱动中完成时,驱动会将该标识返
+回给块层,以通知该请求已完成。这样,块层就无需再进行线性搜索来确定是哪一个
+I/O 请求完成了。
+
+更多阅读
+--------
+
+- `Linux 块 I/O:多队列 SSD 并发访问简介 <http://kernel.dk/blk-mq.pdf>`_
+
+- `NOOP 调度器 <https://en.wikipedia.org/wiki/Noop_scheduler>`_
+
+- `Null 块设备驱动程序 <https://www.kernel.org/doc/html/latest/block/null_blk.html>`_
+
+源代码
+======
+
+该API在以下内核代码中:
+
+include/linux/blk-mq.h
+
+block/blk-mq.c \ No newline at end of file
diff --git a/Documentation/translations/zh_CN/block/data-integrity.rst b/Documentation/translations/zh_CN/block/data-integrity.rst
new file mode 100644
index 000000000000..b31aa9ef8954
--- /dev/null
+++ b/Documentation/translations/zh_CN/block/data-integrity.rst
@@ -0,0 +1,192 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/block/data-integrity.rst
+
+:翻译:
+
+ 柯子杰 kezijie <kezijie@leap-io-kernel.com>
+
+:校译:
+
+==========
+数据完整性
+==========
+
+1. 引言
+=======
+
+现代文件系统对数据和元数据都进行了校验和保护以防止数据损坏。然而,这种损坏的
+检测是在读取时才进行,这可能发生在数据写入数月之后。到那时,应用程序尝试写入
+的原始数据很可能已经丢失。
+
+解决方案是确保磁盘实际存储的内容就是应用程序想存储的。SCSI 协议族(如 SBC
+数据完整性字段、SCC 保护提案)以及 SATA/T13(外部路径保护)最近新增的功能,
+通过在 I/O 中附加完整性元数据的方式,试图解决这一问题。完整性元数据(在
+SCSI 术语中称为保护信息)包括每个扇区的校验和,以及一个递增计数器,用于确保
+各扇区按正确顺序被写入盘。在某些保护方案中,还能保证 I/O 写入磁盘的正确位置。
+
+当前的存储控制器和设备实现了多种保护措施,例如校验和和数据清理。但这些技术通
+常只在各自的独立域内工作,或最多仅在 I/O 路径的相邻节点之间发挥作用。DIF 及
+其它数据完整性拓展有意思的点在于保护格式定义明确,I/O 路径上的每个节点都可以
+验证 I/O 的完整性,如检测到损坏可直接拒绝。这不仅可以防止数据损坏,还能够隔
+离故障点。
+
+2. 数据完整性拓展
+=================
+
+如上所述,这些协议扩展只保护控制器与存储设备之间的路径。然而,许多控制器实际
+上允许操作系统与完整性元数据(IMD)交互。我们一直与多家 FC/SAS HBA 厂商合作,
+使保护信息能够在其控制器与操作系统之间传输。
+
+SCSI 数据完整性字段通过在每个扇区后附加8字节的保护信息来实现。数据 + 完整
+性元数据存储在磁盘的520字节扇区中。数据 + IMD 在控制器与目标设备之间传输
+时是交错组合在一起的。T13 提案的方式类似。
+
+由于操作系统处理520字节(甚至 4104 字节)扇区非常不便,我们联系了多家 HBA
+厂商,并鼓励它们分离数据与完整性元数据的 scatter-gather lists。
+
+控制器在写入时会将数据缓冲区和完整性元数据缓冲区的数据交错在一起,并在读取时
+会拆分它们。这样,Linux 就能直接通过 DMA 将数据缓冲区传输到主机内存或从主机
+内存读取,而无需修改页缓存。
+
+此外,SCSI 与 SATA 规范要求的16位 CRC 校验在软件中计算代价较高。基准测试发
+现,计算此校验在高负载情形下显著影响系统性能。一些控制器允许在操作系统接口处
+使用轻量级校验。例如 Emulex 支持 TCP/IP 校验。操作系统提供的 IP 校验在写入
+时会转换为16位 CRC,读取时则相反。这允许 Linux 或应用程序以极低的开销生成
+完整性元数据(与软件 RAID5 相当)。
+
+IP 校验在检测位错误方面比 CRC 弱,但关键在于数据缓冲区与完整性元数据缓冲区
+的分离。只有这两个不同的缓冲区匹配,I/O 才能完成。
+
+数据与完整性元数据缓冲区的分离以及校验选择被称为数据完整性扩展。由于这些扩展
+超出了协议主体(T10、T13)的范围,Oracle 及其合作伙伴正尝试在存储网络行业协
+会内对其进行标准化。
+
+3. 内核变更
+===========
+
+Linux 中的数据完整性框架允许将保护信息固定到 I/O 上,并在支持该功能的控制器
+之间发送和接收。
+
+SCSI 和 SATA 中完整性扩展的优势在于,它们能够保护从应用程序到存储设备的整个
+路径。然而,这同时也是最大的劣势。这意味着保护信息必须采用磁盘可以理解的格式。
+
+通常,Linux/POSIX 应用程序并不关心所访问存储设备的具体细节。虚拟文件系统层
+和块层会让硬件扇区大小和传输协议对应用程序完全透明。
+
+然而,在准备发送到磁盘的保护信息时,就需要这种细节。因此,端到端保护方案的概
+念实际上违反了层次结构。应用程序完全不应该知道它访问的是 SCSI 还是 SATA 磁盘。
+
+Linux 中实现的数据完整性支持尝试将这些细节对应用程序隐藏。就应用程序(以及在
+某种程度上内核)而言,完整性元数据是附加在 I/O 上的不透明信息。
+
+当前实现允许块层自动为任何 I/O 生成保护信息。最终目标是将用户数据的完整性元
+数据计算移至用户空间。内核中产生的元数据和其他 I/O 仍将使用自动生成接口。
+
+一些存储设备允许为每个硬件扇区附加一个16位的标识值。这个标识空间的所有者是
+块设备的所有者,也就是在多数情况下由文件系统掌控。文件系统可以利用这额外空间
+按需为扇区附加标识。由于标识空间有限,块接口允许通过交错方式对更大的数据块标
+识。这样,8*16位的信息可以附加到典型的 4KB 文件系统块上。
+
+这也意味着诸如 fsck 和 mkfs 等应用程序需要能够从用户空间访问并操作这些标记。
+为此,正在开发一个透传接口。
+
+4. 块层实现细节
+===============
+
+4.1 Bio
+--------
+
+当启用 CONFIG_BLK_DEV_INTEGRITY 时,数据完整性补丁会在 struct bio 中添加
+一个新字段。调用 bio_integrity(bio) 会返回一个指向 struct bip 的指针,该
+结构体包含了该 bio 的完整性负载。本质上,bip 是一个精简版的 struct bio,其
+中包含一个 bio_vec,用于保存完整性元数据以及所需的维护信息(bvec 池、向量计
+数等)。
+
+内核子系统可以通过调用 bio_integrity_alloc(bio) 来为某个 bio 启用数据完整
+性保护。该函数会分配并附加一个 bip 到该 bio 上。
+
+随后使用 bio_integrity_add_page() 将包含完整性元数据的单独页面附加到该 bio。
+
+调用 bio_free() 会自动释放bip。
+
+4.2 块设备
+-----------
+
+块设备可以在 queue_limits 结构中的 integrity 子结构中设置完整性信息。
+
+对于分层块设备,需要选择一个适用于所有子设备的完整性配置文件。可以使用
+queue_limits_stack_integrity() 来协助完成该操作。目前,DM 和 MD linear、
+RAID0 和 RAID1 已受支持。而RAID4/5/6因涉及应用标签仍需额外的开发工作。
+
+5.0 块层完整性API
+==================
+
+5.1 普通文件系统
+-----------------
+
+ 普通文件系统并不知道其下层块设备具备发送或接收完整性元数据的能力。
+ 在执行写操作时,块层会在调用 submit_bio() 时自动生成完整性元数据。
+ 在执行读操作时,I/O 完成后会触发完整性验证。
+
+ IMD 的生成与验证行为可以通过以下开关控制::
+
+ /sys/block/<bdev>/integrity/write_generate
+
+ and::
+
+ /sys/block/<bdev>/integrity/read_verify
+
+ flags.
+
+5.2 具备完整性感知的文件系统
+----------------------------
+
+ 具备完整性感知能力的文件系统可以在准备 I/O 时附加完整性元数据,
+ 并且如果底层块设备支持应用标签空间,也可以加以利用。
+
+
+ `bool bio_integrity_prep(bio);`
+
+ 要为写操作生成完整性元数据或为读操作设置缓冲区,文件系统必须调用
+ bio_integrity_prep(bio)。
+
+ 在调用此函数之前,必须先设置好 bio 的数据方向和起始扇区,并确
+ 保该 bio 已经添加完所有的数据页。调用者需要自行保证,在 I/O 进行
+ 期间 bio 不会被修改。如果由于某种原因准备失败,则应当以错误状态
+ 完成该 bio。
+
+5.3 传递已有的完整性元数据
+--------------------------
+
+ 能够自行生成完整性元数据或可以从用户空间传输完整性元数据的文件系统,
+ 可以使用如下接口:
+
+
+ `struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);`
+
+ 为 bio 分配完整性负载并挂载到 bio 上。nr_pages 表示需要在
+ integrity bio_vec list 中存储多少页保护数据(类似 bio_alloc)。
+
+ 完整性负载将在 bio_free() 被调用时释放。
+
+
+ `int bio_integrity_add_page(bio, page, len, offset);`
+
+ 将包含完整性元数据的一页附加到已有的 bio 上。该 bio 必须已有 bip,
+ 即必须先调用 bio_integrity_alloc()。对于写操作,页中的完整
+ 性元数据必须采用目标设备可识别的格式,但有一个例外,当请求在 I/O 栈
+ 中传递时,扇区号会被重新映射。这意味着通过此接口添加的页在 I/O 过程
+ 中可能会被修改!完整性元数据中的第一个引用标签必须等于 bip->bip_sector。
+
+ 只要 bip bio_vec array(nr_pages)有空间,就可以继续通过
+ bio_integrity_add_page()添加页。
+
+ 当读操作完成后,附加的页将包含从存储设备接收到的完整性元数据。
+ 接收方需要处理这些元数据,并在操作完成时验证数据完整性
+
+
+----------------------------------------------------------------------
+
+2007-12-24 Martin K. Petersen <martin.petersen@oracle.com> \ No newline at end of file
diff --git a/Documentation/translations/zh_CN/block/index.rst b/Documentation/translations/zh_CN/block/index.rst
new file mode 100644
index 000000000000..f2ae5096ed68
--- /dev/null
+++ b/Documentation/translations/zh_CN/block/index.rst
@@ -0,0 +1,35 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/block/index.rst
+
+:翻译:
+
+ 柯子杰 ke zijie <kezijie@leap-io-kernel.com>
+
+:校译:
+
+=====
+Block
+=====
+
+.. toctree::
+ :maxdepth: 1
+
+ blk-mq
+ data-integrity
+
+TODOList:
+* bfq-iosched
+* biovecs
+* cmdline-partition
+* deadline-iosched
+* inline-encryption
+* ioprio
+* kyber-iosched
+* null_blk
+* pr
+* stat
+* switching-sched
+* writeback_cache_control
+* ublk
diff --git a/Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst b/Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
index 282aacd33442..0b382a32b3fe 100644
--- a/Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
+++ b/Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
@@ -2,7 +2,7 @@
.. include:: ../disclaimer-zh_CN.rst
-:Original: Documentation/dev-tools/gdb-kernel-debugging.rst
+:Original: Documentation/process/debugging/gdb-kernel-debugging.rst
:Translator: 高超 gao chao <gaochao49@huawei.com>
通过gdb调试内核和模块
diff --git a/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst b/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst
index d20b4ce66b9f..dbfd65398077 100644
--- a/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst
+++ b/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst
@@ -28,15 +28,15 @@
::
- ./scripts/checktransupdate.py --help
+ tools/docs/checktransupdate.py --help
具体用法请参考参数解析器的输出
示例
-- ``./scripts/checktransupdate.py -l zh_CN``
+- ``tools/docs/checktransupdate.py -l zh_CN``
这将打印 zh_CN 语言中需要更新的所有文件。
-- ``./scripts/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
+- ``tools/docs/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst``
这将只打印指定文件的状态。
然后输出类似如下的内容:
diff --git a/Documentation/translations/zh_CN/doc-guide/contributing.rst b/Documentation/translations/zh_CN/doc-guide/contributing.rst
index 394a13b438b0..b0c8ba782b16 100644
--- a/Documentation/translations/zh_CN/doc-guide/contributing.rst
+++ b/Documentation/translations/zh_CN/doc-guide/contributing.rst
@@ -124,7 +124,7 @@ C代码编译器发出的警告常常会被视为误报,从而导致出现了
这使得这些信息更难找到,例如使Sphinx无法生成指向该文档的链接。将 ``kernel-doc``
指令添加到文档中以引入这些注释可以帮助社区获得为编写注释所做工作的全部价值。
-``scripts/find-unused-docs.sh`` 工具可以用来找到这些被忽略的评论。
+``tools/docs/find-unused-docs.sh`` 工具可以用来找到这些被忽略的评论。
请注意,将导出的函数和数据结构引入文档是最有价值的。许多子系统还具有供内部
使用的kernel-doc注释;除非这些注释放在专门针对相关子系统开发人员的文档中,
diff --git a/Documentation/translations/zh_CN/doc-guide/parse-headers.rst b/Documentation/translations/zh_CN/doc-guide/parse-headers.rst
index a08819e904ed..65d9dc5143ff 100644
--- a/Documentation/translations/zh_CN/doc-guide/parse-headers.rst
+++ b/Documentation/translations/zh_CN/doc-guide/parse-headers.rst
@@ -13,20 +13,20 @@
有时,为了描述用户空间API并在代码和文档之间生成交叉引用,需要包含头文件和示例
C代码。为用户空间API文件添加交叉引用还有一个好处:如果在文档中找不到相应符号,
Sphinx将生成警告。这有助于保持用户空间API文档与内核更改同步。
-:ref:`parse_headers.pl <parse_headers_zh>` 提供了生成此类交叉引用的一种方法。
+:ref:`parse_headers.py <parse_headers_zh>` 提供了生成此类交叉引用的一种方法。
在构建文档时,必须通过Makefile调用它。有关如何在内核树中使用它的示例,请参阅
``Documentation/userspace-api/media/Makefile`` 。
.. _parse_headers_zh:
-parse_headers.pl
+parse_headers.py
----------------
脚本名称
~~~~~~~~
-parse_headers.pl——解析一个C文件,识别函数、结构体、枚举、定义并对Sphinx文档
+parse_headers.py——解析一个C文件,识别函数、结构体、枚举、定义并对Sphinx文档
创建交叉引用。
@@ -34,7 +34,7 @@ parse_headers.pl——解析一个C文件,识别函数、结构体、枚举、
~~~~~~~~
-\ **parse_headers.pl**\ [<选项>] <C文件> <输出文件> [<例外文件>]
+\ **parse_headers.py**\ [<选项>] <C文件> <输出文件> [<例外文件>]
<选项> 可以是: --debug, --help 或 --usage 。
diff --git a/Documentation/translations/zh_CN/doc-guide/sphinx.rst b/Documentation/translations/zh_CN/doc-guide/sphinx.rst
index 23eac67fbc30..3375c6f3a811 100644
--- a/Documentation/translations/zh_CN/doc-guide/sphinx.rst
+++ b/Documentation/translations/zh_CN/doc-guide/sphinx.rst
@@ -84,7 +84,7 @@ PDF和LaTeX构建
这有一个脚本可以自动检查Sphinx依赖项。如果它认得您的发行版,还会提示您所用发行
版的安装命令::
- $ ./scripts/sphinx-pre-install
+ $ ./tools/docs/sphinx-pre-install
Checking if the needed tools for Fedora release 26 (Twenty Six) are available
Warning: better to also install "texlive-luatex85".
You should run:
@@ -94,7 +94,7 @@ PDF和LaTeX构建
. sphinx_2.4.4/bin/activate
pip install -r Documentation/sphinx/requirements.txt
- Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
+ Can't build as 1 mandatory dependency is missing at ./tools/docs/sphinx-pre-install line 468.
默认情况下,它会检查html和PDF的所有依赖项,包括图像、数学表达式和LaTeX构建的
需求,并假设将使用虚拟Python环境。html构建所需的依赖项被认为是必需的,其他依
diff --git a/Documentation/translations/zh_CN/filesystems/dnotify.rst b/Documentation/translations/zh_CN/filesystems/dnotify.rst
new file mode 100644
index 000000000000..5ab109b9424c
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/dnotify.rst
@@ -0,0 +1,67 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/dnotify.rst
+
+:翻译:
+
+ 王龙杰 Wang Longjie <wang.longjie1@zte.com.cn>
+
+==============
+Linux 目录通知
+==============
+
+ Stephen Rothwell <sfr@canb.auug.org.au>
+
+目录通知的目的是使用户应用程序能够在目录或目录中的任何文件发生变更时收到通知。基本机制包括应用程序
+通过 fcntl(2) 调用在目录上注册通知,通知本身则通过信号传递。
+
+应用程序可以决定希望收到哪些 “事件” 的通知。当前已定义的事件如下:
+
+ ========= =====================================
+ DN_ACCESS 目录中的文件被访问(read)
+ DN_MODIFY 目录中的文件被修改(write,truncate)
+ DN_CREATE 目录中创建了文件
+ DN_DELETE 目录中的文件被取消链接
+ DN_RENAME 目录中的文件被重命名
+ DN_ATTRIB 目录中的文件属性被更改(chmod,chown)
+ ========= =====================================
+
+通常,应用程序必须在每次通知后重新注册,但如果将 DN_MULTISHOT 与事件掩码进行或运算,则注册
+将一直保持有效,直到被显式移除(通过注册为不接收任何事件)。
+
+默认情况下,SIGIO 信号将被传递给进程,且不附带其他有用的信息。但是,如果使用 F_SETSIG fcntl(2)
+调用让内核知道要传递哪个信号,一个 siginfo 结构体将被传递给信号处理程序,该结构体的 si_fd 成员将
+包含与发生事件的目录相关联的文件描述符。
+
+应用程序最好选择一个实时信号(SIGRTMIN + <n>),以便通知可以被排队。如果指定了 DN_MULTISHOT,
+这一点尤为重要。注意,SIGRTMIN 通常是被阻塞的,因此最好使用(至少)SIGRTMIN + 1。
+
+实现预期(特性与缺陷 :-))
+--------------------------
+
+对于文件的任何本地访问,通知都应能正常工作,即使实际文件系统位于远程服务器上。这意味着,对本地用户
+模式服务器提供的文件的远程访问应能触发通知。同样的,对本地内核 NFS 服务器提供的文件的远程访问
+也应能触发通知。
+
+为了尽可能减小对文件系统代码的影响,文件硬链接的问题已被忽略。因此,如果一个文件(x)存在于两个
+目录(a 和 b)中,通过名称”a/x”对该文件进行的更改应通知给期望接收目录“a”通知的程序,但不会
+通知给期望接收目录“b”通知的程序。
+
+此外,取消链接的文件仍会在它们链接到的最后一个目录中触发通知。
+
+配置
+----
+
+Dnotify 由 CONFIG_DNOTIFY 配置选项控制。禁用该选项时,fcntl(fd, F_NOTIFY, ...) 将返
+回 -EINVAL。
+
+示例
+----
+具体示例可参见 tools/testing/selftests/filesystems/dnotify_test.c。
+
+注意
+----
+从 Linux 2.6.13 开始,dnotify 已被 inotify 取代。有关 inotify 的更多信息,请参见
+Documentation/filesystems/inotify.rst。
diff --git a/Documentation/translations/zh_CN/filesystems/gfs2-glocks.rst b/Documentation/translations/zh_CN/filesystems/gfs2-glocks.rst
new file mode 100644
index 000000000000..abfd2f2f94e9
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/gfs2-glocks.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/gfs2-glocks.rst
+
+:翻译:
+
+ 邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
+
+:校译:
+
+ 杨涛 yang tao <yang.tao172@zte.com.cn>
+
+==================
+Glock 内部加锁规则
+==================
+
+本文档阐述 glock 状态机内部运作的基本原理。每个 glock(即
+fs/gfs2/incore.h 中的 struct gfs2_glock)包含两把主要的内部锁:
+
+ 1. 自旋锁(gl_lockref.lock):用于保护内部状态(如
+ gl_state、gl_target)和持有者列表(gl_holders)
+ 2. 非阻塞的位锁(GLF_LOCK):用于防止其他线程同时调用
+ DLM 等操作。若某线程获取此锁,则在释放时必须调用
+ run_queue(通常通过工作队列),以确保所有待处理任务
+ 得以完成。
+
+gl_holders 列表包含与该 glock 关联的所有排队锁请求(不
+仅是持有者)。若存在已持有的锁,它们将位于列表开头的连
+续条目中。锁的授予严格遵循排队顺序。
+
+glock 层用户可请求三种锁状态:共享(SH)、延迟(DF)和
+排他(EX)。它们对应以下 DLM 锁模式:
+
+========== ====== =====================================================
+Glock 模式 DLM 锁模式
+========== ====== =====================================================
+ UN IV/NL 未加锁(无关联的 DLM 锁)或 NL
+ SH PR 受保护读(Protected read)
+ DF CW 并发写(Concurrent write)
+ EX EX 排他(Exclusive)
+========== ====== =====================================================
+
+因此,DF 本质上是一种与“常规”共享锁模式(SH)互斥的共
+享模式。在 GFS2 中,DF 模式专用于直接 I/O 操作。Glock
+本质上是锁加缓存管理例程的组合,其缓存规则如下:
+
+========== ============== ========== ========== ==============
+Glock 模式 缓存元数据 缓存数据 脏数据 脏元数据
+========== ============== ========== ========== ==============
+ UN 否 否 否 否
+ DF 是 否 否 否
+ SH 是 是 否 否
+ EX 是 是 是 是
+========== ============== ========== ========== ==============
+
+这些规则通过为每种 glock 定义的操作函数实现。并非所有
+glock 类型都使用全部的模式,例如仅 inode glock 使用 DF 模
+式。
+
+glock 操作函数及类型常量说明表:
+
+============== ========================================================
+字段 用途
+============== ========================================================
+go_sync 远程状态变更前调用(如同步脏数据)
+go_xmote_bh 远程状态变更后调用(如刷新缓存)
+go_inval 远程状态变更需使缓存失效时调用
+go_instantiate 获取 glock 时调用
+go_held 每次获取 glock 持有者时调用
+go_dump 为 debugfs 文件打印对象内容,或出错时将 glock 转储至日志
+go_callback 若 DLM 发送回调以释放此锁时调用
+go_unlocked 当 glock 解锁时调用(dlm_unlock())
+go_type glock 类型,``LM_TYPE_*``
+go_flags 若 glock 关联地址空间,则设置GLOF_ASPACE 标志
+============== ========================================================
+
+每种锁的最短持有时间是指在远程锁授予后忽略远程降级请求
+的时间段。此举旨在防止锁在集群节点间持续弹跳而无实质进
+展的情况,此现象常见于多节点写入的共享内存映射文件。通
+过延迟响应远程回调的降级操作,为用户空间程序争取页面取
+消映射前的处理时间。
+
+未来计划将 glock 的 "EX" 模式设为本地共享,使本地锁通
+过 i_mutex 实现而非 glock。
+
+glock 操作函数的加锁规则:
+
+============== ====================== =============================
+操作 GLF_LOCK 位锁持有 gl_lockref.lock 自旋锁持有
+============== ====================== =============================
+go_sync 是 否
+go_xmote_bh 是 否
+go_inval 是 否
+go_instantiate 否 否
+go_held 否 否
+go_dump 有时 是
+go_callback 有时(N/A) 是
+go_unlocked 是 否
+============== ====================== =============================
+
+.. Note::
+
+ 若入口处持有锁则操作期间不得释放位锁或自旋锁。
+ go_dump 和 do_demote_ok 严禁阻塞。
+ 仅当 glock 状态指示其缓存最新数据时才会调用 go_dump。
+
+GFS2 内部的 glock 加锁顺序:
+
+ 1. i_rwsem(如需要)
+ 2. 重命名 glock(仅用于重命名)
+ 3. Inode glock
+ (父级优先于子级,同级 inode 按锁编号排序)
+ 4. Rgrp glock(用于(反)分配操作)
+ 5. 事务 glock(通过 gfs2_trans_begin,非读操作)
+ 6. i_rw_mutex(如需要)
+ 7. 页锁(始终最后,至关重要!)
+
+每个 inode 对应两把 glock:一把管理 inode 本身(加锁顺
+序如上),另一把(称为 iopen glock)结合 inode 的
+i_nlink 字段决定 inode 生命周期。inode 加锁基于单个
+inode,rgrp 加锁基于单个 rgrp。通常优先获取本地锁再获
+取集群锁。
+
+Glock 统计
+----------
+
+统计分为两类:超级块相关统计和单个 glock 相关统计。超级
+块统计按每 CPU 执行以减少收集开销,并进一步按 glock 类
+型细分。所有时间单位为纳秒。
+
+超级块和 glock 统计收集相同信息。超级块时序统计为 glock
+时序统计提供默认值,使新建 glock 具有合理的初始值。每个
+glock 的计数器在创建时初始化为零,当 glock 从内存移除时
+统计丢失。
+
+统计包含三组均值/方差对及两个计数器。均值/方差对为平滑
+指数估计,算法与网络代码中的往返时间计算类似(参见《
+TCP/IP详解 卷1》第21.3节及《卷2》第25.10节)。与 TCP/IP
+案例不同,此处均值/方差未缩放且单位为整数纳秒。
+
+三组均值/方差对测量以下内容:
+
+ 1. DLM 锁时间(非阻塞请求)
+ 2. DLM 锁时间(阻塞请求)
+ 3. 请求间隔时间(指向 DLM)
+
+非阻塞请求指无论目标 DLM 锁处于何种状态均能立即完成的请求。
+当前满足条件的请求包括:(a)锁当前状态为互斥(如锁降级)、
+(b)请求状态为空置或解锁(同样如锁降级)、或(c)设置"try lock"
+标志的请求。其余锁请求均属阻塞请求。
+
+两个计数器分别统计:
+ 1. 锁请求总数(决定均值/方差计算的数据量)
+ 2. glock 代码顶层的持有者排队数(通常远大于 DLM 锁请求数)
+
+为什么收集这些统计数据?我们需深入分析时序参数的动因如下:
+
+1. 更精准设置 glock "最短持有时间"
+2. 快速识别性能问题
+3. 改进资源组分配算法(基于锁等待时间而非盲目 "try lock")
+
+因平滑更新的特性,采样量的阶跃变化需经 8 次采样(方差需
+4 次)才能完全体现,解析结果时需审慎考虑。
+
+通过锁请求完成时间和 glock 平均锁请求间隔时间,可计算节
+点使用 glock 时长与集群共享时长的占比,对设置锁最短持有
+时间至关重要。
+
+我们已采取严谨措施,力求精准测量目标量值。任何测量系统均
+存在误差,但我期望当前方案已达到合理精度极限。
+
+超级块状态统计路径::
+
+ /sys/kernel/debug/gfs2/<fsname>/sbstats
+
+Glock 状态统计路径::
+
+ /sys/kernel/debug/gfs2/<fsname>/glstats
+
+(假设 debugfs 挂载于 /sys/kernel/debug,且 <fsname> 替
+换为对应 GFS2 文件系统名)
+
+输出缩写说明:
+
+========= ============================================
+srtt 非阻塞 DLM 请求的平滑往返时间
+srttvar srtt 的方差估计
+srttb (潜在)阻塞 DLM 请求的平滑往返时间
+srttvarb srttb 的方差估计
+sirt DLM 请求的平滑请求间隔时间
+sirtvar sirt 的方差估计
+dlm DLM 请求数(glstats 文件中的 dcnt)
+queue 排队的 glock 请求数(glstats 文件中的 qcnt)
+========= ============================================
+
+sbstats文件按glock类型(每种类型8行)和CPU核心(每CPU一列)
+记录统计数据集。glstats文件则为每个glock提供统计集,其格式
+与glocks文件类似,但所有时序统计量均采用均值/方差格式存储。
+
+gfs2_glock_lock_time 跟踪点实时输出目标 glock 的当前统计
+值,并附带每次接收到的dlm响应附加信息:
+
+====== ============
+status DLM 请求状态
+flags DLM 请求标志
+tdiff 该请求的耗时
+====== ============
+
+(其余字段同上表)
diff --git a/Documentation/translations/zh_CN/filesystems/gfs2-uevents.rst b/Documentation/translations/zh_CN/filesystems/gfs2-uevents.rst
new file mode 100644
index 000000000000..3975c4544118
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/gfs2-uevents.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/gfs2-uevents.rst
+
+:翻译:
+
+ 邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
+
+:校译:
+
+ 杨涛 yang tao <yang.tao172@zte.com.cn>
+
+===============
+uevents 与 GFS2
+===============
+
+在 GFS2 文件系统的挂载生命周期内,会生成多个 uevent。
+本文档解释了这些事件的含义及其用途(被 gfs2-utils 中的 gfs_controld 使用)。
+
+GFS2 uevents 列表
+=================
+
+1. ADD
+------
+
+ADD 事件发生在挂载时。它始终是新建文件系统生成的第一个 uevent。如果挂载成
+功,随后会生成 ONLINE uevent。如果挂载失败,则随后会生成 REMOVE uevent。
+
+ADD uevent 包含两个环境变量:SPECTATOR=[0|1] 和 RDONLY=[0|1],分别用
+于指定文件系统的观察者状态(一种未分配日志的只读挂载)和只读状态(已分配日志)。
+
+2. ONLINE
+---------
+
+ONLINE uevent 在成功挂载或重新挂载后生成。它具有与 ADD uevent 相同的环
+境变量。ONLINE uevent 及其用于标识观察者和 RDONLY 状态的两个环境变量是较
+新版本内核引入的功能(2.6.32-rc+ 及以上),旧版本内核不会生成此事件。
+
+3. CHANGE
+---------
+
+CHANGE uevent 在两种场景下使用。一是报告第一个节点成功挂载文件系统时
+(FIRSTMOUNT=Done)。这作为信号告知 gfs_controld,此时集群中其他节点可以
+安全挂载该文件系统。
+
+另一个 CHANGE uevent 用于通知文件系统某个日志的日志恢复已完成。它包含两个
+环境变量:JID= 指定刚恢复的日志 ID,RECOVERY=[Done|Failed] 表示操作成
+功与否。这些 uevent 会在每次日志恢复时生成,无论是在初始挂载过程中,还是
+gfs_controld 通过 /sys/fs/gfs2/<fsname>/lock_module/recovery 文件
+请求特定日志恢复的结果。
+
+由于早期版本的 gfs_controld 使用 CHANGE uevent 时未检查环境变量以确定状
+态,若为其添加新功能,存在用户工具版本过旧导致集群故障的风险。因此,在新增用
+于标识成功挂载或重新挂载的 uevent 时,选择了使用 ONLINE uevent。
+
+4. OFFLINE
+----------
+
+OFFLINE uevent 仅在文件系统发生错误时生成,是 "withdraw" 机制的一部分。
+当前该事件未提供具体错误信息,此问题有待修复。
+
+5. REMOVE
+---------
+
+REMOVE uevent 在挂载失败结束或卸载文件系统时生成。所有 REMOVE uevent
+之前都至少存在同一文件系统的 ADD uevent。与其他 uevent 不同,它由内核的
+kobject 子系统自动生成。
+
+
+所有 GFS2 uevents 的通用信息(uevent 环境变量)
+===============================================
+
+1. LOCKTABLE=
+--------------
+
+LOCKTABLE 是一个字符串,其值来源于挂载命令行(locktable=)或 fstab 文件。
+它用作文件系统标签,并为 lock_dlm 类型的挂载提供加入集群所需的信息。
+
+2. LOCKPROTO=
+-------------
+
+LOCKPROTO 是一个字符串,其值取决于挂载命令行或 fstab 中的设置。其值将是
+lock_nolock 或 lock_dlm。未来可能支持其他锁管理器。
+
+3. JOURNALID=
+-------------
+
+如果文件系统正在使用日志(观察者挂载不分配日志),则所有 GFS2 uevent 中都
+会包含此变量,其值为数字形式的日志 ID。
+
+4. UUID=
+--------
+
+在较新版本的 gfs2-utils 中,mkfs.gfs2 会向文件系统超级块写入 UUID。若存
+在 UUID,所有与该文件系统相关的 uevent 中均会包含此信息。
diff --git a/Documentation/translations/zh_CN/filesystems/gfs2.rst b/Documentation/translations/zh_CN/filesystems/gfs2.rst
new file mode 100644
index 000000000000..ffa62b12b019
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/gfs2.rst
@@ -0,0 +1,57 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/gfs2.rst
+
+:翻译:
+
+ 邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
+
+:校译:
+
+ 杨涛 yang tao <yang.tao172@zte.com.cn>
+
+=====================================
+全局文件系统 2 (Global File System 2)
+=====================================
+
+GFS2 是一个集群文件系统。它允许一组计算机同时使用在它们之间共享的块设备(通
+过 FC、iSCSI、NBD 等)。GFS2 像本地文件系统一样读写块设备,但也使用一个锁
+模块来让计算机协调它们的 I/O 操作,从而维护文件系统的一致性。GFS2 的出色特
+性之一是完美一致性——在一台机器上对文件系统所做的更改会立即显示在集群中的所
+有其他机器上。
+
+GFS2 使用可互换的节点间锁定机制,当前支持的机制有:
+
+ lock_nolock
+ - 允许将 GFS2 用作本地文件系统
+
+ lock_dlm
+ - 使用分布式锁管理器 (dlm) 进行节点间锁定。
+ 该 dlm 位于 linux/fs/dlm/
+
+lock_dlm 依赖于在上述 URL 中找到的用户空间集群管理系统。
+
+若要将 GFS2 用作本地文件系统,则不需要外部集群系统,只需::
+
+ $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
+ $ mount -t gfs2 /dev/block_device /dir
+
+在所有集群节点上都需要安装 gfs2-utils 软件包;对于 lock_dlm,您还需要按
+照文档配置 dlm 和 corosync 用户空间工具。
+
+gfs2-utils 可在 https://pagure.io/gfs2-utils 找到。
+
+GFS2 在磁盘格式上与早期版本的 GFS 不兼容,但它已相当接近。
+
+以下手册页 (man pages) 可在 gfs2-utils 中找到:
+
+ ============ =============================================
+ fsck.gfs2 用于修复文件系统
+ gfs2_grow 用于在线扩展文件系统
+ gfs2_jadd 用于在线向文件系统添加日志
+ tunegfs2 用于操作、检查和调优文件系统
+ gfs2_convert 用于将 gfs 文件系统原地转换为 GFS2
+ mkfs.gfs2 用于创建文件系统
+ ============ =============================================
diff --git a/Documentation/translations/zh_CN/filesystems/index.rst b/Documentation/translations/zh_CN/filesystems/index.rst
index 9f2a8b003778..fcc79ff9fdad 100644
--- a/Documentation/translations/zh_CN/filesystems/index.rst
+++ b/Documentation/translations/zh_CN/filesystems/index.rst
@@ -15,6 +15,16 @@ Linux Kernel中的文件系统
文件系统(VFS)层以及基于其上的各种文件系统如何工作呈现给大家。当前\
可以看到下面的内容。
+核心 VFS 文档
+=============
+
+有关 VFS 层本身以及其算法工作方式的文档,请参阅这些手册。
+
+.. toctree::
+ :maxdepth: 1
+
+ dnotify
+
文件系统
========
@@ -26,4 +36,9 @@ Linux Kernel中的文件系统
virtiofs
debugfs
tmpfs
-
+ ubifs
+ ubifs-authentication
+ gfs2
+ gfs2-uevents
+ gfs2-glocks
+ inotify
diff --git a/Documentation/translations/zh_CN/filesystems/inotify.rst b/Documentation/translations/zh_CN/filesystems/inotify.rst
new file mode 100644
index 000000000000..b4d740aca946
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/inotify.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/inotify.rst
+
+:翻译:
+
+ 王龙杰 Wang Longjie <wang.longjie1@zte.com.cn>
+
+==========================================
+Inotify - 一个强大且简单的文件变更通知系统
+==========================================
+
+
+
+文档由 Robert Love <rml@novell.com> 于 2005 年 3 月 15 日开始撰写
+
+文档由 Zhang Zhen <zhenzhang.zhang@huawei.com> 于 2015 年 1 月 4 日更新
+
+ - 删除了已废弃的接口,关于用户接口请参考手册页。
+
+(i) 基本原理
+
+问:
+ 不将监控项与被监控对象打开的文件描述符(fd)绑定,这背后的设计决策是什么?
+
+答:
+ 监控项会与打开的 inotify 设备相关联,而非与打开的文件相关联。这解决了 dnotify 的主要问题:
+ 保持文件打开会锁定文件,更糟的是,还会锁定挂载点。因此,dnotify 在带有可移动介质的桌面系统
+ 上难以使用,因为介质将无法被卸载。监控文件不应要求文件处于打开状态。
+
+问:
+ 与每个监控项一个文件描述符的方式相比,采用每个实例一个文件描述符的设计决策是出于什么
+ 考虑?
+
+答:
+ 每个监控项一个文件描述符会很快的消耗掉超出允许数量的文件描述符,其数量会超出实际可管理的范
+ 围,也会超出 select() 能高效处理的范围。诚然,root 用户可以提高每个进程的文件描述符限制,
+ 用户也可以使用 epoll,但同时要求这两者是不合理且多余的。一个监控项所消耗的内存比一个打开的文
+ 件要少,因此将这两个数量空间分开是合理的。当前的设计正是用户空间开发者所期望的:用户只需初始
+ 化一次 inotify,然后添加 n 个监控项,而这只需要一个文件描述符,无需调整文件描述符限制。初
+ 始化 inotify 实例初始化两千次是很荒谬的。如果我们能够简洁地实现用户空间的偏好——而且我们
+ 确实可以,idr 层让这类事情变得轻而易举——那么我们就应该这么做。
+
+ 还有其他合理的理由。如果只有一个文件描述符,那就只需要在该描述符上阻塞,它对应着一个事件队列。
+ 这个单一文件描述符会返回所有的监控事件以及任何可能的带外数据。而如果每个文件描述符都是一个独
+ 立的监控项,
+
+ - 将无法知晓事件的顺序。文件 foo 和文件 bar 上的事件会触发两个文件描述符上的 poll(),
+ 但无法判断哪个事件先发生。而用单个队列就可以很容易的提供事件的顺序。这种顺序对现有的应用程
+ 序(如 Beagle)至关重要。想象一下,如果“mv a b ; mv b a”这样的事件没有顺序会是什么
+ 情况。
+
+ - 我们将不得不维护 n 个文件描述符和 n 个带有状态的内部队列,而不是仅仅一个。这在 kernel 中
+ 会混乱得多。单个线性队列是合理的数据结构。
+
+ - 用户空间开发者更青睐当前的 API。例如,Beagle 的开发者们就很喜欢它。相信我,我问过他们。
+ 这并不奇怪:谁会想通过 select 来管理以及阻塞在 1000 个文件描述符上呢?
+
+ - 无法获取带外数据。
+
+ - 1024 这个数量仍然太少。 ;-)
+
+ 当要设计一个可扩展到数千个目录的文件变更通知系统时,处理数千个文件描述符似乎并不是合适的接口。
+ 这太繁琐了。
+
+ 此外,创建多个实例、处理多个队列以及相应的多个文件描述符是可行的。不必是每个进程对应一个文件描
+ 述符;而是每个队列对应一个文件描述符,一个进程完全可能需要多个队列。
+
+问:
+ 为什么采用系统调用的方式?
+
+答:
+ 糟糕的用户空间接口是 dnotify 的第二大问题。信号对于文件通知来说是一种非常糟糕的接口。其实对
+ 于其他任何事情,信号也都不是好的接口。从各个角度来看,理想的解决方案是基于文件描述符的,它允许
+ 基本的文件 I/O 操作以及 poll/select 操作。获取文件描述符和管理监控项既可以通过设备文件来
+ 实现,也可以通过一系列新的系统调用来实现。我们决定采用一系列系统调用,因为这是提供新的内核接口
+ 的首选方法。两者之间唯一真正的区别在于,我们是想使用 open(2) 和 ioctl(2),还是想使用几
+ 个新的系统调用。系统调用比 ioctl 更有优势。
diff --git a/Documentation/translations/zh_CN/filesystems/ubifs-authentication.rst b/Documentation/translations/zh_CN/filesystems/ubifs-authentication.rst
new file mode 100644
index 000000000000..0e7cf7707e26
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/ubifs-authentication.rst
@@ -0,0 +1,354 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/ubifs-authentication.rst
+
+:翻译:
+
+ 邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
+
+:校译:
+
+ 杨涛 yang tao <yang.tao172@zte.com.cn>
+
+=============
+UBIFS认证支持
+=============
+
+引言
+====
+UBIFS 利用 fscrypt 框架为文件内容及文件名提供保密性。这能防止攻击者在单一
+时间点读取文件系统内容的攻击行为。典型案例是智能手机丢失时,攻击者若没有文件
+系统解密密钥则无法读取设备上的个人数据。
+
+在现阶段,UBIFS 加密尚不能防止攻击者篡改文件系统内容后用户继续使用设备的攻
+击场景。这种情况下,攻击者可任意修改文件系统内容而不被用户察觉。例如修改二
+进制文件使其执行时触发恶意行为 [DMC-CBC-ATTACK]。由于 UBIFS 大部分文件
+系统元数据以明文存储,使得文件替换和内容篡改变得相当容易。
+
+其他全盘加密系统(如 dm-crypt)可以覆盖所有文件系统元数据,这类系统虽然能
+增加这种攻击的难度,但特别是当攻击者能多次访问设备时,也有可能实现攻击。对于
+基于 Linux 块 IO 层的 dm-crypt 等文件系统,可通过 dm-integrity 或
+dm-verity 子系统[DM-INTEGRITY, DM-VERITY]在块层实现完整数据认证,这些
+功能也可与 dm-crypt 结合使用[CRYPTSETUP2]。
+
+本文描述一种为 UBIFS 实现文件内容认证和完整元数据认证的方法。由于 UBIFS
+使用 fscrypt 进行文件内容和文件名加密,认证系统可与 fscrypt 集成以利用密
+钥派生等现有功能。但系统同时也应支持在不启用加密的情况下使用 UBIFS 认证。
+
+
+MTD, UBI & UBIFS
+----------------
+在 Linux 中,MTD(内存技术设备)子系统提供访问裸闪存设备的统一接口。运行于
+MTD 之上的重要子系统是 UBI(无序块映像),它为闪存设备提供卷管理功能,类似
+于块设备的 LVM。此外,UBI 还处理闪存特有的磨损均衡和透明 I/O 错误处理。
+UBI 向上层提供逻辑擦除块(LEB),并透明地映射到闪存的物理擦除块(PEB)。
+
+UBIFS 是运行于 UBI 之上的裸闪存文件系统。因此 UBI 处理磨损均衡和部分闪存
+特性,而 UBIFS专注于可扩展性、性能和可恢复性。
+
+::
+
+ +------------+ +*******+ +-----------+ +-----+
+ | | * UBIFS * | UBI-BLOCK | | ... |
+ | JFFS/JFFS2 | +*******+ +-----------+ +-----+
+ | | +-----------------------------+ +-----------+ +-----+
+ | | | UBI | | MTD-BLOCK | | ... |
+ +------------+ +-----------------------------+ +-----------+ +-----+
+ +------------------------------------------------------------------+
+ | MEMORY TECHNOLOGY DEVICES (MTD) |
+ +------------------------------------------------------------------+
+ +-----------------------------+ +--------------------------+ +-----+
+ | NAND DRIVERS | | NOR DRIVERS | | ... |
+ +-----------------------------+ +--------------------------+ +-----+
+
+ 图1:处理裸闪存的 Linux 内核子系统
+
+
+
+UBIFS 内部维护多个持久化在闪存上的数据结构:
+
+- *索引*:存储在闪存上的 B+ 树,叶节点包含文件系统数据
+- *日志*:在更新闪存索引前收集文件系统变更的辅助数据结构,可减少闪存磨损
+- *树节点缓存(TNC)*:反映当前文件系统状态的内存 B+ 树,避免频繁读取闪存。
+ 本质上是索引的内存表示,但包含额外属性
+- *LEB属性树(LPT)*:用于统计每个 UBI LEB 空闲空间的闪存B+树
+
+本节后续将详细讨论UBIFS的闪存数据结构。因为 TNC 不直接持久化到闪存,其在此
+处的重要性较低。更多 UBIFS 细节详见[UBIFS-WP]。
+
+
+UBIFS 索引与树节点缓存
+~~~~~~~~~~~~~~~~~~~~~~
+
+UBIFS 在闪存上的基础实体称为 *节点* ,包含多种类型。如存储文件内容块的数据
+节点
+( ``struct ubifs_data_node`` ),或表示 VFS 索引节点的 inode 节点
+( ``struct ubifs_ino_node`` )。几乎所有节点共享包含节点类型、长度、序列
+号等基础信息的通用头
+( ``ubifs_ch`` )(见内核源码 ``fs/ubifs/ubifs-media.h`` )。LPT条目
+和填充节点(用于填充 LEB
+尾部不可用空间)等次要节点类型除外。
+
+为避免每次变更重写整个 B+ 树,UBIFS 采用 *wandering tree* 实现:仅重写
+变更节点,旧版本被标记废弃而非立即擦除。因此索引不固定存储于闪存某处,而是在
+闪存上 *wanders* ,在 LEB 被 UBIFS 重用前,闪存上会存在废弃部分。为定位
+最新索引,UBIFS 在 UBI LEB 1 存储称为 *主节点* 的特殊节点,始终指向最新
+UBIFS 索引根节点。为增强可恢复性,主节点还备份到 LEB 2。因此挂载 UBIFS 只
+需读取 LEB 1 和 2 获取当前主节点,进而定位最新闪存索引。
+
+TNC 是闪存索引的内存表示,包含未持久化的运行时属性(如脏标记)。TNC 作为回
+写式缓存,所有闪存索引修改都通过 TNC 完成。与其他缓存类似,TNC 无需将完整
+索引全部加载到内存中,需要时从闪存读取部分内容。 *提交* 是更新闪存文件系统
+结构(如索引)的 UBIFS 操作。每次提交时,标记为脏的 TNC 节点被写入闪存以更
+新持久化索引。
+
+
+日志
+~~~~
+
+为避免闪存磨损,索引仅在满足特定条件(如 ``fsync(2)`` )时才持久化(提交)。
+日志用于记录索引提交之间的所有变更(以 inode 节点、数据节点等形式)。挂载时
+从闪存读取日志并重放到 TNC(此时 TNC 按需从闪存索引创建)。
+
+UBIFS 保留一组专用于日志的 LEB(称为 *日志区* )。日志区 LEB 数量在文件系
+统创建时配置(使用 ``mkfs.ubifs`` )并存储于超级块节点。日志区仅含两类节
+点: *引用节点* 和 *提交起始节点* 。执行索引提交时写入提交起始节点,每次日
+志更新时写入引用节点。每个引用节点指向构成日志条目的其他节点( inode 节点、
+数据节点等)在闪存上的位置,这些节点称为 *bud* ,描述包含数据的实际文件系
+统变更。
+
+日志区以环形缓冲区维护。当日志将满时触发提交操作,同时写入提交起始节点。因此
+挂载时 UBIFS 查找最新提交起始节点,仅重放其后的引用节点。提交起始节点前的引
+用节点将被忽略(因其已属于闪存索引)。
+
+写入日志条目时,UBIFS 首先确保有足够空间写入引用节点和该条目的 bud。然后先
+写引用节点,再写描述文件变更的 bud。在日志重放阶段,UBIFS 会记录每个参考节
+点,并检查其引用的 LEB位置以定位 buds。若这些数据损坏或丢失,UBIFS 会尝试
+通过重新读取 LEB 来恢复,但仅针对日志中最后引用的 LEB,因为只有它可能因断
+电而损坏。若恢复失败,UBIFS 将拒绝挂载。对于其他 LEB 的错误,UBIFS 会直接
+终止挂载操作。
+
+::
+
+ | ---- LOG AREA ---- | ---------- MAIN AREA ------------ |
+
+ -----+------+-----+--------+---- ------+-----+-----+---------------
+ \ | | | | / / | | | \
+ / CS | REF | REF | | \ \ DENT | INO | INO | /
+ \ | | | | / / | | | \
+ ----+------+-----+--------+--- -------+-----+-----+----------------
+ | | ^ ^
+ | | | |
+ +------------------------+ |
+ | |
+ +-------------------------------+
+
+
+ 图2:包含提交起始节点(CS)和引用节点(REF)的日志区闪存布局,引用节点指向含
+ bud 的主区
+
+
+LEB属性树/表
+~~~~~~~~~~~~
+
+LEB 属性树用于存储每个 LEB 的信息,包括 LEB 类型、LEB 上的空闲空间和
+*脏空间* (旧空间,废弃内容) [1]_ 的数量。因为 UBIFS 从不在单个 LEB 混
+合存储索引节点和数据节点,所以 LEB 的类型至关重要,每个 LEB 都有特定用途,
+这对空闲空间计算非常有帮助。详见[UBIFS-WP]。
+
+LEB 属性树也是 B+ 树,但远小于索引。因为其体积小,所以每次提交时都整块写入,
+保存 LPT 是原子操作。
+
+
+.. [1] 由于LEB只能追加写入不能覆盖,空闲空间(即 LEB 剩余可写空间)与废弃
+ 内容(先前写入但未擦除前不能覆盖)存在区别。
+
+
+UBIFS认证
+=========
+
+本章介绍UBIFS认证,使UBIFS能验证闪存上元数据和文件内容的真实性与完整性。
+
+
+威胁模型
+--------
+
+UBIFS 认证可检测离线数据篡改。虽然不能防止篡改,但是能让(可信)代码检查闪
+存文件内容和文件系统元数据的完整性与真实性,也能检查文件内容被替换的攻击。
+
+UBIFS 认证不防护全闪存内容回滚(攻击者可转储闪存内容并在后期还原)。也不防护
+单个索引提交的部分回滚(攻击者能部分撤销变更)。这是因为 UBIFS 不立即覆盖索
+引树或日志的旧版本,而是标记为废弃,稍后由垃圾回收擦除。攻击者可擦除当前树部
+分内容并还原闪存上尚未擦除的旧版本。因每次提交总会写入索引根节点和主节点的新
+版本而不覆盖旧版本,UBI 的磨损均衡操作(将内容从物理擦除块复制到另一擦除块
+且非原子擦除原块)进一步助长此问题。
+
+UBIFS 认证不覆盖认证密钥提供后攻击者在设备执行代码的攻击,需结合安全启动和
+可信启动等措施确保设备仅执行可信代码。
+
+
+认证
+----
+
+为完全信任从闪存读取的数据,所有存储在闪存的 UBIFS 数据结构均需认证:
+- 包含文件内容、扩展属性、文件长度等元数据的索引
+- 通过记录文件系统变更来包含文件内容和元数据的日志
+- 存储 UBIFS 用于空闲空间统计的 UBI LEB 元数据的 LPT
+
+
+索引认证
+~~~~~~~~
+
+借助 *wandering tree* 概念,UBIFS 仅更新和持久化从叶节点到根节点的变更
+部分。这允许用子节点哈希增强索引树节点。最终索引基本成为 Merkle 树:因索引
+叶节点含实际文件系统数据,其父索引节点的哈希覆盖所有文件内容和元数据。文件
+变更时,UBIFS 索引从叶节点到根节点(含主节点)相应更新,此过程可挂钩以同步
+重新计算各变更节点的哈希。读取文件时,UBIFS 可从叶节点到根节点逐级验证哈希
+确保节点完整性。
+
+为确保整个索引真实性,UBIFS 主节点存储基于密钥的哈希(HMAC),覆盖自身内容及
+索引树根节点哈希。如前所述,主节点在索引持久化时(即索引提交时)总会写入闪存。
+
+此方法仅修改 UBIFS 索引节点和主节点以包含哈希,其他类型节点保持不变,减少了
+对 UBIFS 用户(如嵌入式设备)宝贵的存储开销。
+
+::
+
+ +---------------+
+ | Master Node |
+ | (hash) |
+ +---------------+
+ |
+ v
+ +-------------------+
+ | Index Node #1 |
+ | |
+ | branch0 branchn |
+ | (hash) (hash) |
+ +-------------------+
+ | ... | (fanout: 8)
+ | |
+ +-------+ +------+
+ | |
+ v v
+ +-------------------+ +-------------------+
+ | Index Node #2 | | Index Node #3 |
+ | | | |
+ | branch0 branchn | | branch0 branchn |
+ | (hash) (hash) | | (hash) (hash) |
+ +-------------------+ +-------------------+
+ | ... | ... |
+ v v v
+ +-----------+ +----------+ +-----------+
+ | Data Node | | INO Node | | DENT Node |
+ +-----------+ +----------+ +-----------+
+
+
+ 图3:索引节点哈希与主节点 HMAC 的覆盖范围
+
+
+
+健壮性性和断电安全性的关键在于以原子操作持久化哈希值与文件内容。UBIFS 现有
+的变更节点持久化机制专为此设计,能够确保断电时安全恢复。为索引节点添加哈希值
+不会改变该机制,因为每个哈希值都与其对应节点以原子操作同步持久化。
+
+
+日志认证
+~~~~~~~~
+
+日志也需要认证。因为日志持续写入,必须频繁地添加认证信息以确保断电时未认证数
+据量可控。方法是从提交起始节点开始,对先前引用节点、当前引用节点和 bud 节点
+创建连续哈希链。适时地在bud节点间插入认证节点,这种新节点类型包含哈希链当前
+状态的 HMAC。因此日志可认证至最后一个认证节点。日志尾部无认证节点的部分无法
+认证,在日志重放时跳过。
+
+日志认证示意图如下::
+
+ ,,,,,,,,
+ ,......,...........................................
+ ,. CS , hash1.----. hash2.----.
+ ,. | , . |hmac . |hmac
+ ,. v , . v . v
+ ,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
+ ,..|...,...........................................
+ , | ,
+ , | ,,,,,,,,,,,,,,,
+ . | hash3,----.
+ , | , |hmac
+ , v , v
+ , REF#1 -> bud -> bud,-> auth ...
+ ,,,|,,,,,,,,,,,,,,,,,,
+ v
+ REF#2 -> ...
+ |
+ V
+ ...
+
+因为哈希值包含引用节点,攻击者无法重排或跳过日志头重放,仅能移除日志尾部的
+bud 节点或引用节点,最大限度将文件系统回退至上次提交。
+
+日志区位置存储于主节点。因为主节点通过 HMAC 认证,所以未经检测无法篡改。日
+志区大小在文件系统创建时由 `mkfs.ubifs` 指定并存储于超级块节点。为避免篡
+改此值及其他参数,超级块结构添加 HMAC。超级块节点存储在 LEB 0,仅在功能标
+志等变更时修改,文件变更时不修改。
+
+
+LPT认证
+~~~~~~~
+
+LPT 根节点在闪存上的位置存储于 UBIFS 主节点。因为 LPT 每次提交时都以原子
+操作写入和读取,无需单独认证树节点。通过主节点存储的简单哈希保护完整 LPT
+即可。因为主节点自身已认证,通过验证主节点真实性并比对存储的 LTP 哈希与读
+取的闪存 LPT 计算哈希值,即可验证 LPT 真实性。
+
+
+密钥管理
+--------
+
+为了简化实现,UBIFS 认证使用单一密钥计算超级块、主节点、提交起始节点和引用
+节点的 HMAC。创建文件系统(`mkfs.ubifs`) 时需提供此密钥以认证超级块节点。
+挂载文件系统时也需此密钥验证认证节点并为变更生成新 HMAC。
+
+UBIFS 认证旨在与 UBIFS 加密(fscrypt)协同工作以提供保密性和真实性。因为
+UBIFS 加密采用基于目录的差异化加密策略,可能存在多个 fscrypt 主密钥甚至未
+加密目录。而 UBIFS 认证采用全有或全无方式,要么认证整个文件系统要么完全不
+认证。基于此特性,且为确保认证机制可独立于加密功能使用,UBIFS 认证不与
+fscrypt 共享主密钥,而是维护独立的认证专用密钥。
+
+提供认证密钥的API尚未定义,但可通过类似 fscrypt 的用户空间密钥环提供。需注
+意当前 fscrypt 方案存在缺陷,用户空间 API 终将变更[FSCRYPT-POLICY2]。
+
+用户仍可通过用户空间提供单一口令或密钥覆盖 UBIFS 认证与加密。相应用户空间工
+具可解决此问题:除派生的 fscrypt 加密主密钥外,额外派生认证密钥。
+
+为检查挂载时密钥可用性,UBIFS 超级块节点将额外存储认证密钥的哈希。此方法类
+似 fscrypt 加密策略 v2 提出的方法[FSCRYPT-POLICY2]。
+
+
+未来扩展
+========
+
+特定场景下,若供应商需要向客户提供认证文件系统镜像,应该能在不共享 UBIFS 认
+证密钥的前提下实现。方法是在每个 HMAC 外额外存储数字签名,供应商随文件系统
+镜像分发公钥。若该文件系统后续需要修改,若后续需修改该文件系统,UBIFS 可在
+首次挂载时将全部数字签名替换为 HMAC,其处理逻辑与 IMA/EVM 子系统应对此类情
+况的方式类似。此时,HMAC 密钥需按常规方式预先提供。
+
+
+参考
+====
+
+[CRYPTSETUP2] https://www.saout.de/pipermail/dm-crypt/2017-November/005745.html
+
+[DMC-CBC-ATTACK] https://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-en
+crypted-luks-partitions/
+
+[DM-INTEGRITY] https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst
+
+[DM-VERITY] https://www.kernel.org/doc/Documentation/device-mapper/verity.rst
+
+[FSCRYPT-POLICY2] https://www.spinics.net/lists/linux-ext4/msg58710.html
+
+[UBIFS-WP] http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf
diff --git a/Documentation/translations/zh_CN/filesystems/ubifs.rst b/Documentation/translations/zh_CN/filesystems/ubifs.rst
new file mode 100644
index 000000000000..d1873fc6a67c
--- /dev/null
+++ b/Documentation/translations/zh_CN/filesystems/ubifs.rst
@@ -0,0 +1,114 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/filesystems/ubifs.rst
+
+:翻译:
+
+ 邵明寅 Shao Mingyin <shao.mingyin@zte.com.cn>
+
+:校译:
+
+ 杨涛 yang tao <yang.tao172@zte.com.cn>
+
+============
+UBI 文件系统
+============
+
+简介
+====
+
+UBIFS 文件系统全称为 UBI 文件系统(UBI File System)。UBI 代表无序块镜
+像(Unsorted Block Images)。UBIFS 是一种闪存文件系统,这意味着它专为闪
+存设备设计。需要理解的是,UBIFS与 Linux 中任何传统文件系统(如 Ext2、
+XFS、JFS 等)完全不同。UBIFS 代表一类特殊的文件系统,它们工作在 MTD 设备
+而非块设备上。该类别的另一个 Linux 文件系统是 JFFS2。
+
+为更清晰说明,以下是 MTD 设备与块设备的简要比较:
+
+1. MTD 设备代表闪存设备,由较大尺寸的擦除块组成,通常约 128KiB。块设备由
+ 小块组成,通常 512 字节。
+2. MTD 设备支持 3 种主要操作:在擦除块内偏移位置读取、在擦除块内偏移位置写
+ 入、以及擦除整个擦除块。块设备支持 2 种主要操作:读取整个块和写入整个块。
+3. 整个擦除块必须先擦除才能重写内容。块可直接重写。
+4. 擦除块在经历一定次数的擦写周期后会磨损,通常 SLC NAND 和 NOR 闪存为
+ 100K-1G 次,MLC NAND 闪存为 1K-10K 次。块设备不具备磨损特性。
+5. 擦除块可能损坏(仅限 NAND 闪存),软件需处理此问题。硬盘上的块通常不会损
+ 坏,因为硬件有坏块替换机制(至少现代 LBA 硬盘如此)。
+
+这充分说明了 UBIFS 与传统文件系统的本质差异。
+
+UBIFS 工作在 UBI 层之上。UBI 是一个独立的软件层(位于 drivers/mtd/ubi),
+本质上是卷管理和磨损均衡层。它提供称为 UBI 卷的高级抽象,比 MTD 设备更上层。
+UBI 设备的编程模型与 MTD 设备非常相似,仍由大容量擦除块组成,支持读/写/擦
+除操作,但 UBI 设备消除了磨损和坏块限制(上述列表的第 4 和第 5 项)。
+
+某种意义上,UBIFS 是 JFFS2 文件系统的下一代产品,但它与 JFFS2 差异巨大且
+不兼容。主要区别如下:
+
+* JFFS2 工作在 MTD 设备之上,UBIFS 依赖于 UBI 并工作在 UBI 卷之上。
+* JFFS2 没有介质索引,需在挂载时构建索引,这要求全介质扫描。UBIFS 在闪存
+ 介质上维护文件系统索引信息,无需全介质扫描,因此挂载速度远快于 JFFS2。
+* JFFS2 是直写(write-through)文件系统,而 UBIFS 支持回写
+ (write-back),这使得 UBIFS 写入速度快得多。
+
+与 JFFS2 类似,UBIFS 支持实时压缩,可将大量数据存入闪存。
+
+与 JFFS2 类似,UBIFS 能容忍异常重启和断电。它不需要类似 fsck.ext2 的工
+具。UBIFS 会自动重放日志并从崩溃中恢复,确保闪存数据结构的一致性。
+
+UBIFS 具有对数级扩展性(其使用的数据结构多为树形),因此挂载时间和内存消耗不
+像 JFFS2 那样线性依赖于闪存容量。这是因为 UBIFS 在闪存介质上维护文件系统
+索引。但 UBIFS 依赖于线性扩展的 UBI 层,因此整体 UBI/UBIFS 栈仍是线性扩
+展。尽管如此,UBIFS/UBI 的扩展性仍显著优于 JFFS2。
+
+UBIFS 开发者认为,未来可开发同样具备对数级扩展性的 UBI2。UBI2 将支持与
+UBI 相同的 API,但二进制不兼容。因此 UBIFS 无需修改即可使用 UBI2。
+
+挂载选项
+========
+
+(*) 表示默认选项。
+
+==================== =======================================================
+bulk_read 批量读取以利用闪存介质的顺序读取加速特性
+no_bulk_read (*) 禁用批量读取
+no_chk_data_crc (*) 跳过数据节点的 CRC 校验以提高读取性能。 仅在闪存
+ 介质高度可靠时使用此选项。 此选项可能导致文件内容损坏无法被
+ 察觉。
+chk_data_crc 强制校验数据节点的 CRC
+compr=none 覆盖默认压缩器,设置为"none"
+compr=lzo 覆盖默认压缩器,设置为"LZO"
+compr=zlib 覆盖默认压缩器,设置为"zlib"
+auth_key= 指定用于文件系统身份验证的密钥。
+ 使用此选项将强制启用身份验证。
+ 传入的密钥必须存在于内核密钥环中, 且类型必须是'logon'
+auth_hash_name= 用于身份验证的哈希算法。同时用于哈希计算和 HMAC
+ 生成。典型值包括"sha256"或"sha512"
+==================== =======================================================
+
+快速使用指南
+============
+
+挂载的 UBI 卷通过 "ubiX_Y" 或 "ubiX:NAME" 语法指定,其中 "X" 是 UBI
+设备编号,"Y" 是 UBI 卷编号,"NAME" 是 UBI 卷名称。
+
+将 UBI 设备 0 的卷 0 挂载到 /mnt/ubifs::
+
+ $ mount -t ubifs ubi0_0 /mnt/ubifs
+
+将 UBI 设备 0 的 "rootfs" 卷挂载到 /mnt/ubifs("rootfs" 是卷名)::
+
+ $ mount -t ubifs ubi0:rootfs /mnt/ubifs
+
+以下是内核启动参数的示例,用于将 mtd0 附加到 UBI 并挂载 "rootfs" 卷:
+ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
+
+参考资料
+========
+
+UBIFS 文档及常见问题解答/操作指南请访问 MTD 官网:
+
+- http://www.linux-mtd.infradead.org/doc/ubifs.html
+- http://www.linux-mtd.infradead.org/faq/ubifs.html
diff --git a/Documentation/translations/zh_CN/how-to.rst b/Documentation/translations/zh_CN/how-to.rst
index ddd99c0f9b4d..7ae5d8765888 100644
--- a/Documentation/translations/zh_CN/how-to.rst
+++ b/Documentation/translations/zh_CN/how-to.rst
@@ -64,7 +64,7 @@ Linux 发行版和简单地使用 Linux 命令行,那么可以迅速开始了
::
cd linux
- ./scripts/sphinx-pre-install
+ ./tools/docs/sphinx-pre-install
以 Fedora 为例,它的输出是这样的::
@@ -437,7 +437,7 @@ git email 默认会抄送给您一份,所以您可以切换为审阅者的角
对于首次参与 Linux 内核中文文档翻译的新手,建议您在 linux 目录中运行以下命令:
::
- ./script/checktransupdate.py -l zh_CN``
+ tools/docs/checktransupdate.py -l zh_CN``
该命令会列出需要翻译或更新的英文文档,结果同时保存在 checktransupdate.log 中。
diff --git a/Documentation/translations/zh_CN/kbuild/kbuild.rst b/Documentation/translations/zh_CN/kbuild/kbuild.rst
index e5e2aebe1ebc..57f5cf5b2cdd 100644
--- a/Documentation/translations/zh_CN/kbuild/kbuild.rst
+++ b/Documentation/translations/zh_CN/kbuild/kbuild.rst
@@ -93,6 +93,16 @@ HOSTRUSTFLAGS
-------------
在构建主机程序时传递给 $(HOSTRUSTC) 的额外标志。
+PROCMACROLDFLAGS
+----------------
+用于链接 Rust 过程宏的标志。由于过程宏是由 rustc 在构建时加载的,
+因此必须以与当前使用的 rustc 工具链兼容的方式进行链接。
+
+例如,当 rustc 使用的 C 库与用户希望用于主机程序的 C 库不同时,
+此设置会非常有用。
+
+如果未设置,则默认使用链接主机程序时传递的标志。
+
HOSTLDFLAGS
-----------
链接主机程序时传递的额外选项。
@@ -135,12 +145,18 @@ KBUILD_OUTPUT
指定内核构建的输出目录。
在单独的构建目录中为预构建内核构建外部模块时,这个变量也可以指向内核输出目录。请注意,
-这并不指定外部模块本身的输出目录。
+这并不指定外部模块本身的输出目录(使用 KBUILD_EXTMOD_OUTPUT 来达到这个目的)。
输出目录也可以使用 "O=..." 指定。
设置 "O=..." 优先于 KBUILD_OUTPUT。
+KBUILD_EXTMOD_OUTPUT
+--------------------
+指定外部模块的输出目录
+
+设置 "MO=..." 优先于 KBUILD_EXTMOD_OUTPUT.
+
KBUILD_EXTRA_WARN
-----------------
指定额外的构建检查。也可以通过在命令行传递 "W=..." 来设置相同的值。
@@ -290,8 +306,13 @@ IGNORE_DIRS
KBUILD_BUILD_TIMESTAMP
----------------------
将该环境变量设置为日期字符串,可以覆盖在 UTS_VERSION 定义中使用的时间戳
-(运行内核时的 uname -v)。该值必须是一个可以传递给 date -d 的字符串。默认值是
-内核构建某个时刻的 date 命令输出。
+(运行内核时的 uname -v) 。该值必须是一个可以传递给 date -d 的字符串。例如::
+
+ $ KBUILD_BUILD_TIMESTAMP="Mon Oct 13 00:00:00 UTC 2025" make
+
+默认值是内核构建某个时刻的 date 命令输出。如果提供该时戳,它还用于任何 initramfs 归
+档文件中的 mtime 字段。 Initramfs mtimes 是 32 位的,因此早于 Unix 纪元 1970 年,或
+晚于协调世界时 (UTC) 2106 年 2 月 7 日 6 时 28 分 15 秒的日期是无效的。
KBUILD_BUILD_USER, KBUILD_BUILD_HOST
------------------------------------
diff --git a/Documentation/translations/zh_CN/mm/active_mm.rst b/Documentation/translations/zh_CN/mm/active_mm.rst
index b3352668c4c8..9496a0bb7d07 100644
--- a/Documentation/translations/zh_CN/mm/active_mm.rst
+++ b/Documentation/translations/zh_CN/mm/active_mm.rst
@@ -87,4 +87,4 @@ Active MM
最丑陋的之一--不像其他架构的MM和寄存器状态是分开的,alpha的PALcode将两者
连接起来,你需要同时切换两者)。
- (文档来源 http://marc.info/?l=linux-kernel&m=93337278602211&w=2)
+ (文档来源 https://lore.kernel.org/lkml/Pine.LNX.4.10.9907301410280.752-100000@penguin.transmeta.com/)
diff --git a/Documentation/translations/zh_CN/networking/generic-hdlc.rst b/Documentation/translations/zh_CN/networking/generic-hdlc.rst
new file mode 100644
index 000000000000..9e493dc9721e
--- /dev/null
+++ b/Documentation/translations/zh_CN/networking/generic-hdlc.rst
@@ -0,0 +1,176 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/networking/generic-hdlc.rst
+
+:翻译:
+
+ 孙渔喜 Sun yuxi <sun.yuxi@zte.com.cn>
+
+==========
+通用HDLC层
+==========
+
+Krzysztof Halasa <khc@pm.waw.pl>
+
+
+通用HDLC层当前支持以下协议:
+
+1. 帧中继(支持ANSI、CCITT、Cisco及无LMI模式)
+
+ - 常规(路由)接口和以太网桥接(以太网设备仿真)接口
+ 可共享同一条PVC。
+ - 支持ARP(内核暂不支持InARP,但可通过实验性用户空间守护程序实现,
+ 下载地址:http://www.kernel.org/pub/linux/utils/net/hdlc/)。
+
+2. 原始HDLC —— 支持IP(IPv4)接口或以太网设备仿真
+3. Cisco HDLC
+4. PPP
+5. X.25(使用X.25协议栈)
+
+通用HDLC仅作为协议驱动 - 必须配合具体硬件的底层驱动
+才能运行。
+
+以太网设备仿真(使用HDLC或帧中继PVC)兼容IEEE 802.1Q(VLAN)和
+802.1D(以太网桥接)。
+
+
+请确保已加载 hdlc.o 和硬件驱动程序。系统将为每个WAN端口创建一个
+"hdlc"网络设备(如hdlc0等)。您需要使用"sethdlc"工具,可从以下
+地址获取:
+
+ http://www.kernel.org/pub/linux/utils/net/hdlc/
+
+编译 sethdlc.c 工具::
+
+ gcc -O2 -Wall -o sethdlc sethdlc.c
+
+请确保使用与您内核版本匹配的 sethdlc 工具。
+
+使用 sethdlc 工具设置物理接口、时钟频率、HDLC 模式,
+若使用帧中继还需添加所需的 PVC。
+通常您需要执行类似以下命令::
+
+ sethdlc hdlc0 clock int rate 128000
+ sethdlc hdlc0 cisco interval 10 timeout 25
+
+或::
+
+ sethdlc hdlc0 rs232 clock ext
+ sethdlc hdlc0 fr lmi ansi
+ sethdlc hdlc0 create 99
+ ifconfig hdlc0 up
+ ifconfig pvc0 localIP pointopoint remoteIP
+
+在帧中继模式下,请先启用主hdlc设备(不分配IP地址),再
+使用pvc设备。
+
+
+接口设置选项:
+
+* v35 | rs232 | x21 | t1 | e1
+ - 当网卡支持软件可选接口时,可为指定端口设置物理接口
+ loopback
+ - 启用硬件环回(仅用于测试)
+* clock ext
+ - RX与TX时钟均使用外部时钟源
+* clock int
+ - RX与TX时钟均使用内部时钟源
+* clock txint
+ - RX时钟使用外部时钟源,TX时钟使用内部时钟源
+* clock txfromrx
+ - RX时钟使用外部时钟源,TX时钟从RX时钟派生
+* rate
+ - 设置时钟速率(仅适用于"int"或"txint"时钟模式)
+
+
+设置协议选项:
+
+* hdlc - 设置原始HDLC模式(仅支持IP协议)
+
+ nrz / nrzi / fm-mark / fm-space / manchester - 传输编码选项
+
+ no-parity / crc16 / crc16-pr0 (预设零值的CRC16) / crc32-itu
+
+ crc16-itu (使用ITU-T多项式的CRC16) / crc16-itu-pr0 - 校验方式选项
+
+* hdlc-eth - 使用HDLC进行以太网设备仿真. 校验和编码方式同上
+ as above.
+
+* cisco - 设置Cisco HDLC模式(支持IP、IPv6和IPX协议)
+
+ interval - 保活数据包发送间隔(秒)
+
+ timeout - 未收到保活数据包的超时时间(秒),超过此时长将判定
+ 链路断开
+
+* ppp - 设置同步PPP模式
+
+* x25 - 设置X.25模式
+
+* fr - 帧中继模式
+
+ lmi ansi / ccitt / cisco / none - LMI(链路管理)类型
+
+ dce - 将帧中继设置为DCE(网络侧)LMI模式(默认为DTE用户侧)。
+
+ 此设置与时钟无关!
+
+ - t391 - 链路完整性验证轮询定时器(秒)- 用户侧
+ - t392 - 轮询验证定时器(秒)- 网络侧
+ - n391 - 全状态轮询计数器 - 用户侧
+ - n392 - 错误阈值 - 用户侧和网络侧共用
+ - n393 - 监控事件计数 - 用户侧和网络侧共用
+
+帧中继专用命令:
+
+* create n | delete n - 添加/删除DLCI编号为n的PVC接口。
+ 新创建的接口将命名为pvc0、pvc1等。
+
+* create ether n | delete ether n - 添加/删除用于以太网
+ 桥接帧的设备设备将命名为pvceth0、pvceth1等。
+
+
+
+
+板卡特定问题
+------------
+
+n2.o 和 c101.o 驱动模块需要参数才能工作::
+
+ insmod n2 hw=io,irq,ram,ports[:io,irq,...]
+
+示例::
+
+ insmod n2 hw=0x300,10,0xD0000,01
+
+或::
+
+ insmod c101 hw=irq,ram[:irq,...]
+
+示例::
+
+ insmod c101 hw=9,0xdc000
+
+若直接编译进内核,这些驱动需要通过内核(命令行)参数配置::
+
+ n2.hw=io,irq,ram,ports:...
+
+或::
+
+ c101.hw=irq,ram:...
+
+
+
+若您的N2、C101或PLX200SYN板卡出现问题,可通过"private"
+命令查看端口数据包描述符环(显示在内核日志中)
+
+ sethdlc hdlc0 private
+
+硬件驱动需使用#define DEBUG_RINGS编译选项构建。
+在提交错误报告时附上这些信息将很有帮助。如在使用过程中遇
+到任何问题,请随时告知。
+
+获取补丁和其他信息,请访问:
+<http://www.kernel.org/pub/linux/utils/net/hdlc/>. \ No newline at end of file
diff --git a/Documentation/translations/zh_CN/networking/index.rst b/Documentation/translations/zh_CN/networking/index.rst
index bb0edcffd144..c276c0993c51 100644
--- a/Documentation/translations/zh_CN/networking/index.rst
+++ b/Documentation/translations/zh_CN/networking/index.rst
@@ -27,6 +27,9 @@
xfrm_proc
netmem
alias
+ mptcp-sysctl
+ generic-hdlc
+ timestamping
Todolist:
@@ -76,7 +79,6 @@ Todolist:
* eql
* fib_trie
* filter
-* generic-hdlc
* generic_netlink
* netlink_spec/index
* gen_stats
@@ -96,7 +98,6 @@ Todolist:
* mctp
* mpls-sysctl
* mptcp
-* mptcp-sysctl
* multiqueue
* multi-pf-netdev
* net_cachelines/index
@@ -126,7 +127,6 @@ Todolist:
* sctp
* secid
* seg6-sysctl
-* skbuff
* smc-sysctl
* sriov
* statistics
@@ -138,7 +138,6 @@ Todolist:
* tcp_ao
* tcp-thin
* team
-* timestamping
* tipc
* tproxy
* tuntap
diff --git a/Documentation/translations/zh_CN/networking/mptcp-sysctl.rst b/Documentation/translations/zh_CN/networking/mptcp-sysctl.rst
new file mode 100644
index 000000000000..0b1b9ed7c647
--- /dev/null
+++ b/Documentation/translations/zh_CN/networking/mptcp-sysctl.rst
@@ -0,0 +1,139 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/networking/mptcp-sysctl.rst
+
+:翻译:
+
+ 孙渔喜 Sun yuxi <sun.yuxi@zte.com.cn>
+
+================
+MPTCP Sysfs 变量
+================
+
+/proc/sys/net/mptcp/* Variables
+===============================
+
+add_addr_timeout - INTEGER (秒)
+ 设置ADD_ADDR控制消息的重传超时时间。当MPTCP对端未确认
+ 先前的ADD_ADDR消息时,将在该超时时间后重新发送。
+
+ 默认值与TCP_RTO_MAX相同。此为每个命名空间的sysctl参数。
+
+ 默认值:120
+
+allow_join_initial_addr_port - BOOLEAN
+ 控制是否允许对端向初始子流使用的IP地址和端口号发送加入
+ 请求(1表示允许)。此参数会设置连接时发送给对端的标志位,
+ 并决定是否接受此类加入请求。
+
+ 通过ADD_ADDR通告的地址不受此参数影响。
+
+ 此为每个命名空间的sysctl参数。
+
+ 默认值:1
+
+available_path_managers - STRING
+ 显示已注册的可用路径管理器选项。可能有更多路径管理器可用
+ 但尚未加载。
+
+available_schedulers - STRING
+ 显示已注册的可用调度器选项。可能有更多数据包调度器可用
+ 但尚未加载。
+
+blackhole_timeout - INTEGER (秒)
+ 当发生MPTCP防火墙黑洞问题时,初始禁用活跃MPTCP套接字上MPTCP
+ 功能的时间(秒)。如果在重新启用MPTCP后立即检测到更多黑洞问题,
+ 此时间段将呈指数增长;当黑洞问题消失时,将重置为初始值。
+
+ 设置为0可禁用黑洞检测功能。此为每个命名空间的sysctl参数。
+
+ 默认值:3600
+
+checksum_enabled - BOOLEAN
+ 控制是否启用DSS校验和功能。
+
+ 当值为非零时可启用DSS校验和。此为每个命名空间的sysctl参数。
+
+ 默认值:0
+
+close_timeout - INTEGER (seconds)
+ 设置"先断后连"超时时间:在未调用close或shutdown系统调用时,
+ MPTCP套接字将在最后一个子流移除后保持当前状态达到该时长,才
+ 会转为TCP_CLOSE状态。
+
+ 默认值与TCP_TIMEWAIT_LEN相同。此为每个命名空间的sysctl参数。
+
+ 默认值:60
+
+enabled - BOOLEAN
+ 控制是否允许创建MPTCP套接字。
+
+ 当值为1时允许创建MPTCP套接字。此为每个命名空间的sysctl参数。
+
+ 默认值:1(启用)
+
+path_manager - STRING
+ 设置用于每个新MPTCP套接字的默认路径管理器名称。内核路径管理将
+ 根据通过MPTCP netlink API配置的每个命名空间值来控制子流连接
+ 和地址通告。用户空间路径管理将每个MPTCP连接的子流连接决策和地
+ 址通告交由特权用户空间程序控制,代价是需要更多netlink流量来
+ 传播所有相关事件和命令。
+
+ 此为每个命名空间的sysctl参数。
+
+ * "kernel" - 内核路径管理器
+ * "userspace" - 用户空间路径管理器
+
+ 默认值:"kernel"
+
+pm_type - INTEGER
+ 设置用于每个新MPTCP套接字的默认路径管理器类型。内核路径管理将
+ 根据通过MPTCP netlink API配置的每个命名空间值来控制子流连接
+ 和地址通告。用户空间路径管理将每个MPTCP连接的子流连接决策和地
+ 址通告交由特权用户空间程序控制,代价是需要更多netlink流量来
+ 传播所有相关事件和命令。
+
+ 此为每个命名空间的sysctl参数。
+
+ 自v6.15起已弃用,请改用path_manager参数。
+
+ * 0 - 内核路径管理器
+ * 1 - 用户空间路径管理器
+
+ 默认值:0
+
+scheduler - STRING
+ 选择所需的调度器类型。
+
+ 支持选择不同的数据包调度器。此为每个命名空间的sysctl参数。
+
+ 默认值:"default"
+
+stale_loss_cnt - INTEGER
+ 用于判定子流失效(stale)的MPTCP层重传间隔次数阈值。当指定
+ 子流在连续多个重传间隔内既无数据传输又有待处理数据时,将被标
+ 记为失效状态。失效子流将被数据包调度器忽略。
+ 设置较低的stale_loss_cnt值可实现快速主备切换,较高的值则能
+ 最大化边缘场景(如高误码率链路或对端暂停数据处理等异常情况)
+ 的链路利用率。
+
+ 此为每个命名空间的sysctl参数。
+
+ 默认值:4
+
+syn_retrans_before_tcp_fallback - INTEGER
+ 在回退到 TCP(即丢弃 MPTCP 选项)之前,SYN + MP_CAPABLE
+ 报文的重传次数。换句话说,如果所有报文在传输过程中都被丢弃,
+ 那么将会:
+
+ * 首次SYN携带MPTCP支持选项
+ * 按本参数值重传携带MPTCP选项的SYN包
+ * 后续重传将不再携带MPTCP支持选项
+
+ 0 表示首次重传即丢弃MPTCP选项。
+ >=128 表示所有SYN重传均保留MPTCP选项设置过低的值可能增加
+ MPTCP黑洞误判几率。此为每个命名空间的sysctl参数。
+
+ 默认值:2
diff --git a/Documentation/translations/zh_CN/networking/timestamping.rst b/Documentation/translations/zh_CN/networking/timestamping.rst
new file mode 100644
index 000000000000..4593f53ad09a
--- /dev/null
+++ b/Documentation/translations/zh_CN/networking/timestamping.rst
@@ -0,0 +1,674 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/networking/timestamping.rst
+
+:翻译:
+
+ 王亚鑫 Wang Yaxin <wang.yaxin@zte.com.cn>
+
+======
+时间戳
+======
+
+
+1. 控制接口
+===========
+
+接收网络数据包时间戳的接口包括:
+
+SO_TIMESTAMP
+ 为每个传入数据包生成(不一定是单调的)系统时间时间戳。通过 recvmsg()
+ 在控制消息中以微秒分辨率报告时间戳。
+ SO_TIMESTAMP 根据架构类型和 libc 的 lib 中的 time_t 表示方式定义为
+ SO_TIMESTAMP_NEW 或 SO_TIMESTAMP_OLD。
+ SO_TIMESTAMP_OLD 和 SO_TIMESTAMP_NEW 的控制消息格式分别为
+ struct __kernel_old_timeval 和 struct __kernel_sock_timeval。
+
+SO_TIMESTAMPNS
+ 与 SO_TIMESTAMP 相同的时间戳机制,但以 struct timespec 格式报告时间戳,
+ 纳秒分辨率。
+ SO_TIMESTAMPNS 根据架构类型和 libc 的 time_t 表示方式定义为
+ SO_TIMESTAMPNS_NEW 或 SO_TIMESTAMPNS_OLD。
+ 控制消息格式对于 SO_TIMESTAMPNS_OLD 为 struct timespec,
+ 对于 SO_TIMESTAMPNS_NEW 为 struct __kernel_timespec。
+
+IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
+ 仅用于多播:通过读取回环数据包接收时间戳,获得近似的传输时间戳。
+
+SO_TIMESTAMPING
+ 在接收、传输或两者时生成时间戳。支持多个时间戳源,包括硬件。
+ 支持为流套接字生成时间戳。
+
+
+1.1 SO_TIMESTAMP(也包括 SO_TIMESTAMP_OLD 和 SO_TIMESTAMP_NEW)
+---------------------------------------------------------------
+
+此套接字选项在接收路径上启用数据报的时间戳。由于目标套接字(如果有)
+在网络栈早期未知,因此必须为所有数据包启用此功能。所有早期接收的时间
+戳选项也是如此。
+
+有关接口详细信息,请参阅 `man 7 socket`。
+
+始终使用 SO_TIMESTAMP_NEW 时间戳以获得 struct __kernel_sock_timeval
+格式的时间戳。
+
+如果时间在 2038 年后,SO_TIMESTAMP_OLD 在 32 位机器上将返回错误的时间戳。
+
+1.2 SO_TIMESTAMPNS(也包括 SO_TIMESTAMPNS_OLD 和 SO_TIMESTAMPNS_NEW)
+---------------------------------------------------------------------
+
+此选项与 SO_TIMESTAMP 相同,但返回数据类型有所不同。其 struct timespec
+能达到比 SO_TIMESTAMP 的 timeval(毫秒)更高的分辨率(纳秒)时间戳。
+
+始终使用 SO_TIMESTAMPNS_NEW 时间戳获得 struct __kernel_timespec 格式
+的时间戳。
+
+如果时间在 2038 年后,SO_TIMESTAMPNS_OLD 在 32 位机器上将返回错误的时间戳。
+
+1.3 SO_TIMESTAMPING(也包括 SO_TIMESTAMPING_OLD 和 SO_TIMESTAMPING_NEW)
+------------------------------------------------------------------------
+
+支持多种类型的时间戳请求。因此,此套接字选项接受标志位图,而不是布尔值。在::
+
+ err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
+
+val 是一个整数,设置了以下任何位。设置其他位将返回 EINVAL 且不更改当前状态。
+
+这个套接字选项配置以下几个方面的时间戳生成:
+为单个 sk_buff 结构体生成时间戳(1.3.1);
+将时间戳报告到套接字的错误队列(1.3.2);
+配置相关选项(1.3.3);
+也可以通过 cmsg 为单个 sendmsg 调用启用时间戳生成(1.3.4)。
+
+1.3.1 时间戳生成
+^^^^^^^^^^^^^^^^
+
+某些位是向协议栈请求尝试生成时间戳。它们的任何组合都是有效的。对这些位的更改适
+用于新创建的数据包,而不是已经在协议栈中的数据包。因此,可以通过在两个 setsockopt
+调用之间嵌入 send() 调用来选择性地为数据包子集请求时间戳(例如,用于采样),
+一个用于启用时间戳生成,一个用于禁用它。时间戳也可能由于特定套接字请求之外的原
+因而生成,例如在当系统范围内启用接收时间戳时,如前所述。
+
+SOF_TIMESTAMPING_RX_HARDWARE:
+ 请求由网络适配器生成的接收时间戳。
+
+SOF_TIMESTAMPING_RX_SOFTWARE:
+ 当数据进入内核时请求接收时间戳。这些时间戳在设备驱动程序将数据包交给内核接收
+ 协议栈后生成。
+
+SOF_TIMESTAMPING_TX_HARDWARE:
+ 请求由网络适配器生成的传输时间戳。此标志可以通过套接字选项和控制消息启用。
+
+SOF_TIMESTAMPING_TX_SOFTWARE:
+ 当数据离开内核时请求传输(TX)时间戳。这些时间戳由设备驱动程序生成,并且尽
+ 可能贴近网络接口发送点,但始终在内核将数据包传递给网络接口之前生成。因此,
+ 它们需要驱动程序支持,且可能并非所有设备都可用。此标志可通过套接字选项和
+ 控制消息两种方式启用。
+
+SOF_TIMESTAMPING_TX_SCHED:
+ 在进入数据包调度器之前请求传输时间戳。内核传输延迟(如果很长)通常由排队
+ 延迟主导。此时间戳与在 SOF_TIMESTAMPING_TX_SOFTWARE 处获取的时间戳之
+ 间的差异将暴露此延迟,并且与协议处理无关。协议处理中产生的延迟(如果有)
+ 可以通过从 send() 之前立即获取的用户空间时间戳中减去此时间戳来计算。在
+ 具有虚拟设备的机器上,传输的数据包通过多个设备和多个数据包调度器,在每层
+ 生成时间戳。这允许对排队延迟进行细粒度测量。此标志可以通过套接字选项和控
+ 制消息启用。
+
+SOF_TIMESTAMPING_TX_ACK:
+ 请求在发送缓冲区中的所有数据都已得到确认时生成传输(TX)时间戳。此选项
+ 仅适用于可靠协议,目前仅在TCP协议中实现。对于该协议,它可能会过度报告
+ 测量结果,因为时间戳是在send()调用时缓冲区中的所有数据(包括该缓冲区)
+ 都被确认时生成的,即累积确认。该机制会忽略选择确认(SACK)和前向确认
+ (FACK)。此标志可通过套接字选项和控制消息两种方式启用。
+
+SOF_TIMESTAMPING_TX_COMPLETION:
+ 在数据包传输完成时请求传输时间戳。完成时间戳由内核在从硬件接收数据包完成
+ 报告时生成。硬件可能一次报告多个数据包,完成时间戳反映报告的时序而不是实
+ 际传输时间。此标志可以通过套接字选项和控制消息启用。
+
+
+1.3.2 时间戳报告
+^^^^^^^^^^^^^^^^
+
+其他三个位控制将在生成的控制消息中报告哪些时间戳。对这些位的更改在协议栈中
+的时间戳报告位置立即生效。仅当数据包设置了相关的时间戳生成请求时,才会报告
+其时间戳。
+
+SOF_TIMESTAMPING_SOFTWARE:
+ 在可用时报告任何软件时间戳。
+
+SOF_TIMESTAMPING_SYS_HARDWARE:
+ 此选项已被弃用和忽略。
+
+SOF_TIMESTAMPING_RAW_HARDWARE:
+ 在可用时报告由 SOF_TIMESTAMPING_TX_HARDWARE 或 SOF_TIMESTAMPING_RX_HARDWARE
+ 生成的硬件时间戳。
+
+
+1.3.3 时间戳选项
+^^^^^^^^^^^^^^^^
+
+接口支持以下选项
+
+SOF_TIMESTAMPING_OPT_ID:
+ 每个数据包生成一个唯一标识符。一个进程可以同时存在多个未完成的时间戳请求。
+ 数据包在传输路径中可能会发生重排序(例如在数据包调度器中)。在这种情况下,
+ 时间戳会以与原始send()调用不同的顺序排队到错误队列中。如此一来,仅根据
+ 时间戳顺序或 payload(有效载荷)检查,并不总能将时间戳与原始send()调用
+ 唯一匹配。
+
+ 此选项在 send() 时将每个数据包与唯一标识符关联,并与时间戳一起返回。
+ 标识符源自每个套接字的 u32 计数器(会回绕)。对于数据报套接字,计数器
+ 随每个发送的数据包递增。对于流套接字,它随每个字节递增。对于流套接字,
+ 还要设置 SOF_TIMESTAMPING_OPT_ID_TCP,请参阅下面的部分。
+
+ 计数器从零开始。在首次启用套接字选项时初始化。在禁用后再重新启用选项时
+ 重置。重置计数器不会更改系统中现有数据包的标识符。
+
+ 此选项仅针对传输时间戳实现。在这种情况下,时间戳总是与sock_extended_err
+ 结构体一起回环。该选项会修改ee_data字段,以传递一个在该套接字所有同时
+ 存在的未完成时间戳请求中唯一的 ID。
+
+ 进程可以通过控制消息SCM_TS_OPT_ID(TCP 套接字不支持)传递特定 ID,
+ 从而选择性地覆盖默认生成的 ID,示例如下::
+
+ struct msghdr *msg;
+ ...
+ cmsg = CMSG_FIRSTHDR(msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_TS_OPT_ID;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
+ *((__u32 *) CMSG_DATA(cmsg)) = opt_id;
+ err = sendmsg(fd, msg, 0);
+
+
+SOF_TIMESTAMPING_OPT_ID_TCP:
+ 与 SOF_TIMESTAMPING_OPT_ID 一起传递给新的 TCP 时间戳应用程序。
+ SOF_TIMESTAMPING_OPT_ID 定义了流套接字计数器的增量,但其起始点
+ 并不完全显而易见。此选项修复了这一点。
+
+ 对于流套接字,如果设置了 SOF_TIMESTAMPING_OPT_ID,则此选项应始终
+ 设置。在数据报套接字上,选项没有效果。
+
+ 一个合理的期望是系统调用后计数器重置为零,因此后续写入 N 字节将生成
+ 计数器为 N-1 的时间戳。SOF_TIMESTAMPING_OPT_ID_TCP 在所有条件下
+ 都实现了此行为。
+
+ SOF_TIMESTAMPING_OPT_ID 不带修饰符时通常报告相同,特别是在套接字选项
+ 在无数据传输时设置时。如果正在传输数据,它可能与输出队列的长度(SIOCOUTQ)
+ 偏差。
+
+ 差异是由于基于 snd_una 与 write_seq 的。snd_una 是 peer 确认的 stream
+ 的偏移量。这取决于外部因素,例如网络 RTT。write_seq 是进程写入的最后一个
+ 字节。此偏移量不受外部输入影响。
+
+ 差异细微,在套接字选项初始化时配置时不易察觉,但 SOF_TIMESTAMPING_OPT_ID_TCP
+ 行为在任何时候都更稳健。
+
+SOF_TIMESTAMPING_OPT_CMSG:
+ 支持所有时间戳数据包的 recv() cmsg。控制消息已无条件地在所有接收时间戳数据包
+ 和 IPv6 数据包上支持,以及在发送时间戳数据包的 IPv4 数据包上支持。此选项扩展
+ 了它们以在发送时间戳数据包的 IPv4 数据包上支持。一个用例是启用 socket 选项
+ IP_PKTINFO 以关联数据包与其出口设备,通过启用 socket 选项 IP_PKTINFO 同时。
+
+
+SOF_TIMESTAMPING_OPT_TSONLY:
+ 仅适用于传输时间戳。使内核返回一个 cmsg 与一个空数据包一起,而不是与原
+ 始数据包一起。这减少了套接字接收预算(SO_RCVBUF)中收取的内存量,并即使
+ 在 sysctl net.core.tstamp_allow_data 为 0 时也提供时间戳。此选项禁用
+ SOF_TIMESTAMPING_OPT_CMSG。
+
+SOF_TIMESTAMPING_OPT_STATS:
+ 与传输时间戳一起获取的选项性统计信息。它必须与 SOF_TIMESTAMPING_OPT_TSONLY
+ 一起使用。当传输时间戳可用时,统计信息可在类型为 SCM_TIMESTAMPING_OPT_STATS
+ 的单独控制消息中获取,作为 TLV(struct nlattr)类型的列表。这些统计信息允许应
+ 用程序将各种传输层统计信息与传输时间戳关联,例如某个数据块被 peer 的接收窗口限
+ 制了多长时间。
+
+SOF_TIMESTAMPING_OPT_PKTINFO:
+ 启用 SCM_TIMESTAMPING_PKTINFO 控制消息以接收带有硬件时间戳的数据包。
+ 消息包含 struct scm_ts_pktinfo,它提供接收数据包的实际接口索引和层 2 长度。
+ 只有在 CONFIG_NET_RX_BUSY_POLL 启用且驱动程序使用 NAPI 时,才会返回非零的
+ 有效接口索引。该结构还包含另外两个字段,但它们是保留字段且未定义。
+
+SOF_TIMESTAMPING_OPT_TX_SWHW:
+ 请求在 SOF_TIMESTAMPING_TX_HARDWARE 和 SOF_TIMESTAMPING_TX_SOFTWARE
+ 同时启用时,为传出数据包生成硬件和软件时间戳。如果同时生成两个时间戳,两个单
+ 独的消息将回环到套接字的错误队列,每个消息仅包含一个时间戳。
+
+SOF_TIMESTAMPING_OPT_RX_FILTER:
+ 过滤掉虚假接收时间戳:仅当匹配的时间戳生成标志已启用时才报告接收时间戳。
+
+ 接收时间戳在入口路径中生成较早,在数据包的目的套接字确定之前。如果任何套接
+ 字启用接收时间戳,所有套接字的数据包将接收时间戳数据包。包括那些请求时间戳
+ 报告与 SOF_TIMESTAMPING_SOFTWARE 和/或 SOF_TIMESTAMPING_RAW_HARDWARE,
+ 但未请求接收时间戳生成。这可能发生在仅请求发送时间戳时。
+
+ 接收虚假时间戳通常是无害的。进程可以忽略意外的非零值。但它使行为在其他套接
+ 字上微妙地依赖。此标志隔离套接字以获得更确定的行为。
+
+新应用程序鼓励传递 SOF_TIMESTAMPING_OPT_ID 以区分时间戳并传递
+SOF_TIMESTAMPING_OPT_TSONLY 以操作,而不管 sysctl net.core.tstamp_allow_data
+的设置。
+
+例外情况是当进程需要额外的 cmsg 数据时,例如 SOL_IP/IP_PKTINFO 以检测出
+口网络接口。然后传递选项 SOF_TIMESTAMPING_OPT_CMSG。此选项依赖于访问原
+始数据包的内容,因此不能与 SOF_TIMESTAMPING_OPT_TSONLY 组合。
+
+
+1.3.4. 通过控制消息启用时间戳
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+除了套接字选项外,时间戳生成还可以通过 cmsg 按写入请求,仅适用于
+SOF_TIMESTAMPING_TX_*(见第 1.3.1 节)。使用此功能,应用程序可以无需启用和
+禁用时间戳即可采样每个 sendmsg() 的时间戳::
+
+ struct msghdr *msg;
+ ...
+ cmsg = CMSG_FIRSTHDR(msg);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SO_TIMESTAMPING;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
+ *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
+ SOF_TIMESTAMPING_TX_SOFTWARE |
+ SOF_TIMESTAMPING_TX_ACK;
+ err = sendmsg(fd, msg, 0);
+
+通过 cmsg 设置的 SOF_TIMESTAMPING_TX_* 标志将覆盖通过 setsockopt 设置的
+SOF_TIMESTAMPING_TX_* 标志。
+
+此外,应用程序仍然需要通过 setsockopt 启用时间戳报告以接收时间戳::
+
+ __u32 val = SOF_TIMESTAMPING_SOFTWARE |
+ SOF_TIMESTAMPING_OPT_ID /* 或任何其他标志 */;
+ err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
+
+
+1.4 字节流时间戳
+----------------
+
+SO_TIMESTAMPING 接口支持字节流的时间戳。每个请求解释为请求当整个缓冲区内容
+通过时间戳点时。也就是说,对于流选项 SOF_TIMESTAMPING_TX_SOFTWARE 将记录
+当所有字节都到达设备驱动程序时,无论数据被转换成多少个数据包。
+
+一般来说,字节流没有自然分隔符,因此将时间戳与数据相关联是非平凡的。字节范围
+可能跨段,任何段可能合并(可能合并先前分段缓冲区关联的独立 send() 调用)。段
+可以重新排序,同一字节范围可以在多个段中并存,对于实现重传的协议。
+
+所有时间戳必须实现相同的语义,否则它们是不可比较的。以不同于简单情况(缓冲区
+到 skb 的 1:1 映射)的方式处理“罕见”角落情况是不够的,因为性能调试通常需要
+关注这些异常。
+
+在实践中,时间戳可以与字节流段一致地关联,如果时间戳语义和测量时序的选择正确。
+此挑战与决定 IP 分片策略没有不同。在那里,定义是仅对第一个分片进行时间戳。对
+于字节流,我们选择仅在所有字节通过某个点时生成时间戳。SOF_TIMESTAMPING_TX_ACK
+定义的实现和推理是容易的。一个需要考虑 SACK 的实现会更复杂,因为可能存在传输
+空洞和乱序到达。
+
+在主机上,TCP 也可以通过 Nagle、cork、autocork、分段和 GSO 打破简单的 1:1
+缓冲区到 skbuff 映射。实现确保在所有情况下都正确,通过跟踪每个 send() 传递
+给send() 的最后一个字节,即使它在 skbuff 扩展或合并操作后不再是最后一个字
+节。它存储相关的序列号在 skb_shinfo(skb)->tskey。因为一个 skbuff 只有一
+个这样的字段,所以只能生成一个时间戳。
+
+在罕见情况下,如果两个请求折叠到同一个 skb,则时间戳请求可能会被错过。进程可
+以通过始终在请求之间刷新 TCP 栈来检测此情况,例如启用 TCP_NODELAY 和禁用
+TCP_CORK和 autocork。在 linux-4.7 之后,更好的预防合并方法是使用 MSG_EOR
+标志在sendmsg()时。
+
+这些预防措施确保时间戳仅在所有字节通过时间戳点时生成,假设网络栈本身不会重新
+排序段。栈确实试图避免重新排序。唯一的例外是管理员控制:可以构造一个数据包调
+度器配置,将来自同一流的不同段延迟不同。这种设置通常不常见。
+
+
+2 数据接口
+==========
+
+时间戳通过 recvmsg() 的辅助数据功能读取。请参阅 `man 3 cmsg` 了解此接口的
+详细信息。套接字手册页面 (`man 7 socket`) 描述了如何检索SO_TIMESTAMP 和
+SO_TIMESTAMPNS 生成的数据包时间戳。
+
+
+2.1 SCM_TIMESTAMPING 记录
+-------------------------
+
+这些时间戳在 cmsg_level SOL_SOCKET、cmsg_type SCM_TIMESTAMPING 和类型为
+
+对于 SO_TIMESTAMPING_OLD::
+
+ struct scm_timestamping {
+ struct timespec ts[3];
+ };
+
+对于 SO_TIMESTAMPING_NEW::
+
+ struct scm_timestamping64 {
+ struct __kernel_timespec ts[3];
+
+始终使用 SO_TIMESTAMPING_NEW 时间戳以始终获得 struct scm_timestamping64
+格式的时间戳。
+
+SO_TIMESTAMPING_OLD 在 32 位机器上 2038 年后返回错误的时间戳。
+
+该结构可以返回最多三个时间戳。这是一个遗留功能。任何时候至少有一个字
+段不为零。大多数时间戳都通过 ts[0] 传递。硬件时间戳通过 ts[2] 传递。
+
+ts[1] 以前用于存储硬件时间戳转换为系统时间。相反,将硬件时钟设备直接
+暴露为HW PTP时钟源,以允许用户空间进行时间转换,并可选地与用户空间
+PTP 堆栈(如linuxptp)同步系统时间。对于 PTP 时钟 API,请参阅
+Documentation/driver-api/ptp.rst。
+
+注意,如果同时启用了 SO_TIMESTAMP 或 SO_TIMESTAMPNS 与
+SO_TIMESTAMPING 使用 SOF_TIMESTAMPING_SOFTWARE,在 recvmsg()
+调用时会生成一个虚假的软件时间戳,并传递给 ts[0] 当真实软件时间戳缺
+失时。这也发生在硬件传输时间戳上。
+
+2.1.1 传输时间戳与 MSG_ERRQUEUE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+对于传输时间戳,传出数据包回环到套接字的错误队列,并附加发送时间戳(s)。
+进程通过调用带有 MSG_ERRQUEUE 标志的 recvmsg() 接收时间戳,并传递
+一个足够大的 msg_control缓冲区以接收相关的元数据结构。recvmsg 调用
+返回原始传出数据包,并附加两个辅助消息。
+
+一个 cm_level SOL_IP(V6) 和 cm_type IP(V6)_RECVERR 嵌入一个
+struct sock_extended_err这定义了错误类型。对于时间戳,ee_errno
+字段是 ENOMSG。另一个辅助消息将具有 cm_level SOL_SOCKET 和 cm_type
+SCM_TIMESTAMPING。这嵌入了 struct scm_timestamping。
+
+
+2.1.1.2 时间戳类型
+~~~~~~~~~~~~~~~~~~
+
+三个 struct timespec 的语义由 struct sock_extended_err 中的
+ee_info 字段定义。它包含一个类型 SCM_TSTAMP_* 来定义实际传递给
+scm_timestamping 的时间戳。
+
+SCM_TSTAMP_* 类型与之前讨论的 SOF_TIMESTAMPING_* 控制字段完全
+匹配,只有一个例外对于遗留原因,SCM_TSTAMP_SND 等于零,可以设置为
+SOF_TIMESTAMPING_TX_HARDWARE 和 SOF_TIMESTAMPING_TX_SOFTWARE。
+它是第一个,如果 ts[2] 不为零,否则是第二个,在这种情况下,时间戳存
+储在ts[0] 中。
+
+
+2.1.1.3 分片
+~~~~~~~~~~~~
+
+传出数据报分片很少见,但可能发生,例如通过显式禁用 PMTU 发现。如果
+传出数据包被分片,则仅对第一个分片进行时间戳,并返回给发送套接字。
+
+
+2.1.1.4 数据包负载
+~~~~~~~~~~~~~~~~~~
+
+调用应用程序通常不关心接收它传递给堆栈的整个数据包负载:套接字错误队
+列机制仅是一种将时间戳附加到其上的方法。在这种情况下,应用程序可以选
+择读取较小的数据报,甚至长度为 0。负载相应地被截断。直到进程调用
+recvmsg() 到错误队列,然而,整个数据包仍在队列中,占用 SO_RCVBUF 预算。
+
+
+2.1.1.5 阻塞读取
+~~~~~~~~~~~~~~~~
+
+从错误队列读取始终是非阻塞操作。要阻塞等待时间戳,请使用 poll 或
+select。poll() 将在 pollfd.revents 中返回 POLLERR,如果错误队列
+中有数据。没有必要在 pollfd.events中传递此标志。此标志在请求时被忽
+略。另请参阅 `man 2 poll`。
+
+
+2.1.2 接收时间戳
+^^^^^^^^^^^^^^^^
+
+在接收时,没有理由从套接字错误队列读取。SCM_TIMESTAMPING 辅助数据与
+数据包数据一起通过正常 recvmsg() 发送。由于这不是套接字错误,它不伴
+随消息 SOL_IP(V6)/IP(V6)_RECVERROR。在这种情况下,struct
+scm_timestamping 中的三个字段含义隐式定义。ts[0] 在设置时包含软件
+时间戳,ts[1] 再次被弃用,ts[2] 在设置时包含硬件时间戳。
+
+
+3. 硬件时间戳配置:ETHTOOL_MSG_TSCONFIG_SET/GET
+===============================================
+
+硬件时间戳也必须为每个设备驱动程序初始化,该驱动程序预期执行硬件时间戳。
+参数在 include/uapi/linux/net_tstamp.h 中定义为::
+
+ struct hwtstamp_config {
+ int flags; /* 目前没有定义的标志,必须为零 */
+ int tx_type; /* HWTSTAMP_TX_* */
+ int rx_filter; /* HWTSTAMP_FILTER_* */
+ };
+
+期望的行为通过 tsconfig netlink 套接字 ``ETHTOOL_MSG_TSCONFIG_SET``
+传递到内核,并通过 ``ETHTOOL_A_TSCONFIG_TX_TYPES``、
+``ETHTOOL_A_TSCONFIG_RX_FILTERS`` 和 ``ETHTOOL_A_TSCONFIG_HWTSTAMP_FLAGS``
+netlink 属性设置 struct hwtstamp_config 相应地。
+
+``ETHTOOL_A_TSCONFIG_HWTSTAMP_PROVIDER`` netlink 嵌套属性用于选择
+硬件时间戳的来源。它由设备源的索引和时间戳类型限定符组成。
+
+驱动程序可以自由使用比请求更宽松的配置。预期驱动程序应仅实现可以直接支持的
+最通用模式。例如,如果硬件可以支持 HWTSTAMP_FILTER_PTP_V2_EVENT,则它
+通常应始终升级HWTSTAMP_FILTER_PTP_V2_L2_SYNC,依此类推,因为
+HWTSTAMP_FILTER_PTP_V2_EVENT 更通用(更实用)。
+
+支持硬件时间戳的驱动程序应更新 struct,并可能返回更宽松的实际配置。如果
+请求的数据包无法进行时间戳,则不应更改任何内容,并返回 ERANGE(与 EINVAL
+相反,这表明 SIOCSHWTSTAMP 根本不支持)。
+
+只有具有管理权限的进程才能更改配置。用户空间负责确保多个进程不会相互干扰,
+并确保设置被重置。
+
+任何进程都可以通过请求 tsconfig netlink 套接字 ``ETHTOOL_MSG_TSCONFIG_GET``
+读取实际配置。
+
+遗留配置是使用 ioctl(SIOCSHWTSTAMP) 与指向 struct ifreq 的指针,其
+ifr_data指向 struct hwtstamp_config。tx_type 和 rx_filter 是驱动
+程序期望执行的提示。如果请求的细粒度过滤对传入数据包不支持,驱动程序可能
+会对请求的数据包进行时间戳。ioctl(SIOCGHWTSTAMP) 以与
+ioctl(SIOCSHWTSTAMP) 相同的方式使用。然而,并非所有驱动程序都实现了这一点。
+
+::
+
+ /* 可能的 hwtstamp_config->tx_type 值 */
+ enum {
+ /*
+ * 不会需要硬件时间戳的传出数据包;
+ * 如果数据包到达并请求它,则不会进行硬件时间戳
+ */
+ HWTSTAMP_TX_OFF,
+
+ /*
+ * 启用传出数据包的硬件时间戳;
+ * 数据包的发送者决定哪些数据包需要时间戳,
+ * 在发送数据包之前设置 SOF_TIMESTAMPING_TX_SOFTWARE
+ */
+ HWTSTAMP_TX_ON,
+ };
+
+ /* 可能的 hwtstamp_config->rx_filter 值 */
+ enum {
+ /* 时间戳不传入任何数据包 */
+ HWTSTAMP_FILTER_NONE,
+
+ /* 时间戳任何传入数据包 */
+ HWTSTAMP_FILTER_ALL,
+
+ /* 返回值:时间戳所有请求的数据包加上一些其他数据包 */
+ HWTSTAMP_FILTER_SOME,
+
+ /* PTP v1,UDP,任何事件数据包 */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+
+ /* 有关完整值列表,请检查
+ * 文件 include/uapi/linux/net_tstamp.h
+ */
+ };
+
+3.1 硬件时间戳实现:设备驱动程序
+--------------------------------
+
+支持硬件时间戳的驱动程序必须支持 ndo_hwtstamp_set NDO 或遗留 SIOCSHWTSTAMP
+ioctl 并更新提供的 struct hwtstamp_config 与实际值,如 SIOCSHWTSTAMP 部分
+所述。它还应支持 ndo_hwtstamp_get 或遗留 SIOCGHWTSTAMP。
+
+接收数据包的时间戳必须存储在 skb 中。要获取 skb 的共享时间戳结构,请调用
+skb_hwtstamps()。然后设置结构中的时间戳::
+
+ struct skb_shared_hwtstamps {
+ /* 硬件时间戳转换为自任意时间点的持续时间
+ * 自定义点
+ */
+ ktime_t hwtstamp;
+ };
+
+传出数据包的时间戳应按如下方式生成:
+
+- 在 hard_start_xmit() 中,检查 (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
+ 是否不为零。如果是,则驱动程序期望执行硬件时间戳。
+- 如果此 skb 和请求都可能,则声明驱动程序正在执行时间戳,通过设置 skb_shinfo(skb)->tx_flags
+ 中的标志SKBTX_IN_PROGRESS,例如::
+
+ skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+
+ 您可能希望保留与 skb 关联的指针,而不是释放 skb。不支持硬件时间戳的驱
+ 动程序不会这样做。驱动程序绝不能触及 sk_buff::tstamp!它用于存储网络
+ 子系统生成的软件时间戳。
+- 驱动程序应在尽可能接近将 sk_buff 传递给硬件时调用 skb_tx_timestamp()。
+ skb_tx_timestamp()提供软件时间戳(如果请求),并且硬件时间戳不可用
+ (SKBTX_IN_PROGRESS 未设置)。
+- 一旦驱动程序发送数据包并/或获取硬件时间戳,它就会通过 skb_tstamp_tx()
+ 传递时间戳,原始 skb,原始硬件时间戳。skb_tstamp_tx() 克隆原始 skb 并
+ 添加时间戳,因此原始 skb 现在必须释放。如果获取硬件时间戳失败,则驱动程序
+ 不应回退到软件时间戳。理由是,这会在处理管道中的稍后时间发生,而不是其他软
+ 件时间戳,因此可能导致时间戳之间的差异。
+
+3.2 堆叠 PTP 硬件时钟的特殊考虑
+-------------------------------
+
+在数据包的路径中可能存在多个 PHC(PTP 硬件时钟)。内核没有明确的机制允许用
+户选择用于时间戳以太网帧的 PHC。相反,假设最外层的 PHC 始终是最优的,并且
+内核驱动程序协作以实现这一目标。目前有 3 种堆叠 PHC 的情况,如下所示:
+
+3.2.1 DSA(分布式交换架构)交换机
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+这些是具有一个端口连接到(完全不知情的)主机以太网接口的以太网交换机,并且
+执行端口多路复用或可选转发加速功能。每个 DSA 交换机端口在用户看来都是独立的
+(虚拟)网络接口,其网络 I/O 在底层通过主机接口(在 TX 上重定向到主机端口,
+在 RX 上拦截帧)执行。
+
+当 DSA 交换机连接到主机端口时,PTP 同步必须受到限制,因为交换机的可变排队
+延迟引入了主机端口与其 PTP 伙伴之间的路径延迟抖动。因此,一些 DSA 交换机
+包含自己的时间戳时钟,并具有在自身 MAC上执行网络时间戳的能力,因此路径延迟
+仅测量线缆和 PHY 传播延迟。支持 Linux 的 DSA 交换机暴露了与任何其他网络
+接口相同的 ABI(除了 DSA 接口在网络 I/O 方面实际上是虚拟的,它们确实有自
+己的PHC)。典型地,但不是强制性地,所有DSA 交换机接口共享相同的 PHC。
+
+通过设计,DSA 交换机对连接到其主机端口的 PTP 时间戳不需要任何特殊的驱动程
+序处理。然而,当主机端口也支持 PTP 时间戳时,DSA 将负责拦截
+``.ndo_eth_ioctl`` 调用,并阻止尝试在主机端口上启用硬件时间戳。这是因为
+SO_TIMESTAMPING API 不允许为同一数据包传递多个硬件时间戳,因此除了 DSA
+交换机端口之外的任何人都不应阻止这样做。
+
+在通用层,DSA 提供了以下基础设施用于 PTP 时间戳:
+
+- ``.port_txtstamp()``:在用户空间从用户空间请求带有硬件 TX 时间戳请求
+ 的数据包之前调用的钩子。这是必需的,因为硬件时间戳在实际 MAC 传输后才可
+ 用,因此驱动程序必须准备将时间戳与原始数据包相关联,以便它可以重新入队数
+ 据包到套接字的错误队列。为了保存可能在时间戳可用时需要的数据包,驱动程序
+ 可以调用 ``skb_clone_sk``,在 skb->cb 中保存克隆指针,并入队一个 tx
+ skb 队列。通常,交换机会有一个PTP TX 时间戳寄存器(或有时是一个 FIFO),
+ 其中时间戳可用。在 FIFO 的情况下,硬件可能会存储PTP 序列 ID/消息类型/
+ 域号和实际时间戳的键值对。为了在等待时间戳的数据包队列和实际时间戳之间正
+ 确关联,驱动程序可以使用 BPF 分类器(``ptp_classify_raw``) 来识别 PTP
+ 传输类型,并使用 ``ptp_parse_header`` 解释 PTP 头字段。可能存在一个 IRQ,
+ 当此时间戳可用时触发,或者驱动程序可能需要轮询,在调用 ``dev_queue_xmit()``
+ 到主机接口之后。单步 TX 时间戳不需要数据包克隆,因为 PTP 协议不需要后续消
+ 息(因为TX 时间戳已嵌入到数据包中),因此用户空间不期望数据包带有 TX 时间戳
+ 被重新入队到其套接字的错误队列。
+
+- ``.port_rxtstamp()``:在 RX 上,DSA 运行 BPF 分类器以识别 PTP 事件消息
+ (任何其他数据包,包括 PTP 通用消息,不进行时间戳)。驱动程序提供原始(也是唯一)
+ 时间戳数据包,以便它可以标记它,如果它是立即可用的,或者延迟。在接收时,时间
+ 戳可能要么在频带内(通过DSA 头中的元数据,或以其他方式附加到数据包),要么在频
+ 带外(通过另一个 RX 时间戳FIFO)。在 RX 上延迟通常是必要的,当检索时间戳需要
+ 可睡眠上下文时。在这种情况下,DSA驱动程序有责任调用 ``netif_rx()`` 在新鲜时
+ 间戳的 skb 上。
+
+3.2.2 以太网 PHYs
+^^^^^^^^^^^^^^^^^
+
+这些是通常在网络栈中履行第 1 层角色的设备,因此它们在 DSA 交换机中没有网络接
+口的表示。然而,PHY可能能够检测和时间戳 PTP 数据包,出于性能原因:在尽可能接
+近导线的地方获取的时间戳具有更稳定的同步性和更精确的精度。
+
+支持 PTP 时间戳的 PHY 驱动程序必须创建 ``struct mii_timestamper`` 并添加
+指向它的指针在 ``phydev->mii_ts`` 中。 ``phydev->mii_ts`` 的存在将由网络
+堆栈检查。
+
+由于 PHY 没有网络接口表示,PHY 的时间戳和 ethtool ioctl 操作需要通过其各自
+的 MAC驱动程序进行中介。因此,与 DSA 交换机不同,需要对每个单独的 MAC 驱动
+程序进行 PHY时间戳支持的修改。这包括:
+
+- 在 ``.ndo_eth_ioctl`` 中检查,是否 ``phy_has_hwtstamp(netdev->phydev)``
+ 为真或假。如果是,则 MAC 驱动程序不应处理此请求,而应将其传递给 PHY 使用
+ ``phy_mii_ioctl()``。
+
+- 在 RX 上,特殊干预可能或可能不需要,具体取决于将 skb 传递到网络堆栈的函数。
+ 在 plain ``netif_rx()`` 和类似情况下,MAC 驱动程序必须检查是否
+ ``skb_defer_rx_timestamp(skb)`` 是必要的,如果是,则不调用 ``netif_rx()``。
+ 如果 ``CONFIG_NETWORK_PHY_TIMESTAMPING`` 启用,并且
+ ``skb->dev->phydev->mii_ts`` 存在,它的 ``.rxtstamp()`` 钩子现在将被调
+ 用,以使用与 DSA 类似的逻辑确定 RX 时间戳延迟是否必要。同样像 DSA,它成为
+ PHY 驱动程序的责任,在时间戳可用时发送数据包到堆栈。
+
+ 对于其他 skb 接收函数,例如 ``napi_gro_receive`` 和 ``netif_receive_skb``,
+ 堆栈会自动检查是否 ``skb_defer_rx_timestamp()`` 是必要的,因此此检查不
+ 需要在驱动程序内部。
+
+- 在 TX 上,同样,特殊干预可能或可能不需要。调用 ``mii_ts->txtstamp()``钩
+ 子的函数名为``skb_clone_tx_timestamp()``。此函数可以直接调用(在这种情
+ 况下,确实需要显式 MAC 驱动程序支持),但函数也 piggybacks 从
+ ``skb_tx_timestamp()`` 调用,许多 MAC 驱动程序已经为软件时间戳目的执行。
+ 因此,如果 MAC 支持软件时间戳,则它不需要在此阶段执行任何其他操作。
+
+3.2.3 MII 总线嗅探设备
+^^^^^^^^^^^^^^^^^^^^^^
+
+这些执行与时间戳以太网 PHY 相同的角色,除了它们是离散设备,因此可以与任何 PHY
+组合,即使它不支持时间戳。在 Linux 中,它们是可发现的,可以通过 Device Tree
+附加到 ``struct phy_device``,对于其余部分,它们使用与那些相同的 mii_ts 基
+础设施。请参阅 Documentation/devicetree/bindings/ptp/timestamper.txt 了
+解更多详细信息。
+
+3.2.4 MAC 驱动程序的其他注意事项
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+堆叠 PHC 可能会暴露 MAC 驱动程序的错误,这些错误在未堆叠 PHC 时无法触发。一个
+例子涉及此行代码,已经在前面的部分中介绍过::
+
+ skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+
+任何 TX 时间戳逻辑,无论是普通的 MAC 驱动程序、DSA 交换机驱动程序、PHY 驱动程
+序还是 MII 总线嗅探设备驱动程序,都应该设置此标志。但一个未意识到 PHC 堆叠的
+MAC 驱动程序可能会被其他不是它自己的实体设置此标志,并传递一个重复的时间戳。例
+如,典型的 TX 时间戳逻辑可能是将传输部分分为 2 个部分:
+
+1. "TX":检查是否通过 ``.ndo_eth_ioctl``("``priv->hwtstamp_tx_enabled
+ == true``")和当前 skb 是否需要 TX 时间戳("``skb_shinfo(skb)->tx_flags
+ & SKBTX_HW_TSTAMP``")。如果为真,则设置 "``skb_shinfo(skb)->tx_flags
+ |= SKBTX_IN_PROGRESS``" 标志。注意:如上所述,在堆叠 PHC 系统中,此条件
+ 不应触发,因为此 MAC 肯定不是最外层的 PHC。但这是典型的错误所在。传输继续
+ 使用此数据包。
+
+2. "TX 确认":传输完成。驱动程序检查是否需要收集任何 TX 时间戳。这里通常是典
+ 型的错误所在:驱动程序采取捷径,只检查 "``skb_shinfo(skb)->tx_flags &
+ SKBTX_IN_PROGRESS``" 是否设置。在堆叠 PHC 系统中,这是错误的,因为此 MAC
+ 驱动程序不是唯一在 TX 数据路径中启用 SKBTX_IN_PROGRESS 的实体。
+
+此问题的正确解决方案是 MAC 驱动程序在其 "TX 确认" 部分中有一个复合检查,不仅
+针对 "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``",还针对
+"``priv->hwtstamp_tx_enabled == true``"。因为系统确保 PTP 时间戳仅对最
+外层 PHC 启用,此增强检查将避免向用户空间传递重复的 TX 时间戳。
diff --git a/Documentation/translations/zh_CN/rust/general-information.rst b/Documentation/translations/zh_CN/rust/general-information.rst
index 251f6ee2bb44..9b5e37e13f38 100644
--- a/Documentation/translations/zh_CN/rust/general-information.rst
+++ b/Documentation/translations/zh_CN/rust/general-information.rst
@@ -13,6 +13,7 @@
本文档包含了在内核中使用Rust支持时需要了解的有用信息。
+.. _rust_code_documentation_zh_cn:
代码文档
--------
diff --git a/Documentation/translations/zh_CN/rust/index.rst b/Documentation/translations/zh_CN/rust/index.rst
index b01f887e7167..5347d4729588 100644
--- a/Documentation/translations/zh_CN/rust/index.rst
+++ b/Documentation/translations/zh_CN/rust/index.rst
@@ -10,7 +10,35 @@
Rust
====
-与内核中的Rust有关的文档。若要开始在内核中使用Rust,请阅读quick-start.rst指南。
+与内核中的Rust有关的文档。若要开始在内核中使用Rust,请阅读 quick-start.rst 指南。
+
+Rust 实验
+---------
+Rust 支持在 v6.1 版本中合并到主线,以帮助确定 Rust 作为一种语言是否适合内核,
+即是否值得进行权衡。
+
+目前,Rust 支持主要面向对 Rust 支持感兴趣的内核开发人员和维护者,
+以便他们可以开始处理抽象和驱动程序,并帮助开发基础设施和工具。
+
+如果您是终端用户,请注意,目前没有适合或旨在生产使用的内置驱动程序或模块,
+并且 Rust 支持仍处于开发/实验阶段,尤其是对于特定内核配置。
+
+代码文档
+--------
+
+给定一个内核配置,内核可能会生成 Rust 代码文档,即由 ``rustdoc`` 工具呈现的 HTML。
+
+.. only:: rustdoc and html
+
+ 该内核文档使用 `Rust 代码文档 <rustdoc/kernel/index.html>`_ 构建。
+
+.. only:: not rustdoc and html
+
+ 该内核文档不使用 Rust 代码文档构建。
+
+预生成版本提供在:https://rust.docs.kernel.org。
+
+请参阅 :ref:`代码文档 <rust_code_documentation_zh_cn>` 部分以获取更多详细信息。
.. toctree::
:maxdepth: 1
@@ -19,6 +47,9 @@ Rust
general-information
coding-guidelines
arch-support
+ testing
+
+你还可以在 :doc:`../../../process/kernel-docs` 中找到 Rust 的学习材料。
.. only:: subproject and html
diff --git a/Documentation/translations/zh_CN/rust/testing.rst b/Documentation/translations/zh_CN/rust/testing.rst
new file mode 100644
index 000000000000..ca81f1cef6eb
--- /dev/null
+++ b/Documentation/translations/zh_CN/rust/testing.rst
@@ -0,0 +1,215 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/rust/testing.rst
+
+:翻译:
+
+ 郭杰 Ben Guo <benx.guo@gmail.com>
+
+测试
+====
+
+本文介绍了如何在内核中测试 Rust 代码。
+
+有三种测试类型:
+
+- KUnit 测试
+- ``#[test]`` 测试
+- Kselftests
+
+KUnit 测试
+----------
+
+这些测试来自 Rust 文档中的示例。它们会被转换为 KUnit 测试。
+
+使用
+****
+
+这些测试可以通过 KUnit 运行。例如,在命令行中使用 ``kunit_tool`` ( ``kunit.py`` )::
+
+ ./tools/testing/kunit/kunit.py run --make_options LLVM=1 --arch x86_64 --kconfig_add CONFIG_RUST=y
+
+或者,KUnit 也可以在内核启动时以内置方式运行。获取更多 KUnit 信息,请参阅
+Documentation/dev-tools/kunit/index.rst。
+关于内核内置与命令行测试的详细信息,请参阅 Documentation/dev-tools/kunit/architecture.rst。
+
+要使用这些 KUnit 文档测试,需要在内核配置中启用以下选项::
+
+ CONFIG_KUNIT
+ Kernel hacking -> Kernel Testing and Coverage -> KUnit - Enable support for unit tests
+ CONFIG_RUST_KERNEL_DOCTESTS
+ Kernel hacking -> Rust hacking -> Doctests for the `kernel` crate
+
+KUnit 测试即文档测试
+********************
+
+文档测试( *doctests* )一般用于展示函数、结构体或模块等的使用方法。
+
+它们非常方便,因为它们就写在文档旁边。例如:
+
+.. code-block:: rust
+
+ /// 求和两个数字。
+ ///
+ /// ```
+ /// assert_eq!(mymod::f(10, 20), 30);
+ /// ```
+ pub fn f(a: i32, b: i32) -> i32 {
+ a + b
+ }
+
+在用户空间中,这些测试由 ``rustdoc`` 负责收集并运行。单独使用这个工具已经很有价值,
+因为它可以验证示例能否成功编译(确保和代码保持同步),
+同时还可以运行那些不依赖内核 API 的示例。
+
+然而,在内核中,这些测试会转换成 KUnit 测试套件。
+这意味着文档测试会被编译成 Rust 内核对象,从而可以在构建的内核环境中运行。
+
+通过与 KUnit 集成,Rust 的文档测试可以复用内核现有的测试设施。
+例如,内核日志会显示::
+
+ KTAP version 1
+ 1..1
+ KTAP version 1
+ # Subtest: rust_doctests_kernel
+ 1..59
+ # rust_doctest_kernel_build_assert_rs_0.location: rust/kernel/build_assert.rs:13
+ ok 1 rust_doctest_kernel_build_assert_rs_0
+ # rust_doctest_kernel_build_assert_rs_1.location: rust/kernel/build_assert.rs:56
+ ok 2 rust_doctest_kernel_build_assert_rs_1
+ # rust_doctest_kernel_init_rs_0.location: rust/kernel/init.rs:122
+ ok 3 rust_doctest_kernel_init_rs_0
+ ...
+ # rust_doctest_kernel_types_rs_2.location: rust/kernel/types.rs:150
+ ok 59 rust_doctest_kernel_types_rs_2
+ # rust_doctests_kernel: pass:59 fail:0 skip:0 total:59
+ # Totals: pass:59 fail:0 skip:0 total:59
+ ok 1 rust_doctests_kernel
+
+文档测试中,也可以正常使用 `? <https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator>`_ 运算符,例如:
+
+.. code-block:: rust
+
+ /// ```
+ /// # use kernel::{spawn_work_item, workqueue};
+ /// spawn_work_item!(workqueue::system(), || pr_info!("x\n"))?;
+ /// # Ok::<(), Error>(())
+ /// ```
+
+这些测试和普通代码一样,也可以在 ``CLIPPY=1`` 条件下通过 Clippy 进行编译,
+因此可以从额外的 lint 检查中获益。
+
+为了便于开发者定位文档测试出错的具体行号,日志会输出一条 KTAP 诊断信息。
+其中标明了原始测试的文件和行号(不是 ``rustdoc`` 生成的临时 Rust 文件位置)::
+
+ # rust_doctest_kernel_types_rs_2.location: rust/kernel/types.rs:150
+
+Rust 测试中常用的断言宏是来自 Rust 标准库( ``core`` )中的 ``assert!`` 和 ``assert_eq!`` 宏。
+内核提供了一个定制版本,这些宏的调用会被转发到 KUnit。
+和 KUnit 测试不同的是,这些宏不需要传递上下文参数( ``struct kunit *`` )。
+这使得它们更易于使用,同时文档的读者无需关心底层用的是什么测试框架。
+此外,这种方式未来也许可以让我们更容易测试第三方代码。
+
+当前有一个限制:KUnit 不支持在其他任务中执行断言。
+因此,如果断言真的失败了,我们只是简单地把错误打印到内核日志里。
+另外,文档测试不适用于非公开的函数。
+
+作为文档中的测试示例,应当像 “实际代码” 一样编写。
+例如:不要使用 ``unwrap()`` 或 ``expect()``,请使用 `? <https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator>`_ 运算符。
+更多背景信息,请参阅:
+
+ https://rust.docs.kernel.org/kernel/error/type.Result.html#error-codes-in-c-and-rust
+
+``#[test]`` 测试
+----------------
+
+此外,还有 ``#[test]`` 测试。与文档测试类似,这些测试与用户空间中的测试方式也非常相近,并且同样会映射到 KUnit。
+
+这些测试通过 ``kunit_tests`` 过程宏引入,该宏将测试套件的名称作为参数。
+
+例如,假设想要测试前面文档测试示例中的函数 ``f``,我们可以在定义该函数的同一文件中编写:
+
+.. code-block:: rust
+
+ #[kunit_tests(rust_kernel_mymod)]
+ mod tests {
+ use super::*;
+
+ #[test]
+ fn test_f() {
+ assert_eq!(f(10, 20), 30);
+ }
+ }
+
+如果我们执行这段代码,内核日志会显示::
+
+ KTAP version 1
+ # Subtest: rust_kernel_mymod
+ # speed: normal
+ 1..1
+ # test_f.speed: normal
+ ok 1 test_f
+ ok 1 rust_kernel_mymod
+
+与文档测试类似, ``assert!`` 和 ``assert_eq!`` 宏被映射回 KUnit 并且不会发生 panic。
+同样,支持 `? <https://doc.rust-lang.org/reference/expressions/operator-expr.html#the-question-mark-operator>`_ 运算符,
+测试函数可以什么都不返回(单元类型 ``()``)或 ``Result`` (任何 ``Result<T, E>``)。例如:
+
+.. code-block:: rust
+
+ #[kunit_tests(rust_kernel_mymod)]
+ mod tests {
+ use super::*;
+
+ #[test]
+ fn test_g() -> Result {
+ let x = g()?;
+ assert_eq!(x, 30);
+ Ok(())
+ }
+ }
+
+如果我们运行测试并且调用 ``g`` 失败,那么内核日志会显示::
+
+ KTAP version 1
+ # Subtest: rust_kernel_mymod
+ # speed: normal
+ 1..1
+ # test_g: ASSERTION FAILED at rust/kernel/lib.rs:335
+ Expected is_test_result_ok(test_g()) to be true, but is false
+ # test_g.speed: normal
+ not ok 1 test_g
+ not ok 1 rust_kernel_mymod
+
+如果 ``#[test]`` 测试可以对用户起到示例作用,那就应该改用文档测试。
+即使是 API 的边界情况,例如错误或边界问题,放在示例中展示也同样有价值。
+
+``rusttest`` 宿主机测试
+-----------------------
+
+这类测试运行在用户空间,可以通过 ``rusttest`` 目标在构建内核的宿主机中编译并运行::
+
+ make LLVM=1 rusttest
+
+当前操作需要内核 ``.config``。
+
+目前,它们主要用于测试 ``macros`` crate 的示例。
+
+Kselftests
+----------
+
+Kselftests 可以在 ``tools/testing/selftests/rust`` 文件夹中找到。
+
+测试所需的内核配置选项列在 ``tools/testing/selftests/rust/config`` 文件中,
+可以借助 ``merge_config.sh`` 脚本合并到现有配置中::
+
+ ./scripts/kconfig/merge_config.sh .config tools/testing/selftests/rust/config
+
+Kselftests 会在内核源码树中构建,以便在运行相同版本内核的系统上执行测试。
+
+一旦安装并启动了与源码树匹配的内核,测试即可通过以下命令编译并执行::
+
+ make TARGETS="rust" kselftest
+
+请参阅 Documentation/dev-tools/kselftest.rst 文档以获取更多信息。
diff --git a/Documentation/translations/zh_CN/scsi/index.rst b/Documentation/translations/zh_CN/scsi/index.rst
new file mode 100644
index 000000000000..5f1803e2706c
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/index.rst
@@ -0,0 +1,92 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/index.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+
+==========
+SCSI子系统
+==========
+
+.. toctree::
+ :maxdepth: 1
+
+简介
+====
+
+.. toctree::
+ :maxdepth: 1
+
+ scsi
+
+SCSI驱动接口
+============
+
+.. toctree::
+ :maxdepth: 1
+
+ scsi_mid_low_api
+ scsi_eh
+
+SCSI驱动参数
+============
+
+.. toctree::
+ :maxdepth: 1
+
+ scsi-parameters
+ link_power_management_policy
+
+SCSI主机适配器驱动
+==================
+
+.. toctree::
+ :maxdepth: 1
+
+ libsas
+ sd-parameters
+ wd719x
+
+Todolist:
+
+* 53c700
+* aacraid
+* advansys
+* aha152x
+* aic79xx
+* aic7xxx
+* arcmsr_spec
+* bfa
+* bnx2fc
+* BusLogic
+* cxgb3i
+* dc395x
+* dpti
+* FlashPoint
+* g_NCR5380
+* hpsa
+* hptiop
+* lpfc
+* megaraid
+* ncr53c8xx
+* NinjaSCSI
+* ppa
+* qlogicfas
+* scsi-changer
+* scsi_fc_transport
+* scsi-generic
+* smartpqi
+* st
+* sym53c500_cs
+* sym53c8xx_2
+* tcm_qla2xxx
+* ufs
+
+* scsi_transport_srp/figures
diff --git a/Documentation/translations/zh_CN/scsi/libsas.rst b/Documentation/translations/zh_CN/scsi/libsas.rst
new file mode 100644
index 000000000000..15fa71cdd821
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/libsas.rst
@@ -0,0 +1,425 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/libsas.rst
+
+:翻译:
+
+ 张钰杰 Yujie Zhang <yjzhang@leap-io-kernel.com>
+
+:校译:
+
+======
+SAS 层
+======
+
+SAS 层是一个管理基础架构,用于管理 SAS LLDD。它位于 SCSI Core
+与 SAS LLDD 之间。 体系结构如下: SCSI Core 关注的是 SAM/SPC 相
+关的问题;SAS LLDD 及其序列控制器负责 PHY 层、OOB 信号以及链路
+管理;而 SAS 层则负责以下任务::
+
+ * SAS Phy、Port 和主机适配器(HA)事件管理(事件由 LLDD
+ 生成,由 SAS 层处理);
+ * SAS 端口的管理(创建与销毁);
+ * SAS 域的发现与重新验证;
+ * SAS 域内设备的管理;
+ * SCSI 主机的注册与注销;
+ * 将设备注册到 SCSI Core(SAS 设备)或 libata(SATA 设备);
+ * 扩展器的管理,并向用户空间导出扩展器控制接口。
+
+SAS LLDD 是一种 PCI 设备驱动程序。它负责 PHY 层和 OOB(带外)
+信号的管理、厂商特定的任务,并向 SAS 层上报事件。
+
+SAS 层实现了 SAS 1.1 规范中定义的大部分 SAS 功能。
+
+sas_ha_struct 结构体用于向 SAS 层描述一个 SAS LLDD。该结构的
+大部分字段由 SAS 层使用,但其中少数字段需要由 LLDD 进行初始化。
+
+在完成硬件初始化之后,应当在驱动的 probe() 函数中调用
+sas_register_ha()。该函数会将 LLDD 注册到 SCSI 子系统中,创
+建一个对应的 SCSI 主机,并将你的 SAS 驱动程序注册到其在 sysfs
+下创建的 SAS 设备树中。随后该函数将返回。接着,你需要使能 PHY,
+以启动实际的 OOB(带外)过程;此时驱动将开始调用 notify_* 系
+列事件回调函数。
+
+结构体说明
+==========
+
+``struct sas_phy``
+------------------
+
+通常情况下,该结构体会被静态地嵌入到驱动自身定义的 PHY 结构体中,
+例如::
+
+ struct my_phy {
+ blah;
+ struct sas_phy sas_phy;
+ bleh;
+ }
+
+随后,在主机适配器(HA)的结构体中,所有的 PHY 通常以 my_phy
+数组的形式存在(如下文所示)。
+
+在初始化各个 PHY 时,除了初始化驱动自定义的 PHY 结构体外,还
+需要同时初始化其中的 sas_phy 结构体。
+
+一般来说,PHY 的管理由 LLDD 负责,而端口(port)的管理由 SAS
+层负责。因此,PHY 的初始化与更新由 LLDD 完成,而端口的初始化与
+更新则由 SAS 层完成。系统设计中规定,某些字段可由 LLDD 进行读
+写,而 SAS 层只能读取这些字段;反之亦然。其设计目的是为了避免不
+必要的锁操作。
+
+在该设计中,某些字段可由 LLDD 进行读写(RW),而 SAS 层仅可读
+取这些字段;反之亦然。这样设计的目的在于避免不必要的锁操作。
+
+enabled
+ - 必须设置(0/1)
+
+id
+ - 必须设置[0,MAX_PHYS)]
+
+class, proto, type, role, oob_mode, linkrate
+ - 必须设置。
+
+oob_mode
+ - 当 OOB(带外信号)完成后,设置此字段,然后通知 SAS 层。
+
+sas_addr
+ - 通常指向一个保存该 PHY 的 SAS 地址的数组,该数组可能位于
+ 驱动自定义的 my_phy 结构体中。
+
+attached_sas_addr
+ - 当 LLDD 接收到 IDENTIFY 帧或 FIS 帧时,应在通知 SAS 层
+ 之前设置该字段。其设计意图在于:有时 LLDD 可能需要伪造或
+ 提供一个与实际不同的 SAS 地址用于该 PHY/端口,而该机制允许
+ LLDD 这样做。理想情况下,应将 SAS 地址从 IDENTIFY 帧中
+ 复制过来;对于直接连接的 SATA 设备,也可以由 LLDD 生成一
+ 个 SAS 地址。后续的发现过程可能会修改此字段。
+
+frame_rcvd
+ - 当接收到 IDENTIFY 或 FIS 帧时,将该帧复制到此处。正确的
+ 操作流程是获取锁 → 复制数据 → 设置 frame_rcvd_size → 释
+ 放锁 → 调用事件通知。该字段是一个指针,因为驱动无法精确确
+ 定硬件帧的大小;因此,实际的帧数据数组应定义在驱动自定义的
+ PHY 结构体中,然后让此指针指向该数组。在持锁状态下,将帧从
+ DMA 可访问内存区域复制到该数组中。
+
+sas_prim
+ - 用于存放接收到的原语(primitive)。参见 sas.h。操作流程同
+ 样是:获取锁 → 设置 primitive → 释放锁 → 通知事件。
+
+port
+ - 如果该 PHY 属于某个端口(port),此字段指向对应的 sas_port
+ 结构体。LLDD 仅可读取此字段。它由 SAS 层设置,用于指向当前
+ PHY 所属的 sas_port。
+
+ha
+ - 可以由 LLDD 设置;但无论是否设置,SAS 层都会再次对其进行赋值。
+
+lldd_phy
+ - LLDD 应将此字段设置为指向自身定义的 PHY 结构体,这样当 SAS
+ 层调用某个回调并传入 sas_phy 时,驱动可以快速定位自身的 PHY
+ 结构体。如果 sas_phy 是嵌入式成员,也可以使用 container_of()
+ 宏进行访问——两种方式均可。
+
+``struct sas_port``
+-------------------
+
+LLDD 不应修改该结构体中的任何字段——它只能读取这些字段。这些字段的
+含义应当是不言自明的。
+
+phy_mask 为 32 位,目前这一长度已足够使用,因为尚未听说有主机适配
+器拥有超过8 个 PHY。
+
+lldd_port
+ - 目前尚无明确用途。不过,对于那些希望在 LLDD 内部维护自身端
+ 口表示的驱动,实现时可以利用该字段。
+
+``struct sas_ha_struct``
+------------------------
+
+它通常静态声明在你自己的 LLDD 结构中,用于描述您的适配器::
+
+ struct my_sas_ha {
+ blah;
+ struct sas_ha_struct sas_ha;
+ struct my_phy phys[MAX_PHYS];
+ struct sas_port sas_ports[MAX_PHYS]; /* (1) */
+ bleh;
+ };
+
+ (1) 如果你的 LLDD 没有自己的端口表示
+
+需要初始化(示例函数如下所示)。
+
+pcidev
+^^^^^^
+
+sas_addr
+ - 由于 SAS 层不想弄乱内存分配等, 因此这指向静态分配的数
+ 组中的某个位置(例如,在您的主机适配器结构中),并保存您或
+ 制造商等给出的主机适配器的 SAS 地址。
+
+sas_port
+^^^^^^^^
+
+sas_phy
+ - 指向结构体的指针数组(参见上文关于 sas_addr 的说明)。
+ 这些指针必须设置。更多细节见下文说明。
+
+num_phys
+ - 表示 sas_phy 数组中 PHY 的数量,同时也表示 sas_port
+ 数组中的端口数量。一个端口最多对应一个 PHY,因此最大端口数
+ 等于 num_phys。因此,结构中不再单独使用 num_ports 字段,
+ 而仅使用 num_phys。
+
+事件接口::
+
+ /* LLDD 调用以下函数来通知 SAS 类层发生事件 */
+ void sas_notify_port_event(struct sas_phy *, enum port_event, gfp_t);
+ void sas_notify_phy_event(struct sas_phy *, enum phy_event, gfp_t);
+
+端口事件通知::
+
+ /* SAS 类层调用以下回调来通知 LLDD 端口事件 */
+ void (*lldd_port_formed)(struct sas_phy *);
+ void (*lldd_port_deformed)(struct sas_phy *);
+
+如果 LLDD 希望在端口形成或解散时接收通知,则应将上述回调指针设
+置为符合函数类型定义的处理函数。
+
+SAS LLDD 还应至少实现 SCSI 协议中定义的一种任务管理函数(TMFs)::
+
+ /* 任务管理函数. 必须在进程上下文中调用 */
+ int (*lldd_abort_task)(struct sas_task *);
+ int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
+ int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
+ int (*lldd_I_T_nexus_reset)(struct domain_device *);
+ int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
+ int (*lldd_query_task)(struct sas_task *);
+
+如需更多信息,请参考 T10.org。
+
+端口与适配器管理::
+
+ /* 端口与适配器管理 */
+ int (*lldd_clear_nexus_port)(struct sas_port *);
+ int (*lldd_clear_nexus_ha)(struct sas_ha_struct *);
+
+SAS LLDD 至少应实现上述函数中的一个。
+
+PHY 管理::
+
+ /* PHY 管理 */
+ int (*lldd_control_phy)(struct sas_phy *, enum phy_func);
+
+lldd_ha
+ - 应设置为指向驱动的主机适配器(HA)结构体的指针。如果 sas_ha_struct
+ 被嵌入到更大的结构体中,也可以通过 container_of() 宏来获取。
+
+一个示例的初始化与注册函数可以如下所示:(该函数应在 probe()
+函数的最后调用)但必须在使能 PHY 执行 OOB 之前调用::
+
+ static int register_sas_ha(struct my_sas_ha *my_ha)
+ {
+ int i;
+ static struct sas_phy *sas_phys[MAX_PHYS];
+ static struct sas_port *sas_ports[MAX_PHYS];
+
+ my_ha->sas_ha.sas_addr = &my_ha->sas_addr[0];
+
+ for (i = 0; i < MAX_PHYS; i++) {
+ sas_phys[i] = &my_ha->phys[i].sas_phy;
+ sas_ports[i] = &my_ha->sas_ports[i];
+ }
+
+ my_ha->sas_ha.sas_phy = sas_phys;
+ my_ha->sas_ha.sas_port = sas_ports;
+ my_ha->sas_ha.num_phys = MAX_PHYS;
+
+ my_ha->sas_ha.lldd_port_formed = my_port_formed;
+
+ my_ha->sas_ha.lldd_dev_found = my_dev_found;
+ my_ha->sas_ha.lldd_dev_gone = my_dev_gone;
+
+ my_ha->sas_ha.lldd_execute_task = my_execute_task;
+
+ my_ha->sas_ha.lldd_abort_task = my_abort_task;
+ my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set;
+ my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set;
+ my_ha->sas_ha.lldd_I_T_nexus_reset= NULL; (2)
+ my_ha->sas_ha.lldd_lu_reset = my_lu_reset;
+ my_ha->sas_ha.lldd_query_task = my_query_task;
+
+ my_ha->sas_ha.lldd_clear_nexus_port = my_clear_nexus_port;
+ my_ha->sas_ha.lldd_clear_nexus_ha = my_clear_nexus_ha;
+
+ my_ha->sas_ha.lldd_control_phy = my_control_phy;
+
+ return sas_register_ha(&my_ha->sas_ha);
+ }
+
+(2) SAS 1.1 未定义 I_T Nexus Reset TMF(任务管理功能)。
+
+事件
+====
+
+事件是 SAS LLDD 唯一的通知 SAS 层发生任何情况的方式。
+LLDD 没有其他方法可以告知 SAS 层其内部或 SAS 域中发生的事件。
+
+Phy 事件::
+
+ PHYE_LOSS_OF_SIGNAL, (C)
+ PHYE_OOB_DONE,
+ PHYE_OOB_ERROR, (C)
+ PHYE_SPINUP_HOLD.
+
+端口事件,通过 _phy_ 传递::
+
+ PORTE_BYTES_DMAED, (M)
+ PORTE_BROADCAST_RCVD, (E)
+ PORTE_LINK_RESET_ERR, (C)
+ PORTE_TIMER_EVENT, (C)
+ PORTE_HARD_RESET.
+
+主机适配器事件:
+ HAE_RESET
+
+SAS LLDD 应能够生成以下事件::
+
+ - 来自 C 组的至少一个事件(可选),
+ - 标记为 M(必需)的事件为必需事件(至少一种);
+ - 若希望 SAS 层处理域重新验证(domain revalidation),则
+ 应生成标记为 E(扩展器)的事件(仅需一种);
+ - 未标记的事件为可选事件。
+
+含义
+
+HAE_RESET
+ - 当 HA 发生内部错误并被复位时。
+
+PORTE_BYTES_DMAED
+ - 在接收到 IDENTIFY/FIS 帧时。
+
+PORTE_BROADCAST_RCVD
+ - 在接收到一个原语时。
+
+PORTE_LINK_RESET_ERR
+ - 定时器超时、信号丢失、丢失 DWS 等情况。 [1]_
+
+PORTE_TIMER_EVENT
+ - DWS 复位超时定时器到期时。[1]_
+
+PORTE_HARD_RESET
+ - 收到 Hard Reset 原语。
+
+PHYE_LOSS_OF_SIGNAL
+ - 设备已断开连接。 [1]_
+
+PHYE_OOB_DONE
+ - OOB 过程成功完成,oob_mode 有效。
+
+PHYE_OOB_ERROR
+ - 执行 OOB 过程中出现错误,设备可能已断开。 [1]_
+
+PHYE_SPINUP_HOLD
+ - 检测到 SATA 设备,但未发送 COMWAKE 信号。
+
+.. [1] 应设置或清除 phy 中相应的字段,或者从 tasklet 中调用
+ 内联函数 sas_phy_disconnected(),该函数只是一个辅助函数。
+
+执行命令 SCSI RPC::
+
+ int (*lldd_execute_task)(struct sas_task *, gfp_t gfp_flags);
+
+用于将任务排队提交给 SAS LLDD,@task 为要执行的任务,@gfp_mask
+为定义调用者上下文的 gfp 掩码。
+
+此函数应实现 执行 SCSI RPC 命令。
+
+也就是说,当调用 lldd_execute_task() 时,命令应当立即在传输
+层发出。SAS LLDD 中在任何层级上都不应再进行队列排放。
+
+返回值::
+
+ * 返回 -SAS_QUEUE_FULL 或 -ENOMEM 表示未排入队列;
+ * 返回 0 表示任务已成功排入队列。
+
+::
+
+ struct sas_task {
+ dev —— 此任务目标设备;
+ task_proto —— 协议类型,为 enum sas_proto 中的一种;
+ scatter —— 指向散布/聚集(SG)列表数组的指针;
+ num_scatter —— SG 列表元素数量;
+ total_xfer_len —— 预计传输的总字节数;
+ data_dir —— 数据传输方向(PCI_DMA_*);
+ task_done —— 任务执行完成时的回调函数。
+ };
+
+发现
+====
+
+sysfs 树有以下用途::
+
+ a) 它显示当前时刻 SAS 域的物理布局,即展示当前物理世界中
+ 域的实际结构。
+ b) 显示某些设备的参数。 _at_discovery_time_.
+
+下面是一个指向 tree(1) 程序的链接,该工具在查看 SAS 域时非常
+有用:
+ftp://mama.indstate.edu/linux/tree/
+
+我期望用户空间的应用程序最终能够为此创建一个图形界面。
+
+也就是说,sysfs 域树不会显示或保存某些状态变化,例如,如果你更
+改了 READY LED 含义的设置,sysfs 树不会反映这种状态变化;但它
+确实会显示域设备的当前连接状态。
+
+维护内部设备状态变化的职责由上层(命令集驱动)和用户空间负责。
+
+当某个设备或多个设备从域中拔出时,这一变化会立即反映在 sysfs
+树中,并且这些设备会从系统中移除。
+
+结构体 domain_device 描述了 SAS 域中的任意设备。它完全由 SAS
+层管理。一个任务会指向某个域设备,SAS LLDD 就是通过这种方式知
+道任务应发送到何处。SAS LLDD 只读取 domain_device 结构的内容,
+但不会创建或销毁它。
+
+用户空间中的扩展器管理
+======================
+
+在 sysfs 中的每个扩展器目录下,都有一个名为 "smp_portal" 的
+文件。这是一个二进制的 sysfs 属性文件,它实现了一个 SMP 入口
+(注意:这并不是一个 SMP 端口),用户空间程序可以通过它发送
+SMP 请求并接收 SMP 响应。
+
+该功能的实现方式看起来非常简单:
+
+1. 构建要发送的 SMP 帧。其格式和布局在 SAS 规范中有说明。保持
+ CRC 字段为 0。
+
+open(2)
+
+2. 以读写模式打开该扩展器的 SMP portal sysfs 文件。
+
+write(2)
+
+3. 将第 1 步中构建的帧写入文件。
+
+read(2)
+
+4. 读取与所构建帧预期返回长度相同的数据量。如果读取的数据量与
+ 预期不符,则表示发生了某种错误。
+
+close(2)
+
+整个过程在 "expander_conf.c" 文件中的函数 do_smp_func()
+及其调用者中有详细展示。
+
+对应的内核实现位于 "sas_expander.c" 文件中。
+
+程序 "expander_conf.c" 实现了上述逻辑。它接收一个参数——扩展器
+SMP portal 的 sysfs 文件名,并输出扩展器的信息,包括路由表内容。
+
+SMP portal 赋予了你对扩展器的完全控制权,因此请谨慎操作。
diff --git a/Documentation/translations/zh_CN/scsi/link_power_management_policy.rst b/Documentation/translations/zh_CN/scsi/link_power_management_policy.rst
new file mode 100644
index 000000000000..f2ab8fdf4aa8
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/link_power_management_policy.rst
@@ -0,0 +1,32 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/link_power_management_policy.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+
+================
+链路电源管理策略
+================
+
+该参数允许用户设置链路(接口)的电源管理模式。
+共计三类可选项:
+
+===================== =====================================================
+选项 作用
+===================== =====================================================
+min_power 指示控制器在可能的情况下尽量使链路处于最低功耗。
+ 这可能会牺牲一定的性能,因为从低功耗状态恢复时会增加延迟。
+
+max_performance 通常,这意味着不进行电源管理。指示
+ 控制器优先考虑性能而非电源管理。
+
+medium_power 指示控制器在可能的情况下进入较低功耗状态,
+ 而非最低功耗状态,从而改善min_power模式下的延迟。
+===================== =====================================================
diff --git a/Documentation/translations/zh_CN/scsi/scsi-parameters.rst b/Documentation/translations/zh_CN/scsi/scsi-parameters.rst
new file mode 100644
index 000000000000..ace777e070ea
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/scsi-parameters.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/scsi-parameters.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+
+============
+SCSI内核参数
+============
+
+请查阅Documentation/admin-guide/kernel-parameters.rst以获取
+指定模块参数相关的通用信息。
+
+当前文档可能不完全是最新和全面的。命令 ``modinfo -p ${modulename}``
+显示了可加载模块的参数列表。可加载模块被加载到内核中后,也会在
+/sys/module/${modulename}/parameters/ 目录下显示其参数。其
+中某些参数可以通过命令
+``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``
+在运行时修改。
+
+::
+
+ advansys= [HW,SCSI]
+ 请查阅 drivers/scsi/advansys.c 文件头部。
+
+ aha152x= [HW,SCSI]
+ 请查阅 Documentation/scsi/aha152x.rst。
+
+ aha1542= [HW,SCSI]
+ 格式:<portbase>[,<buson>,<busoff>[,<dmaspeed>]]
+
+ aic7xxx= [HW,SCSI]
+ 请查阅 Documentation/scsi/aic7xxx.rst。
+
+ aic79xx= [HW,SCSI]
+ 请查阅 Documentation/scsi/aic79xx.rst。
+
+ atascsi= [HW,SCSI]
+ 请查阅 drivers/scsi/atari_scsi.c。
+
+ BusLogic= [HW,SCSI]
+ 请查阅 drivers/scsi/BusLogic.c 文件中
+ BusLogic_ParseDriverOptions()函数前的注释。
+
+ gvp11= [HW,SCSI]
+
+ ips= [HW,SCSI] Adaptec / IBM ServeRAID 控制器
+ 请查阅 drivers/scsi/ips.c 文件头部。
+
+ mac5380= [HW,SCSI]
+ 请查阅 drivers/scsi/mac_scsi.c。
+
+ scsi_mod.max_luns=
+ [SCSI] 最大可探测LUN数。
+ 取值范围为 1 到 2^32-1。
+
+ scsi_mod.max_report_luns=
+ [SCSI] 接收到的最大LUN数。
+ 取值范围为 1 到 16384。
+
+ NCR_D700= [HW,SCSI]
+ 请查阅 drivers/scsi/NCR_D700.c 文件头部。
+
+ ncr5380= [HW,SCSI]
+ 请查阅 Documentation/scsi/g_NCR5380.rst。
+
+ ncr53c400= [HW,SCSI]
+ 请查阅 Documentation/scsi/g_NCR5380.rst。
+
+ ncr53c400a= [HW,SCSI]
+ 请查阅 Documentation/scsi/g_NCR5380.rst。
+
+ ncr53c8xx= [HW,SCSI]
+
+ osst= [HW,SCSI] SCSI磁带驱动
+ 格式:<buffer_size>,<write_threshold>
+ 另请查阅 Documentation/scsi/st.rst。
+
+ scsi_debug_*= [SCSI]
+ 请查阅 drivers/scsi/scsi_debug.c。
+
+ scsi_mod.default_dev_flags=
+ [SCSI] SCSI默认设备标志
+ 格式:<integer>
+
+ scsi_mod.dev_flags=
+ [SCSI] 厂商和型号的黑/白名单条目
+ 格式:<vendor>:<model>:<flags>
+ (flags 为整数值)
+
+ scsi_mod.scsi_logging_level=
+ [SCSI] 日志级别的位掩码
+ 位的定义请查阅 drivers/scsi/scsi_logging.h。
+ 此参数也可以通过sysctl对dev.scsi.logging_level
+ 进行设置(/proc/sys/dev/scsi/logging_level)。
+ 此外,S390-tools软件包提供了一个便捷的
+ ‘scsi_logging_level’ 脚本,可以从以下地址下载:
+ https://github.com/ibm-s390-linux/s390-tools/blob/master/scripts/scsi_logging_level
+
+ scsi_mod.scan= [SCSI] sync(默认)在发现SCSI总线过程中
+ 同步扫描。async在内核线程中异步扫描,允许系统继续
+ 启动流程。none忽略扫描,预期由用户空间完成扫描。
+
+ sim710= [SCSI,HW]
+ 请查阅 drivers/scsi/sim710.c 文件头部。
+
+ st= [HW,SCSI] SCSI磁带参数(缓冲区大小等)
+ 请查阅 Documentation/scsi/st.rst。
+
+ wd33c93= [HW,SCSI]
+ 请查阅 drivers/scsi/wd33c93.c 文件头部。
diff --git a/Documentation/translations/zh_CN/scsi/scsi.rst b/Documentation/translations/zh_CN/scsi/scsi.rst
new file mode 100644
index 000000000000..5d6e39c7cbb5
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/scsi.rst
@@ -0,0 +1,48 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/scsi.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+
+==============
+SCSI子系统文档
+==============
+
+Linux文档项目(LDP)维护了一份描述Linux内核(lk) 2.4中SCSI
+子系统的文档。请参考:
+https://www.tldp.org/HOWTO/SCSI-2.4-HOWTO 。LDP提供单页和
+多页的HTML版本,以及PostScript与PDF格式的文档。
+
+在SCSI子系统中使用模块的注意事项
+================================
+Linux内核中的SCSI支持可以根据终端用户的需求以不同的方式模块
+化。为了理解你的选择,我们首先需要定义一些术语。
+
+scsi-core(也被称为“中间层”)包含SCSI支持的核心。没有他你将
+无法使用任何其他SCSI驱动程序。SCSI核心支持可以是一个模块(
+scsi_mod.o),也可以编译进内核。如果SCSI核心是一个模块,那么
+他必须是第一个被加载的SCSI模块,如果你将卸载该模块,那么他必
+须是最后一个被卸载的模块。实际上,modprobe和rmmod命令将确保
+SCSI子系统中模块加载与卸载的正确顺序。
+
+一旦SCSI核心存在于内核中(无论是编译进内核还是作为模块加载),
+独立的上层驱动和底层驱动可以按照任意顺序加载。磁盘驱动程序
+(sd_mod.o)、光盘驱动程序(sr_mod.o)、磁带驱动程序 [1]_
+(st.o)以及SCSI通用驱动程序(sg.o)代表了上层驱动,用于控制
+相应的各种设备。例如,你可以加载磁带驱动程序来使用磁带驱动器,
+然后在不需要该驱动程序时卸载他(并释放相关内存)。
+
+底层驱动程序用于支持您所运行硬件平台支持的不同主机卡。这些不同
+的主机卡通常被称为主机总线适配器(HBAs)。例如,aic7xxx.o驱动
+程序被用于控制Adaptec所属的所有最新的SCSI控制器。几乎所有的底
+层驱动都可以被编译为模块或直接编译进内核。
+
+.. [1] 磁带驱动程序有一个变种用于控制OnStream磁带设备。其模块
+ 名称为osst.o 。
diff --git a/Documentation/translations/zh_CN/scsi/scsi_eh.rst b/Documentation/translations/zh_CN/scsi/scsi_eh.rst
new file mode 100644
index 000000000000..26e0f30f0949
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/scsi_eh.rst
@@ -0,0 +1,482 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/scsi_eh.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+===================
+SCSI 中间层错误处理
+===================
+
+本文档描述了SCSI中间层(mid layer)的错误处理基础架构。
+关于SCSI中间层的更多信息,请参阅:
+Documentation/scsi/scsi_mid_low_api.rst。
+
+.. 目录
+
+ [1] SCSI 命令如何通过中间层传递并进入错误处理(EH)
+ [1-1] scsi_cmnd(SCSI命令)结构体
+ [1-2] scmd(SCSI 命令)是如何完成的?
+ [1-2-1] 通过scsi_done完成scmd
+ [1-2-2] 通过超时机制完成scmd
+ [1-3] 错误处理模块如何接管流程
+ [2] SCSI错误处理机制工作原理
+ [2-1] 基于细粒度回调的错误处理
+ [2-1-1] 概览
+ [2-1-2] scmd在错误处理流程中的传递路径
+ [2-1-3] 控制流分析
+ [2-2] 通过transportt->eh_strategy_handler()实现的错误处理
+ [2-2-1] transportt->eh_strategy_handler()调用前的中间层状态
+ [2-2-2] transportt->eh_strategy_handler()调用后的中间层状态
+ [2-2-3] 注意事项
+
+
+1. SCSI命令在中间层及错误处理中的传递流程
+=========================================
+
+1.1 scsi_cmnd结构体
+-------------------
+
+每个SCSI命令都由struct scsi_cmnd(简称scmd)结构体
+表示。scmd包含两个list_head类型的链表节点:scmd->list
+与scmd->eh_entry。其中scmd->list是用于空闲链表或设备
+专属的scmd分配链表,与错误处理讨论关联不大。而
+scmd->eh_entry则是专用于命令完成和错误处理链表,除非
+特别说明,本文讨论中所有scmd的链表操作均通过
+scmd->eh_entry实现。
+
+
+1.2 scmd是如何完成的?
+----------------------
+
+底层设备驱动(LLDD)在获取SCSI命令(scmd)后,存在两种
+完成路径:底层驱动可通过调用hostt->queuecommand()时从
+中间层传递的scsi_done回调函数主动完成命令,或者当命令未
+及时完成时由块层(block layer)触发超时处理机制。
+
+
+1.2.1 通过scsi_done回调完成SCSI命令
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+对于所有非错误处理(EH)命令,scsi_done()是其完成回调
+函数。它只调用blk_mq_complete_request()来删除块层的
+定时器并触发块设备软中断(BLOCK_SOFTIRQ)。
+
+BLOCK_SOFTIRQ会间接调用scsi_complete(),进而调用
+scsi_decide_disposition()来决定如何处理该命令。
+scsi_decide_disposition()会查看scmd->result值和感
+应码数据来决定如何处理命令。
+
+ - SUCCESS
+
+ 调用scsi_finish_command()来处理该命令。该函数会
+ 执行一些维护操作,然后调用scsi_io_completion()来
+ 完成I/O操作。scsi_io_completion()会通过调用
+ blk_end_request及其相关函数来通知块层该请求已完成,
+ 如果发生错误,还会判断如何处理剩余的数据。
+
+ - NEEDS_RETRY
+
+ - ADD_TO_MLQUEUE
+
+ scmd被重新加入到块设备队列中。
+
+ - otherwise
+
+ 调用scsi_eh_scmd_add(scmd)来处理该命令。
+ 关于此函数的详细信息,请参见 [1-3]。
+
+
+1.2.2 scmd超时完成机制
+^^^^^^^^^^^^^^^^^^^^^^
+
+SCSI命令超时处理机制由scsi_timeout()函数实现。
+当发生超时事件时,该函数
+
+ 1. 首先调用可选的hostt->eh_timed_out()回调函数。
+ 返回值可能是以下3种情况之一:
+
+ - ``SCSI_EH_RESET_TIMER``
+ 表示需要延长命令执行时间并重启计时器。
+
+ - ``SCSI_EH_NOT_HANDLED``
+ 表示eh_timed_out()未处理该命令。
+ 此时将执行第2步的处理流程。
+
+ - ``SCSI_EH_DONE``
+ 表示eh_timed_out()已完成该命令。
+
+ 2. 若未通过回调函数解决,系统将调用
+ scsi_abort_command()发起异步中止操作,该操作最多
+ 可执行scmd->allowed + 1次。但存在三种例外情况会跳
+ 过异步中止而直接进入第3步处理:当检测到
+ SCSI_EH_ABORT_SCHEDULED标志位已置位(表明该命令先
+ 前已被中止过一次且当前重试仍失败)、当重试次数已达上
+ 限、或当错误处理时限已到期时。在这些情况下,系统将跳
+ 过异步中止流程而直接执行第3步处理方案。
+
+ 3. 最终未解决的命令会通过scsi_eh_scmd_add(scmd)移交给
+ 错误处理子系统,具体流程详见[1-4]章节说明。
+
+1.3 异步命令中止机制
+--------------------
+
+当命令超时触发后,系统会通过scsi_abort_command()调度异
+步中止操作。若中止操作执行成功,则根据重试次数决定后续处
+理:若未达最大重试限制,命令将重新下发执行;若重试次数已
+耗尽,则命令最终以DID_TIME_OUT状态终止。当中止操作失败
+时,系统会调用scsi_eh_scmd_add()将该命令移交错误处理子
+系统,具体处理流程详见[1-4]。
+
+1.4 错误处理(EH)接管机制
+------------------------
+
+SCSI命令通过scsi_eh_scmd_add()函数进入错误处理流程,该函
+数执行以下操作:
+
+ 1. 将scmd->eh_entry链接到shost->eh_cmd_q
+
+ 2. 在shost->shost_state中设置SHOST_RECOVERY状态位
+
+ 3. 递增shost->host_failed失败计数器
+
+ 4. 当检测到shost->host_busy == shost->host_failed
+ 时(即所有进行中命令均已失败)立即唤醒SCSI错误处理
+ 线程。
+
+如上所述,当任一scmd被加入到shost->eh_cmd_q队列时,系统
+会立即置位shost_state中的SHOST_RECOVERY状态标志位,该操
+作将阻止块层向对应主机控制器下发任何新的SCSI命令。在此状
+态下,主机控制器上所有正在处理的scmd最终会进入以下三种状
+态之一:正常完成、失败后被移入到eh_cmd_q队列、或因超时被
+添加到shost->eh_cmd_q队列。
+
+如果所有的SCSI命令都已经完成或失败,系统中正在执行的命令
+数量与失败命令数量相等(
+即shost->host_busy == shost->host_failed),此时将唤
+醒SCSI错误处理线程。SCSI错误处理线程一旦被唤醒,就可以确
+保所有未完成命令均已标记为失败状态,并且已经被链接到
+shost->eh_cmd_q队列中。
+
+需要特别说明的是,这并不意味着底层处理流程完全静止。当底层
+驱动以错误状态完成某个scmd时,底层驱动及其下层组件会立刻遗
+忘该命令的所有关联状态。但对于超时命令,除非
+hostt->eh_timed_out()回调函数已经明确通知底层驱动丢弃该
+命令(当前所有底层驱动均未实现此功能),否则从底层驱动视角
+看该命令仍处于活跃状态,理论上仍可能在某时刻完成。当然,由
+于超时计时器早已触发,所有此类延迟完成都将被系统直接忽略。
+
+我们将在后续章节详细讨论关于SCSI错误处理如何执行中止操作(
+即强制底层驱动丢弃已超时SCSI命令)。
+
+
+2. SCSI错误处理机制详解
+=======================
+
+SCSI底层驱动可以通过以下两种方式之一来实现SCSI错误处理。
+
+ - 细粒度的错误处理回调机制
+ 底层驱动可选择实现细粒度的错误处理回调函数,由SCSI中间层
+ 主导错误恢复流程并自动调用对应的回调函数。此实现模式的详
+ 细设计规范在[2-1]节中展开讨论。
+
+ - eh_strategy_handler()回调函数
+ 该回调函数作为统一的错误处理入口,需要完整实现所有的恢复
+ 操作。具体而言,它必须涵盖SCSI中间层在常规恢复过程中执行
+ 的全部处理流程,相关实现将在[2-2]节中详细描述。
+
+当错误恢复流程完成后,SCSI错误处理系统通过调用
+scsi_restart_operations()函数恢复正常运行,该函数按顺序执行
+以下操作:
+
+ 1. 验证是否需要执行驱动器安全门锁定机制
+
+ 2. 清除shost_state中的SHOST_RECOVERY状态标志位
+
+ 3. 唤醒所有在shost->host_wait上等待的任务。如果有人调用了
+ scsi_block_when_processing_errors()则会发生这种情况。
+ (疑问:由于错误处理期间块层队列已被阻塞,为何仍需显式
+ 唤醒?)
+
+ 4. 强制激活该主机控制器下所有设备的I/O队列
+
+
+2.1 基于细粒度回调的错误处理机制
+--------------------------------
+
+2.1.1 概述
+^^^^^^^^^^^
+
+如果不存在eh_strategy_handler(),SCSI中间层将负责驱动的
+错误处理。错误处理(EH)的目标有两个:一是让底层驱动程序、
+主机和设备不再维护已超时的SCSI命令(scmd);二是使他们准备
+好接收新命令。当一个SCSI命令(scmd)被底层遗忘且底层已准备
+好再次处理或拒绝该命令时,即可认为该scmd已恢复。
+
+为实现这些目标,错误处理(EH)会逐步执行严重性递增的恢复
+操作。部分操作通过下发SCSI命令完成,而其他操作则通过调用
+以下细粒度的错误处理回调函数实现。这些回调函数可以省略,
+若被省略则默认始终视为执行失败。
+
+::
+
+ int (* eh_abort_handler)(struct scsi_cmnd *);
+ int (* eh_device_reset_handler)(struct scsi_cmnd *);
+ int (* eh_bus_reset_handler)(struct scsi_cmnd *);
+ int (* eh_host_reset_handler)(struct scsi_cmnd *);
+
+只有在低级别的错误恢复操作无法恢复部分失败的SCSI命令
+(scmd)时,才会采取更高级别的恢复操作。如果最高级别的错误
+处理失败,就意味着整个错误恢复(EH)过程失败,所有未能恢复
+的设备被强制下线。
+
+在恢复过程中,需遵循以下规则:
+
+ - 错误恢复操作针对待处理列表eh_work_q中的失败的scmds执
+ 行。如果某个恢复操作成功恢复了一个scmd,那么该scmd会
+ 从eh_work_q链表中移除。
+
+ 需要注意的是,对某个scmd执行的单个恢复操作可能会恢复
+ 多个scmd。例如,对某个设备执行复位操作可能会恢复该设
+ 备上所有失败的scmd。
+
+ - 仅当低级别的恢复操作完成且eh_work_q仍然非空时,才会
+ 触发更高级别的操作
+
+ - SCSI错误恢复机制会重用失败的scmd来发送恢复命令。对于
+ 超时的scmd,SCSI错误处理机制会确保底层驱动在重用scmd
+ 前已不再维护该命令。
+
+当一个SCSI命令(scmd)被成功恢复后,错误处理逻辑会通过
+scsi_eh_finish_cmd()将其从待处理队列(eh_work_q)移
+至错误处理的本地完成队列(eh_done_q)。当所有scmd均恢
+复完成(即eh_work_q为空时),错误处理逻辑会调用
+scsi_eh_flush_done_q()对这些已恢复的scmd进行处理,即
+重新尝试或错误总终止(向上层通知失败)。
+
+SCSI命令仅在满足以下全部条件时才会被重试:对应的SCSI设
+备仍处于在线状态,未设置REQ_FAILFAST标志或递增后的
+scmd->retries值仍小于scmd->allowed。
+
+2.1.2 SCSI命令在错误处理过程中的流转路径
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ 1. 错误完成/超时
+
+ :处理: 调用scsi_eh_scmd_add()处理scmd
+
+ - 将scmd添加到shost->eh_cmd_q
+ - 设置SHOST_RECOVERY标记位
+ - shost->host_failed++
+
+ :锁要求: shost->host_lock
+
+ 2. 启动错误处理(EH)
+
+ :操作: 将所有scmd移动到EH本地eh_work_q队列,并
+ 清空 shost->eh_cmd_q。
+
+ :锁要求: shost->host_lock(非严格必需,仅为保持一致性)
+
+ 3. scmd恢复
+
+ :操作: 调用scsi_eh_finish_cmd()完成scmd的EH
+
+ - 将scmd从本地eh_work_q队列移至本地eh_done_q队列
+
+ :锁要求: 无
+
+ :并发控制: 每个独立的eh_work_q至多一个线程,确保无锁
+ 队列的访问
+
+ 4. EH完成
+
+ :操作: 调用scsi_eh_flush_done_q()重试scmd或通知上层处理
+ 失败。此函数可以被并发调用,但每个独立的eh_work_q队
+ 列至多一个线程,以确保无锁队列的访问。
+
+ - 从eh_done_q队列中移除scmd,清除scmd->eh_entry
+ - 如果需要重试,调用scsi_queue_insert()重新入队scmd
+ - 否则,调用scsi_finish_command()完成scmd
+ - 将shost->host_failed置为零
+
+ :锁要求: 队列或完成函数会执行适当的加锁操作
+
+
+2.1.3 控制流
+^^^^^^^^^^^^
+
+ 通过细粒度回调机制执行的SCSI错误处理(EH)是从
+ scsi_unjam_host()函数开始的
+
+``scsi_unjam_host``
+
+ 1. 持有shost->host_lock锁,将shost->eh_cmd_q中的命令移动
+ 到本地的eh_work_q队里中,并释放host_lock锁。注意,这一步
+ 会清空shost->eh_cmd_q。
+
+ 2. 调用scsi_eh_get_sense函数。
+
+ ``scsi_eh_get_sense``
+
+ 该操作针对没有有效感知数据的错误完成命令。大部分SCSI传输协议
+ 或底层驱动在命令失败时会自动获取感知数据(自动感知)。出于性
+ 能原因,建议使用自动感知,推荐使用自动感知机制,因为它不仅有
+ 助于提升性能,还能避免从发生CHECK CONDITION到执行本操作之间,
+ 感知信息出现不同步的问题。
+
+ 注意,如果不支持自动感知,那么在使用scsi_done()以错误状态完成
+ scmd 时,scmd->sense_buffer将包含无效感知数据。在这种情况下,
+ scsi_decide_disposition()总是返回FAILED从而触发SCSI错误处理
+ (EH)。当该scmd执行到这里时,会重新获取感知数据,并再次调用
+ scsi_decide_disposition()进行处理。
+
+ 1. 调用scsi_request_sense()发送REQUEST_SENSE命令。如果失败,
+ 则不采取任何操作。请注意,不采取任何操作会导致对该scmd执行
+ 更高级别的恢复操作。
+
+ 2. 调用scsi_decide_disposition()处理scmd
+
+ - SUCCESS
+ scmd->retries被设置为scmd->allowed以防止
+ scsi_eh_flush_done_q()重试该scmd,并调用
+ scsi_eh_finish_cmd()。
+
+ - NEEDS_RETRY
+ 调用scsi_eh_finish_cmd()
+
+ - 其他情况
+ 无操作。
+
+ 4. 如果!list_empty(&eh_work_q),则调用scsi_eh_ready_devs()。
+
+ ``scsi_eh_ready_devs``
+
+ 该函数采取四种逐步增强的措施,使失败的设备准备好处理新的命令。
+
+ 1. 调用scsi_eh_stu()
+
+ ``scsi_eh_stu``
+
+ 对于每个具有有效感知数据且scsi_check_sense()判断为失败的
+ scmd发送START STOP UNIT(STU)命令且将start置1。注意,由
+ 于我们明确选择错误完成的scmd,可以确定底层驱动已不再维护该
+ scmd,我们可以重用它进行STU。
+
+ 如果STU操作成功且sdev处于离线或就绪状态,所有在sdev上失败的
+ scmd都会通过scsi_eh_finish_cmd()完成。
+
+ *注意* 如果hostt->eh_abort_handler()未实现或返回失败,可能
+ 此时仍有超时的scmd,此时STU不会导致底层驱动不再维护scmd。但
+ 是,如果STU执行成功,该函数会通过scsi_eh_finish_cmd()来完成
+ sdev上的所有scmd,这会导致底层驱动处于不一致的状态。看来STU
+ 操作应仅在sdev不包含超时scmd时进行。
+
+ 2. 如果!list_empty(&eh_work_q),调用scsi_eh_bus_device_reset()。
+
+ ``scsi_eh_bus_device_reset``
+
+ 此操作与scsi_eh_stu()非常相似,区别在于使用
+ hostt->eh_device_reset_handler()替代STU命令。此外,由于我们
+ 没有发送SCSI命令且重置会清空该sdev上所有的scmd,所以无需筛选错
+ 误完成的scmd。
+
+ 3. 如果!list_empty(&eh_work_q),调用scsi_eh_bus_reset()。
+
+ ``scsi_eh_bus_reset``
+
+ 对于每个包含失败scmd的SCSI通道调用
+ hostt->eh_bus_reset_handler()。如果总线重置成功,那么该通道上
+ 所有准备就绪或离线状态sdev上的失败scmd都会被处理处理完成。
+
+ 4. 如果!list_empty(&eh_work_q),调用scsi_eh_host_reset()。
+
+ ``scsi_eh_host_reset``
+
+ 调用hostt->eh_host_reset_handler()是最终的手段。如果SCSI主机
+ 重置成功,主机上所有就绪或离线sdev上的失败scmd都会通过错误处理
+ 完成。
+
+ 5. 如果!list_empty(&eh_work_q),调用scsi_eh_offline_sdevs()。
+
+ ``scsi_eh_offline_sdevs``
+
+ 离线所有包含未恢复scmd的所有sdev,并通过
+ scsi_eh_finish_cmd()完成这些scmd。
+
+ 5. 调用scsi_eh_flush_done_q()。
+
+ ``scsi_eh_flush_done_q``
+
+ 此时所有的scmd都已经恢复(或放弃),并通过
+ scsi_eh_finish_cmd()函数加入eh_done_q队列。该函数通过
+ 重试或显示通知上层scmd的失败来刷新eh_done_q。
+
+
+2.2 基于transportt->eh_strategy_handler()的错误处理机制
+-------------------------------------------------------------
+
+在该机制中,transportt->eh_strategy_handler()替代
+scsi_unjam_host()的被调用,并负责整个错误恢复过程。该处理
+函数完成后应该确保底层驱动不再维护任何失败的scmd并且将设备
+设置为就绪(准备接收新命令)或离线状态。此外,该函数还应该
+执行SCSI错误处理的维护任务,以维护SCSI中间层的数据完整性。
+换句话说,eh_strategy_handler()必须实现[2-1-2]中除第1步
+外的所有步骤。
+
+
+2.2.1 transportt->eh_strategy_handler()调用前的SCSI中间层状态
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ 进入该处理函数时,以下条件成立。
+
+ - 每个失败的scmd的eh_flags字段已正确设置。
+
+ - 每个失败的scmd通过scmd->eh_entry链接到scmd->eh_cmd_q队列。
+
+ - 已设置SHOST_RECOVERY标志。
+
+ - `shost->host_failed == shost->host_busy`。
+
+2.2.2 transportt->eh_strategy_handler()调用后的SCSI中间层状态
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ 从该处理函数退出时,以下条件成立。
+
+ - shost->host_failed为零。
+
+ - shost->eh_cmd_q被清空。
+
+ - 每个scmd->eh_entry被清空。
+
+ - 对每个scmd必须调用scsi_queue_insert()或scsi_finish_command()。
+ 注意,该处理程序可以使用scmd->retries(剩余重试次数)和
+ scmd->allowed(允许重试次数)限制重试次数。
+
+
+2.2.3 注意事项
+^^^^^^^^^^^^^^
+
+ - 需明确已超时的scmd在底层仍处于活跃状态,因此在操作这些
+ scmd前,必须确保底层已彻底不再维护。
+
+ - 访问或修改shost数据结构时,必须持有shost->host_lock锁
+ 以维持数据一致性。
+
+ - 错误处理完成后,每个故障设备必须彻底清除所有活跃SCSI命
+ 令(scmd)的关联状态。
+
+ - 错误处理完成后,每个故障设备必须被设置为就绪(准备接收
+ 新命令)或离线状态。
+
+
+Tejun Heo
+htejun@gmail.com
+
+11th September 2005
diff --git a/Documentation/translations/zh_CN/scsi/scsi_mid_low_api.rst b/Documentation/translations/zh_CN/scsi/scsi_mid_low_api.rst
new file mode 100644
index 000000000000..f701945a1b1c
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/scsi_mid_low_api.rst
@@ -0,0 +1,1174 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/scsi_mid_low_api.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+
+=========================
+SCSI中间层 — 底层驱动接口
+=========================
+
+简介
+====
+本文档概述了Linux SCSI中间层与SCSI底层驱动之间的接口。底层
+驱动(LLD)通常被称为主机总线适配器(HBA)驱动或主机驱动
+(HD)。在该上下文中,“主机”指的是计算机IO总线(例如:PCI总
+线或ISA总线)与SCSI传输层中单个SCSI启动器端口之间的桥梁。
+“启动器”端口(SCSI术语,参考SAM-3:http://www.t10.org)向
+“目标”SCSI端口(例如:磁盘)发送SCSI命令。在一个运行的系统
+中存在多种底层驱动(LLDs),但每种硬件类型仅对应一种底层驱动
+(LLD)。大多数底层驱动可以控制一个或多个SCSI HBA。部分HBA
+内部集成多个主机控制器。
+
+在某些情况下,SCSI传输层本身是已存在于Linux中的外部总线子系
+统(例如:USB和ieee1394)。在此类场景下,SCSI子系统的底层驱
+动将作为与其他驱动子系统的软件桥接层。典型示例包括
+usb-storage驱动(位于drivers/usb/storage目录)以
+及ieee1394/sbp2驱动(位于 drivers/ieee1394 目录)。
+
+例如,aic7xxx底层驱动负责控制基于Adaptec公司7xxx芯片系列的
+SCSI并行接口(SPI)控制器。aic7xxx底层驱动可以内建到内核中
+或作为模块加载。一个Linux系统中只能运行一个aic7xxx底层驱动
+程序,但他可能控制多个主机总线适配器(HBA)。这些HBA可能位于
+PCI扩展卡或内置于主板中(或两者兼有)。某些基于aic7xxx的HBA
+采用双控制器设计,因此会呈现为两个SCSI主机适配器。与大多数现
+代HBA相同,每个aic7xxx控制器都拥有其独立的PCI设备地址。[SCSI
+主机与PCI设备之间一一对应虽然常见,但并非强制要求(例如ISA适
+配器就不适用此规则)。]
+
+SCSI中间层将SCSI底层驱动(LLD)与其他层(例如SCSI上层驱动以
+及块层)隔离开来。
+
+本文档的版本大致与Linux内核2.6.8相匹配。
+
+文档
+====
+内核源码树中设有专用的SCSI文档目录,通常位于
+Documentation/scsi目录下。大多数文档采用
+reStructuredText格式。本文档名为
+scsi_mid_low_api.rst,可在该目录中找到。该文档的最新版本可
+以访问 https://docs.kernel.org/scsi/scsi_mid_low_api.html
+查阅。许多底层驱动(LLD)的文档也位于Documentation/scsi目录
+下(例如aic7xxx.rst)。SCSI中间层的简要说明见scsi.rst文件,
+该文档包含指向Linux Kernel 2.4系列SCSI子系统的文档链接。此
+外还收录了两份SCSI上层驱动文档:st.rst(SCSI磁带驱动)与
+scsi-generic.rst(用通用SCSI(sg)驱动)。
+
+部分底层驱动的文档(或相关URL)可能嵌在C源代码文件或与其
+源码同位于同一目录下。例如,USB大容量存储驱动的文档链接可以在
+目录/usr/src/linux/drivers/usb/storage下找到。
+
+驱动程序结构
+============
+传统上,SCSI子系统的底层驱动(LLD)至少包含drivers/scsi
+目录下的两个文件。例如,一个名为“xyz”的驱动会包含一个头文件
+xyz.h和一个源文件xyz.c。[实际上所有代码完全可以合并为单个
+文件,头文件并非必需的。] 部分需要跨操作系统移植的底层驱动会
+采用更复杂的文件结构。例如,aic7xxx驱动,就为通用代码与操作
+系统专用代码(如FreeBSD和Linux)分别创建了独立的文件。此类
+驱动通常会在drivers/scsi目录下拥有自己单独的子目录。
+
+当需要向Linux内核添加新的底层驱动(LLD)时,必须留意
+drivers/scsi目录下的两个文件:Makefile以及Kconfig。建议参
+考现有底层驱动的代码组织方式。
+
+随着Linux内核2.5开发内核逐步演进为2.6系列的生产版本,该接口
+也引入了一些变化。以驱动初始化代码为例,现有两种模型可用。其
+中旧模型与Linux内核2.4的实现相似,他基于在加载HBA驱动时检测
+到的主机,被称为“被动(passive)”初始化模型。而新的模型允许
+在底层驱动(LLD)的生命周期内动态拔插HBA,这种方式被称为“热
+插拔(hotplug)”初始化模型。推荐使用新的模型,因为他既能处理
+传统的永久连接SCSI设备,也能处理现代支持热插拔的类SCSI设备
+(例如通过USB或IEEE 1394连接的数码相机)。这两种初始化模型将
+在后续的章节中分别讨论。
+
+SCSI底层驱动(LLD)通过以下3种方式与SCSI子系统进行交互:
+
+ a) 直接调用由SCSI中间层提供的接口函数
+ b) 将一组函数指针传递给中间层提供的注册函数,中间层将在
+ 后续运行的某个时刻调用这些函数。这些函数由LLD实现。
+ c) 直接访问中间层维护的核心数据结构
+
+a)组中所涉及的所有函数,均列于下文“中间层提供的函数”章节中。
+
+b)组中涉及的所有函数均列于下文名为“接口函数”的章节中。这些
+函数指针位于结构体struct scsi_host_template中,该结构体实
+例会被传递给scsi_host_alloc()。对于LLD未实现的接口函数,应
+对struct scsi_host_template中的对应成员赋NULL。如果在文件
+作用域定义一个struct scsi_host_template的实例,没有显式初
+始化的函数指针成员将自动设置为NULL。
+
+c)组中提到的用法在“热插拔”环境中尤其需要谨慎处理。LLD必须
+明确知晓这些与中间层及其他层级共享的数据结构的生命周期。
+
+LLD中定义的所有函数以及在文件作用域内定义的所有数据都应声明
+为static。例如,在一个名为“xxx”的LLD中的sdev_init()函数定
+义如下:
+``static int xxx_sdev_init(struct scsi_device * sdev) { /* code */ }``
+
+热插拔初始化模型
+================
+在该模型中,底层驱动(LLD)控制SCSI主机适配器在子系统中的注
+册与注销时机。主机最早可以在驱动初始化阶段被注册,最晚可以在
+驱动卸载时被移除。通常,驱动会响应来自sysfs probe()的回调,
+表示已检测到一个主机总线适配器(HBA)。在确认该新设备是LLD的
+目标设备后,LLD初始化HBA,并将一个新的SCSI主机适配器注册到
+SCSI中间层。
+
+在LLD初始化过程中,驱动应当向其期望发现HBA的IO总线(例如PCI
+总线)进行注册。该操作通常可以通过sysfs完成。任何驱动参数(
+特别是那些在驱动加载后仍可修改的参数)也可以在此时通过sysfs
+注册。当LLD注册其首个HBA时,SCSI中间层首次感受到该LLD的存在。
+
+在稍后的某个时间点,当LLD检测到新的HBA时,接下来在LLD与SCSI
+中间层之间会发生一系列典型的调用过程。该示例展示了中间层如何
+扫描新引入的HBA,在该过程中发现了3个SCSI设备,其中只有前两个
+设备有响应::
+
+ HBA探测:假设在扫描中发现2个SCSI设备
+ 底层驱动 中间层 底层驱动
+ =======---------------======---------------=======
+ scsi_host_alloc() -->
+ scsi_add_host() ---->
+ scsi_scan_host() -------+
+ |
+ sdev_init()
+ sdev_configure() --> scsi_change_queue_depth()
+ |
+ sdev_init()
+ sdev_configure()
+ |
+ sdev_init() ***
+ sdev_destroy() ***
+
+
+ *** 对于SCSI中间层尝试扫描但未响应的SCSI设备,系统调用
+ sdev_init()和sdev_destroy()函数对。
+
+如果LLD期望调整默认队列设置,可以在其sdev_configure()例程
+中调用scsi_change_queue_depth()。
+
+当移除一个HBA时,可能是由于卸载LLD模块相关的有序关闭(例如通
+过rmmod命令),也可能是由于sysfs的remove()回调而触发的“热拔
+插”事件。无论哪种情况,其执行顺序都是相同的::
+
+ HBA移除:假设连接了2个SCSI设备
+ 底层驱动 中间层 底层驱动
+ =======---------------------======-----------------=======
+ scsi_remove_host() ---------+
+ |
+ sdev_destroy()
+ sdev_destroy()
+ scsi_host_put()
+
+LLD用于跟踪struct Scsi_Host的实例可能会非常有用
+(scsi_host_alloc()返回的指针)。这些实例由中间层“拥有”。
+当引用计数为零时,struct Scsi_Host实例会被
+scsi_host_put()释放。
+
+HBA的热插拔是一个特殊的场景,特别是当HBA下的磁盘正在处理已挂
+载文件系统上的SCSI命令时。为了应对其中的诸多问题,中间层引入
+了引用计数逻辑。具体内容参考下文关于引用计数的章节。
+
+热插拔概念同样适用于SCSI设备。目前,当添加HBA时,
+scsi_scan_host() 函数会扫描该HBA所属SCSI传输通道上的设备。在
+新型SCSI传输协议中,HBA可能在扫描完成后才检测到新的SCSI设备。
+LLD可通过以下步骤通知中间层新SCSI设备的存在::
+
+ SCSI设备热插拔
+ 底层驱动 中间层 底层驱动
+ =======-------------------======-----------------=======
+ scsi_add_device() ------+
+ |
+ sdev_init()
+ sdev_configure() [--> scsi_change_queue_depth()]
+
+类似的,LLD可能会感知到某个SCSI设备已经被移除(拔出)或与他的连
+接已中断。某些现有的SCSI传输协议(例如SPI)可能直到后续SCSI命令
+执行失败时才会发现设备已经被移除,中间层会将该设备设置为离线状态。
+若LLD检测到SCSI设备已经被移除,可通过以下流程触发上层对该设备的
+移除操作::
+
+ SCSI设备热拔插
+ 底层驱动 中间层 底层驱动
+ =======-------------------======-----------------=======
+ scsi_remove_device() -------+
+ |
+ sdev_destroy()
+
+对于LLD而言,跟踪struct scsi_device实例可能会非常有用(该结构
+的指针会作为参数传递给sdev_init()和sdev_configure()回调函数)。
+这些实例的所有权归属于中间层(mid-level)。struct scsi_device
+实例在sdev_destroy()执行后释放。
+
+引用计数
+========
+Scsi_Host结构体已引入引用计数机制。该机制将struct Scsi_Host
+实例的所有权分散到使用他的各SCSI层,而此前这类实例完全由中间
+层独占管理。底层驱动(LLD)通常无需直接操作这些引用计数,仅在
+某些特定场景下可能需要介入。
+
+与struct Scsi_Host相关的引用计数函数主要有以下3种:
+
+ - scsi_host_alloc():
+ 返回指向新实例的指针,该实例的引用计数被设置为1。
+
+ - scsi_host_get():
+ 给定实例的引用计数加1。
+
+ - scsi_host_put():
+ 给定实例的引用计数减1。如果引用计数减少到0,则释放该实例。
+
+scsi_device结构体现已引入引用计数机制。该机制将
+struct scsi_device实例的所有权分散到使用他的各SCSI层,而此
+前这类实例完全由中间层独占管理。相关访问函数声明详见
+include/scsi/scsi_device.h文件末尾部分。若LLD需要保留
+scsi_device实例的指针副本,则应调用scsi_device_get()增加其
+引用计数;不再需要该指针时,可通过scsi_device_put()递减引用
+计数(该操作可能会导致该实例被释放)。
+
+.. Note::
+
+ struct Scsi_Host实际上包含两个并行维护的引用计数器,该引
+ 用计数由这些函数共同操作。
+
+编码规范
+========
+
+首先,Linus Torvalds关于C语言编码风格的观点可以在
+Documentation/process/coding-style.rst文件中找到。
+
+此外,在相关gcc编译器支持的前提下,鼓励使用大多数C99标准的增强
+特性。因此,在适当的情况下鼓励使用C99风格的结构体和数组初始化
+方式。但不要过度使用,目前对可变长度数组(VLA)的支持还待完善。
+一个例外是 ``//`` 风格的注释;在Linux中倾向于使
+用 ``/*...*/`` 注释格式。
+
+对于编写良好、经过充分测试且有完整文档的代码不需要重新格式化
+以符合上述规范。例如,aic7xxx驱动是从FreeBSD和Adaptec代码库
+移植到Linux的。毫无疑问,FreeBSD和Adaptec遵循其原有的编码规
+范。
+
+
+中间层提供的函数
+================
+这些函数由SCSI中间层提供,供底层驱动(LLD)调用。这些函数的名
+称(即入口点)均已导出,因此作为模块加载的LLD可以访问他们。内
+核会确保在任何LLD初始化之前,SCSI中间层已先行加载并完成初始化。
+下文按字母顺序列出这些函数,其名称均以 ``scsi_`` 开头。
+
+摘要:
+
+ - scsi_add_device - 创建新的SCSI逻辑单元(LU)设备实例
+ - scsi_add_host - 执行sysfs注册并设置传输类
+ - scsi_change_queue_depth - 调整SCSI设备队列深度
+ - scsi_bios_ptable - 返回块设备分区表的副本
+ - scsi_block_requests - 阻止向指定主机提交新命令
+ - scsi_host_alloc - 分配引用计数为1的新SCSI主机适配器实例scsi_host
+ - scsi_host_get - 增加SCSI主机适配器实例的引用计数
+ - scsi_host_put - 减少SCSI主机适配器的引用计数(归零时释放)
+ - scsi_remove_device - 卸载并移除SCSI设备
+ - scsi_remove_host - 卸载并移除主机控制器下的所有SCSI设备
+ - scsi_report_bus_reset - 报告检测到的SCSI总线复位事件
+ - scsi_scan_host - 执行SCSI总线扫描
+ - scsi_track_queue_full - 跟踪连续出现的队列满事件
+ - scsi_unblock_requests - 恢复向指定主机提交命令
+
+详细信息::
+
+ /**
+ * scsi_add_device - 创建新的SCSI逻辑单元(LU)设备实例
+ * @shost: 指向SCSI主机适配器实例的指针
+ * @channel: 通道号(通常为0)
+ * @id: 目标ID号
+ * @lun: 逻辑单元号(LUN)
+ *
+ * 返回指向新的struct scsi_device实例的指针,
+ * 如果出现异常(例如在给定地址没有设备响应),则返
+ * 回ERR_PTR(-ENODEV)
+ *
+ * 是否阻塞:是
+ *
+ * 注意事项:本函数通常在添加HBA的SCSI总线扫描过程
+ * 中由系统内部调用(即scsi_scan_host()执行期间)。因此,
+ * 仅应在以下情况调用:HBA在scsi_scan_host()完成扫描后,
+ * 又检测到新的SCSI设备(逻辑单元)。若成功执行,本次调用
+ * 可能会触发LLD的以下回调函数:sdev_init()以及
+ * sdev_configure()
+ *
+ * 函数定义:drivers/scsi/scsi_scan.c
+ **/
+ struct scsi_device * scsi_add_device(struct Scsi_Host *shost,
+ unsigned int channel,
+ unsigned int id, unsigned int lun)
+
+
+ /**
+ * scsi_add_host - 执行sysfs注册并设置传输类
+ * @shost: 指向SCSI主机适配器实例的指针
+ * @dev: 指向scsi类设备结构体(struct device)的指针
+ *
+ * 成功返回0,失败返回负的errno(例如:-ENOMEM)
+ *
+ * 是否阻塞:否
+ *
+ * 注意事项:仅在“热插拔初始化模型”中需要调用,且必须在
+ * scsi_host_alloc()成功执行后调用。该函数不会扫描总线;
+ * 总线扫描可通过调用scsi_scan_host()或其他传输层特定的
+ * 方法完成。在调用该函数之前,LLD必须先设置好传输模板,
+ * 并且只能在调用该函数之后才能访问传输类
+ * (transport class)相关的数据结构。
+ *
+ * 函数定义:drivers/scsi/hosts.c
+ **/
+ int scsi_add_host(struct Scsi_Host *shost, struct device * dev)
+
+
+ /**
+ * scsi_change_queue_depth - 调整SCSI设备队列深度
+ * @sdev: 指向要更改队列深度的SCSI设备的指针
+ * @tags 如果启用了标记队列,则表示允许的标记数,
+ * 或者在非标记模式下,LLD可以排队的命令
+ * 数(如 cmd_per_lun)。
+ *
+ * 无返回
+ *
+ * 是否阻塞:否
+ *
+ * 注意事项:可以在任何时刻调用该函数,只要该SCSI设备受该LLD控
+ * 制。[具体来说,可以在sdev_configure()执行期间或之后,且在
+ * sdev_destroy()执行之前调用。] 该函数可安全地在中断上下文中
+ * 调用。
+ *
+ * 函数定义:drivers/scsi/scsi.c [更多注释请参考源代码]
+ **/
+ int scsi_change_queue_depth(struct scsi_device *sdev, int tags)
+
+
+ /**
+ * scsi_bios_ptable - 返回块设备分区表的副本
+ * @dev: 指向块设备的指针
+ *
+ * 返回指向分区表的指针,失败返回NULL
+ *
+ * 是否阻塞:是
+ *
+ * 注意事项:调用方负责释放返回的内存(通过 kfree() 释放)
+ *
+ * 函数定义:drivers/scsi/scsicam.c
+ **/
+ unsigned char *scsi_bios_ptable(struct block_device *dev)
+
+
+ /**
+ * scsi_block_requests - 阻止向指定主机提交新命令
+ *
+ * @shost: 指向特定主机的指针,用于阻止命令的发送
+ *
+ * 无返回
+ *
+ * 是否阻塞:否
+ *
+ * 注意事项:没有定时器或其他任何机制可以解除阻塞,唯一的方式
+ * 是由LLD调用scsi_unblock_requests()方可恢复。
+ *
+ * 函数定义:drivers/scsi/scsi_lib.c
+ **/
+ void scsi_block_requests(struct Scsi_Host * shost)
+
+
+ /**
+ * scsi_host_alloc - 创建SCSI主机适配器实例并执行基础初始化
+ * @sht: 指向SCSI主机模板的指针
+ * @privsize: 在hostdata数组中分配的额外字节数(该数组是返
+ * 回的Scsi_Host实例的最后一个成员)
+ *
+ * 返回指向新的Scsi_Host实例的指针,失败返回NULL
+ *
+ * 是否阻塞:是
+ *
+ * 注意事项:当此调用返回给LLD时,该主机适配器上的
+ * SCSI总线扫描尚未进行。hostdata数组(默认长度为
+ * 零)是LLD专属的每主机私有区域,供LLD独占使用。
+ * 两个相关的引用计数都被设置为1。完整的注册(位于
+ * sysfs)与总线扫描由scsi_add_host()和
+ * scsi_scan_host()稍后执行。
+ * 函数定义:drivers/scsi/hosts.c
+ **/
+ struct Scsi_Host * scsi_host_alloc(const struct scsi_host_template * sht,
+ int privsize)
+
+
+ /**
+ * scsi_host_get - 增加SCSI主机适配器实例的引用计数
+ * @shost: 指向Scsi_Host实例的指针
+ *
+ * 无返回
+ *
+ * 是否阻塞:目前可能会阻塞,但可能迭代为不阻塞
+ *
+ * 注意事项:会同时增加struct Scsi_Host中两个子对
+ * 象的引用计数
+ *
+ * 函数定义:drivers/scsi/hosts.c
+ **/
+ void scsi_host_get(struct Scsi_Host *shost)
+
+
+ /**
+ * scsi_host_put - 减少SCSI主机适配器实例的引用计数
+ * (归零时释放)
+ * @shost: 指向Scsi_Host实例的指针
+ *
+ * 无返回
+ *
+ * 是否阻塞:当前可能会阻塞,但可能会改为不阻塞
+ *
+ * 注意事项:实际会递减两个子对象中的计数。当后一个引用
+ * 计数归零时系统会自动释放Scsi_Host实例。
+ * LLD 无需关注Scsi_Host实例的具体释放时机,只要在平衡
+ * 引用计数使用后不再访问该实例即可。
+ * 函数定义:drivers/scsi/hosts.c
+ **/
+ void scsi_host_put(struct Scsi_Host *shost)
+
+
+ /**
+ * scsi_remove_device - 卸载并移除SCSI设备
+ * @sdev: 指向SCSI设备实例的指针
+ *
+ * 返回值:成功返回0,若设备未连接,则返回-EINVAL
+ *
+ * 是否阻塞:是
+ *
+ * 如果LLD发现某个SCSI设备(逻辑单元,lu)已经被移除,
+ * 但其主机适配器实例依旧存在,则可以请求移除该SCSI设备。
+ * 如果该调用成功将触发sdev_destroy()回调函数的执行。调
+ * 用完成后,sdev将变成一个无效的指针。
+ *
+ * 函数定义:drivers/scsi/scsi_sysfs.c
+ **/
+ int scsi_remove_device(struct scsi_device *sdev)
+
+
+ /**
+ * scsi_remove_host - 卸载并移除主机控制器下的所有SCSI设备
+ * @shost: 指向SCSI主机适配器实例的指针
+ *
+ * 返回值:成功返回0,失败返回1(例如:LLD正忙??)
+ *
+ * 是否阻塞:是
+ *
+ * 注意事项:仅在使用“热插拔初始化模型”时调用。应在调用
+ * scsi_host_put()前调用。
+ *
+ * 函数定义:drivers/scsi/hosts.c
+ **/
+ int scsi_remove_host(struct Scsi_Host *shost)
+
+
+ /**
+ * scsi_report_bus_reset - 报告检测到的SCSI总线复位事件
+ * @shost: 指向关联的SCSI主机适配器的指针
+ * @channel: 发生SCSI总线复位的通道号
+ *
+ * 返回值:无
+ *
+ * 是否阻塞:否
+ *
+ * 注意事项:仅当复位来自未知来源时才需调用此函数。
+ * 由SCSI中间层发起的复位无需调用,但调用也不会导
+ * 致副作用。此函数的主要作用是确保系统能正确处理
+ * CHECK_CONDITION状态。
+ *
+ * 函数定义:drivers/scsi/scsi_error.c
+ **/
+ void scsi_report_bus_reset(struct Scsi_Host * shost, int channel)
+
+
+ /**
+ * scsi_scan_host - 执行SCSI总线扫描
+ * @shost: 指向SCSI主机适配器实例的指针
+ *
+ * 是否阻塞:是
+ *
+ * 注意事项:应在调用scsi_add_host()后调用
+ *
+ * 函数定义:drivers/scsi/scsi_scan.c
+ **/
+ void scsi_scan_host(struct Scsi_Host *shost)
+
+
+ /**
+ * scsi_track_queue_full - 跟踪指定设备上连续的QUEUE_FULL
+ * 事件,以判断是否需要及何时调整
+ * 该设备的队列深度。
+ * @sdev: 指向SCSI设备实例的指针
+ * @depth: 当前该设备上未完成的SCSI命令数量(不包括返回
+ * QUEUE_FULL的命令)
+ *
+ * 返回值:0 - 当前队列深度无需调整
+ * >0 - 需要将队列深度调整为此返回值指定的新深度
+ * -1 - 需要回退到非标记操作模式,并使用
+ * host->cmd_per_lun作为非标记命令队列的
+ * 深度限制
+ *
+ * 是否阻塞:否
+ *
+ * 注意事项:LLD可以在任意时刻调用该函数。系统将自动执行“正确
+ * 的处理流程”;该函数支持在中断上下文中安全地调用
+ *
+ * 函数定义:drivers/scsi/scsi.c
+ **/
+ int scsi_track_queue_full(struct scsi_device *sdev, int depth)
+
+
+ /**
+ * scsi_unblock_requests - 恢复向指定主机适配器提交命令
+ *
+ * @shost: 指向要解除阻塞的主机适配器的指针
+ *
+ * 返回值:无
+ *
+ * 是否阻塞:否
+ *
+ * 函数定义:drivers/scsi/scsi_lib.c
+ **/
+ void scsi_unblock_requests(struct Scsi_Host * shost)
+
+
+
+接口函数
+========
+接口函数由底层驱动(LLD)定义实现,其函数指针保存在
+struct scsi_host_template实例中,并将该实例传递给
+scsi_host_alloc()。
+部分接口函数为必选实现项。所有
+接口函数都应声明为static,约定俗成的命名规则如下,
+驱动“xyz”应将其sdev_configure()函数声明为::
+
+ static int xyz_sdev_configure(struct scsi_device * sdev);
+
+其余接口函数的命名规范均依此类推。
+
+需将该函数指针赋值给“struct scsi_host_template”实例
+的‘sdev_configure’成员变量中,并将该结构体实例指针传
+递到中间层的scsi_host_alloc()函数。
+
+各个接口函数的详细说明可参考include/scsi/scsi_host.h
+文件,具体描述位于“struct scsi_host_template”结构体
+各个成员的上方。在某些情况下,scsi_host.h头文件中的描
+述比本文提供的更为详尽。
+
+以下按字母顺序列出所有接口函数及其说明。
+
+摘要:
+
+ - bios_param - 获取磁盘的磁头/扇区/柱面参数
+ - eh_timed_out - SCSI命令超时回调
+ - eh_abort_handler - 中止指定的SCSI命令
+ - eh_bus_reset_handler - 触发SCSI总线复位
+ - eh_device_reset_handler - 执行SCSI设备复位
+ - eh_host_reset_handler - 复位主机(主机总线适配器)
+ - info - 提供指定主机适配器的相关信息
+ - ioctl - 驱动可响应ioctl控制命令
+ - proc_info - 支持/proc/scsi/{驱动名}/{主机号}文件节点的读写操作
+ - queuecommand - 将SCSI命令提交到主机控制器,命令执行完成后调用‘done’回调
+ - sdev_init - 在向新设备发送SCSI命令前的初始化
+ - sdev_configure - 设备挂载后的精细化微调
+ - sdev_destroy - 设备即将被移除前的清理
+
+
+详细信息::
+
+ /**
+ * bios_param - 获取磁盘的磁头/扇区/柱面参数
+ * @sdev: 指向SCSI设备实例的指针(定义于
+ * include/scsi/scsi_device.h中)
+ * @bdev: 指向块设备实例的指针(定义于fs.h中)
+ * @capacity: 设备容量(以512字节扇区为单位)
+ * @params: 三元数组用于保存输出结果:
+ * params[0]:磁头数量(最大255)
+ * params[1]:扇区数量(最大63)
+ * params[2]:柱面数量
+ *
+ * 返回值:被忽略
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 进程上下文(sd)
+ *
+ * 注意事项: 若未提供此函数,系统将基于READ CAPACITY
+ * 使用默认几何参数。params数组已预初始化伪值,防止函
+ * 数无输出。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int bios_param(struct scsi_device * sdev, struct block_device *bdev,
+ sector_t capacity, int params[3])
+
+
+ /**
+ * eh_timed_out - SCSI命令超时回调
+ * @scp: 标识超时的命令
+ *
+ * 返回值:
+ *
+ * EH_HANDLED: 我已修复该错误,请继续完成该命令
+ * EH_RESET_TIMER: 我需要更多时间,请重置定时器并重新开始计时
+ * EH_NOT_HANDLED 开始正常的错误恢复流程
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 中断上下文
+ *
+ * 注意事项: 该回调函数为LLD提供一个机会进行本地
+ * 错误恢复处理。此处的恢复仅限于判断该未完成的命
+ * 令是否还有可能完成。此回调中不允许中止或重新启
+ * 动该命令。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int eh_timed_out(struct scsi_cmnd * scp)
+
+
+ /**
+ * eh_abort_handler - 中止指定的SCSI命令
+ * @scp: 标识要中止的命令
+ *
+ * 返回值:如果命令成功中止,则返回SUCCESS,否则返回FAILED
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 内核线程
+ *
+ * 注意事项: 该函数仅在命令超时时才被调用。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int eh_abort_handler(struct scsi_cmnd * scp)
+
+
+ /**
+ * eh_bus_reset_handler - 发起SCSI总线复位
+ * @scp: 包含该设备的SCSI总线应进行重置
+ *
+ * 返回值:重置成功返回SUCCESS;否则返回FAILED
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 内核线程
+ *
+ * 注意事项: 由SCSI错误处理线程(scsi_eh)调用。
+ * 在错误处理期间,当前主机适配器的所有IO请求均
+ * 被阻塞。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int eh_bus_reset_handler(struct scsi_cmnd * scp)
+
+
+ /**
+ * eh_device_reset_handler - 发起SCSI设备复位
+ * @scp: 指定将被重置的SCSI设备
+ *
+ * 返回值:如果命令成功中止返回SUCCESS,否则返回FAILED
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 内核线程
+ *
+ * 注意事项: 由SCSI错误处理线程(scsi_eh)调用。
+ * 在错误处理期间,当前主机适配器的所有IO请求均
+ * 被阻塞。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int eh_device_reset_handler(struct scsi_cmnd * scp)
+
+
+ /**
+ * eh_host_reset_handler - 复位主机(主机总线适配器)
+ * @scp: 管理该设备的SCSI主机适配器应该被重置
+ *
+ * 返回值:如果命令成功中止返回SUCCESS,否则返回FAILED
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 内核线程
+ *
+ * 注意事项: 由SCSI错误处理线程(scsi_eh)调用。
+ * 在错误处理期间,当前主机适配器的所有IO请求均
+ * 被阻塞。当使用默认的eh_strategy策略时,如果
+ * _abort_、_device_reset_、_bus_reset_和该处
+ * 理函数均未定义(或全部返回FAILED),系统强制
+ * 该故障设备处于离线状态
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int eh_host_reset_handler(struct scsi_cmnd * scp)
+
+
+ /**
+ * info - 提供给定主机适配器的详细信息:驱动程序名称
+ * 以及用于区分不同主机适配器的数据结构
+ * @shp: 指向目标主机的struct Scsi_Host实例
+ *
+ * 返回值:返回以NULL结尾的ASCII字符串。[驱动
+ * 负责管理返回的字符串所在内存并确保其在整个
+ * 主机适配器生命周期内有效。]
+ *
+ * 并发安全声明: 无锁
+ *
+ * 调用上下文说明: 进程上下文
+ *
+ * 注意事项: 通常提供诸如I/O地址或中断号
+ * 等PCI或ISA信息。如果未实现该函数,则
+ * 默认使用struct Scsi_Host::name 字段。
+ * 返回的字符串应为单行(即不包含换行符)。
+ * 通过SCSI_IOCTL_PROBE_HOST ioctl可获
+ * 取该函数返回的字符串,如果该函数不可用,
+ * 则ioctl返回struct Scsi_Host::name中
+ * 的字符串。
+
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ const char * info(struct Scsi_Host * shp)
+
+
+ /**
+ * ioctl - 驱动可响应ioctl控制命令
+ * @sdp: ioctl操作针对的SCSI设备
+ * @cmd: ioctl命令号
+ * @arg: 指向用户空间读写数据的指针。由于他指向用
+ * 户空间,必须使用适当的内核函数
+ * (如 copy_from_user())。按照Unix的风
+ * 格,该参数也可以视为unsigned long 类型。
+ *
+ * 返回值:如果出错则返回负的“errno”值。返回0或正值表
+ * 示成功,并将返回值传递给用户空间。
+ *
+ * 并发安全声明:无锁
+ *
+ * 调用上下文说明:进程上下文
+ *
+ * 注意事项:SCSI子系统使用“逐层下传
+ * (trickle down)”的ioctl模型。
+ * 用户层会对上层驱动设备节点
+ * (例如/dev/sdc)发起ioctl()调用,
+ * 如果上层驱动无法识别该命令,则将其
+ * 传递给SCSI中间层,若中间层也无法识
+ * 别,则再传递给控制该设备的LLD。
+ * 根据最新的Unix标准,对于不支持的
+ * ioctl()命令,应返回-ENOTTY。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int ioctl(struct scsi_device *sdp, int cmd, void *arg)
+
+
+ /**
+ * proc_info - 支持/proc/scsi/{驱动名}/{主机号}文件节点的读写操作
+ * @buffer: 输入或出的缓冲区锚点(writeto1_read0==0表示向buffer写
+ * 入,writeto1_read0==1表示由buffer读取)
+ * @start: 当writeto1_read0==0时,用于指定驱动实际填充的起始位置;
+ * 当writeto1_read0==1时被忽略。
+ * @offset: 当writeto1_read0==0时,表示用户关注的数据在缓冲区中的
+ * 偏移。当writeto1_read0==1时忽略。
+ * @length: 缓冲区的最大(或实际使用)长度
+ * @host_no: 目标SCSI Host的编号(struct Scsi_Host::host_no)
+ * @writeto1_read0: 1 -> 表示数据从用户空间写入驱动
+ * (例如,“echo some_string > /proc/scsi/xyz/2”)
+ * 0 -> 表示用户从驱动读取数据
+ * (例如,“cat /proc/scsi/xyz/2”)
+ *
+ * 返回值:当writeto1_read0==1时返回写入长度。否则,
+ * 返回从offset偏移开始输出到buffer的字符数。
+ *
+ * 并发安全声明:无锁
+ *
+ * 调用上下文说明:进程上下文
+ *
+ * 注意事项:该函数由scsi_proc.c驱动,与proc_fs交互。
+ * 当前SCSI子系统可移除对proc_fs的支持,相关配置选。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int proc_info(char * buffer, char ** start, off_t offset,
+ int length, int host_no, int writeto1_read0)
+
+
+ /**
+ * queuecommand - 将SCSI命令提交到主机控制器,命令执行完成后调用scp->scsi_done回调函数
+ * @shost: 指向目标SCSI主机控制器
+ * @scp: 指向待处理的SCSI命令
+ *
+ * 返回值:成功返回0。
+ *
+ * 如果发生错误,则返回:
+ *
+ * SCSI_MLQUEUE_DEVICE_BUSY表示设备队列满,
+ * SCSI_MLQUEUE_HOST_BUSY表示整个主机队列满
+ *
+ * 在这两种情况下,中间层将自动重新提交该I/O请求
+ *
+ * - 若返回SCSI_MLQUEUE_DEVICE_BUSY,则仅暂停该
+ * 特定设备的命令处理,当该设备的某个命令完成返回
+ * 时(或在短暂延迟后如果没有其他未完成命令)将恢
+ * 复其处理。其他设备的命令仍正常继续处理。
+ *
+ * - 若返回SCSI_MLQUEUE_HOST_BUSY,将暂停该主机
+ * 的所有I/O操作,当任意命令从该主机返回时(或在
+ * 短暂延迟后如果没有其他未完成命令)将恢复处理。
+ *
+ * 为了与早期的queuecommand兼容,任何其他返回值
+ * 都被视作SCSI_MLQUEUE_HOST_BUSY。
+ *
+ * 对于其他可立即检测到的错误,可通过以下流程处
+ * 理:设置scp->result为适当错误值,调用scp->scsi_done
+ * 回调函数,然后该函数返回0。若该命令未立即执行(LLD
+ * 正在启动或将要启动该命令),则应将scp->result置0并
+ * 返回0。
+ *
+ * 命令所有权说明:若驱动返回0,则表示驱动获得该命令的
+ * 所有权,
+ * 并必须确保最终执行scp->scsi_done回调函数。注意:驱动
+ * 可以在返回0之前调用scp->scsi_done,但一旦调用该回
+ * 调函数后,就只能返回0。若驱动返回非零值,则禁止在任何时
+ * 刻执行该命令的scsi_done回调函数。
+ *
+ * 并发安全声明:在2.6.36及更早的内核版本中,调用该函数时持有
+ * struct Scsi_Host::host_lock锁(通过“irqsave”获取中断安全的自旋锁),
+ * 并且返回时仍需保持该锁;从Linux 2.6.37开始,queuecommand
+ * 将在无锁状态下被调用。
+ *
+ * 调用上下文说明:在中断(软中断)或进程上下文中
+ *
+ * 注意事项:该函数执行应当非常快速,通常不会等待I/O
+ * 完成。因此scp->scsi_done回调函数通常会在该函数返
+ * 回后的某个时刻被调用(经常直接从中断服务例程中调用)。
+ * 某些情况下(如模拟SCSI INQUIRY响应的伪适配器驱动),
+ * scp->scsi_done回调可能在该函数返回前就被调用。
+ * 若scp->scsi_done回调函数未在指定时限内被调用,SCSI中
+ * 间层将启动错误处理流程。当调用scp->scsi_done回调函数
+ * 时,若“result”字段被设置为CHECK CONDITION,
+ * 则LLD应执行自动感知并填充
+ * struct scsi_cmnd::sense_buffer数组。在中间层将
+ * 命令加入LLD队列之前前,scsi_cmnd::sense_buffer数组
+ * 会被清零。
+ *
+ * 可选实现说明:LLD必须实现
+ **/
+ int queuecommand(struct Scsi_Host *shost, struct scsi_cmnd * scp)
+
+
+ /**
+ * sdev_init - 在向新设备发送任何SCSI命令前(即开始扫描
+ * 之前)调用该函数
+ * @sdp: 指向即将被扫描的新设备的指针
+ *
+ * 返回值:返回0表示正常。返回其他值表示出错,
+ * 该设备将被忽略。
+ *
+ * 并发安全声明:无锁
+ *
+ * 调用上下文说明:进程上下文
+ *
+ * 注意事项:该函数允许LLD在设备首次扫描前分配所需的资源。
+ * 对应的SCSI设备可能尚未真正存在,但SCSI中间层即将对其进
+ * 行扫描(例如发送INQUIRY命令等)。如果设备存在,将调用
+ * sdev_configure()进行配置;如果设备不存在,则调用
+ * sdev_destroy()销毁。更多细节请参考
+ * include/scsi/scsi_host.h文件。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int sdev_init(struct scsi_device *sdp)
+
+
+ /**
+ * sdev_configure - 在设备首次完成扫描(即已成功响应INQUIRY
+ * 命令)之后,LDD可调用该函数对设备进行进一步配置
+ * @sdp: 已连接的设备
+ *
+ * 返回值:返回0表示成功。任何其他返回值都被视为错误,此时
+ * 设备将被标记为离线。[被标记离线的设备不会调用sdev_destroy(),
+ * 因此需要LLD主动清理资源。]
+ *
+ * 并发安全声明:无锁
+ *
+ * 调用上下文说明:进程上下文
+ *
+ * 注意事项:该接口允许LLD查看设备扫描代码所发出的初始INQUIRY
+ * 命令的响应,并采取对应操作。具体实现细节请参阅
+ * include/scsi/scsi_host.h文件。
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ int sdev_configure(struct scsi_device *sdp)
+
+
+ /**
+ * sdev_destroy - 当指定设备即将被关闭时调用。此时该设备
+ * 上的所有I/O活动均已停止。
+ * @sdp: 即将关闭的设备
+ *
+ * 返回值:无
+ *
+ * 并发安全声明:无锁
+ *
+ * 调用上下文说明:进程上下文
+ *
+ * 注意事项:该设备的中间层数据结构仍然存在
+ * 但即将被销毁。驱动程序此时应当释放为该设
+ * 备分配的所有专属资源。系统将不再向此sdp
+ * 实例发送任何命令。[但该设备可能在未来被
+ * 重新连接,届时将通过新的struct scsi_device
+ * 实例,并触发后续的sdev_init()和
+ * sdev_configure()调用过程。]
+ *
+ * 可选实现说明:由LLD选择性定义
+ **/
+ void sdev_destroy(struct scsi_device *sdp)
+
+
+
+数据结构
+========
+struct scsi_host_template
+-------------------------
+每个LLD对应一个“struct scsi_host_template”
+实例 [#]_。该结构体通常被初始化为驱动头文件中的静
+态全局变量,此方式可确保未显式初始化的成员自动置零
+(0或NULL)。关键成员变量说明如下:
+
+ name
+ - 驱动程序的名称(可以包含空格,请限制在80个字符以内)
+
+ proc_name
+ - 在“/proc/scsi/<proc_name>/<host_no>”
+ 和sysfs的“drivers”目录中使用的名称。因此
+ “proc_name”应仅包含Unix文件名中可接受
+ 的字符。
+
+ ``(*queuecommand)()``
+ - 中间层使用的主要回调函数,用于将SCSI命令
+ 提交到LLD。
+
+ vendor_id
+ - 该字段是一个唯一标识值,用于确认提供
+ Scsi_Host LLD的供应商,最常用于
+ 验证供应商特定的消息请求。该值由标识符类型
+ 和供应商特定值组成,有效格式描述请参阅
+ scsi_netlink.h头文件。
+
+该结构体的完整定义及详细注释请参阅 ``include/scsi/scsi_host.h``。
+
+.. [#] 在极端情况下,单个驱动需要控制多种不同类型的硬件时,驱动可
+ 能包含多个实例,(例如某个LLD驱动同时处理ISA和PCI两种类型
+ 的适配卡,并为每种硬件类型维护独立的
+ struct scsi_host_template实例)。
+
+struct Scsi_Host
+----------------
+每个由LLD控制的主机适配器对应一个struct Scsi_Host实例。
+该结构体与struct scsi_host_template具有多个相同成员。
+当创建struct Scsi_Host实例时(通过hosts.c中的
+scsi_host_alloc()函数),这些通用成员会从LLD的
+struct scsi_host_template实例初始化而来。关键成员说明
+如下:
+
+ host_no
+ - 系统范围内唯一的主机标识号,按升序从0开始分配
+ can_queue
+ - 必须大于0,表示适配器可处理的最大并发命令数,禁
+ 止向适配器发送超过此数值的命令数
+ this_id
+ - 主机适配器的SCSI ID(SCSI启动器标识),若未知则
+ 设置为-1
+ sg_tablesize
+ - 主机适配器支持的最大散列表(scatter-gather)元素
+ 数。设置为SG_ALL或更小的值可避免使用链式SG列表,
+ 且最小值必须为1
+ max_sectors
+ - 单个SCSI命令中允许的最大扇区数(通常为512字节/
+ 扇区)。默认值为0,此时会使用
+ SCSI_DEFAULT_MAX_SECTORS(在scsi_host.h中定义),
+ 当前该值为1024。因此,如果未定义max_sectors,则磁盘的
+ 最大传输大小为512KB。注意:这个大小可能不足以支持
+ 磁盘固件上传。
+ cmd_per_lun
+ - 主机适配器的设备上,每个LUN可排队的最大命令数。
+ 此值可通过LLD调用scsi_change_queue_depth()进行
+ 调整。
+ hostt
+ - 指向LLD struct scsi_host_template实例的指针,
+ 当前struct Scsi_Host实例正是由此模板生成。
+ hostt->proc_name
+ - LLD的名称,sysfs使用的驱动名。
+ transportt
+ - 指向LLD struct scsi_transport_template实例的指
+ 针(如果存在)。当前支持FC与SPI传输协议。
+ hostdata[0]
+ - 为LLD在struct Scsi_Host结构体末尾预留的区域,大小由
+ scsi_host_alloc()的第二个参数(privsize)决定。
+
+scsi_host结构体的完整定义详见include/scsi/scsi_host.h。
+
+struct scsi_device
+------------------
+通常而言,每个SCSI逻辑单元(Logical Unit)对应一个该结构
+的实例。连接到主机适配器的SCSI设备通过三个要素唯一标识:通
+道号(Channel Number)、目标ID(Target ID)和逻辑单元号
+(LUN)。
+该结构体完整定义于include/scsi/scsi_device.h。
+
+struct scsi_cmnd
+----------------
+该结构体实例用于在LLD与SCSI中间层之间传递SCSI命令
+及其响应。SCSI中间层会确保:提交到LLD的命令数不超过
+scsi_change_queue_depth()(或struct Scsi_Host::cmd_per_lun)
+设定的上限,且每个SCSI设备至少分配一个struct scsi_cmnd实例。
+关键成员说明如下:
+
+ cmnd
+ - 包含SCSI命令的数组
+ cmd_len
+ - SCSI命令的长度(字节为单位)
+ sc_data_direction
+ - 数据的传输方向。请参考
+ include/linux/dma-mapping.h中的
+ “enum dma_data_direction”。
+ result
+ - LLD在调用“done”之前设置该值。值为0表示命令成功
+ 完成(并且所有数据(如果有)已成功在主机与SCSI
+ 目标设备之间完成传输)。“result”是一个32位无符
+ 号整数,可以视为2个相关字节。SCSI状态值位于最
+ 低有效位。请参考include/scsi/scsi.h中的
+ status_byte()与host_byte()宏以及其相关常量。
+ sense_buffer
+ - 这是一个数组(最大长度为SCSI_SENSE_BUFFERSIZE
+ 字节),当SCSI状态(“result”的最低有效位)设为
+ CHECK_CONDITION(2)时,该数组由LLD填写。若
+ CHECK_CONDITION被置位,且sense_buffer[0]的高
+ 半字节值为7,则中间层会认为sense_buffer数组
+ 包含有效的SCSI感知数据;否则,中间层会发送
+ REQUEST_SENSE SCSI命令来获取感知数据。由于命令
+ 排队的存在,后一种方式容易出错,因此建议LLD始终
+ 支持“自动感知”。
+ device
+ - 指向与该命令关联的scsi_device对象的指针。
+ resid_len (通过调用scsi_set_resid() / scsi_get_resid()访问)
+ - LLD应将此无符号整数设置为请求的传输长度(即
+ “request_bufflen”)减去实际传输的字节数。“resid_len”
+ 默认设置为0,因此如果LLD无法检测到数据欠载(不能报告溢出),
+ 则可以忽略它。LLD应在调用“done”之前设置
+ “resid_len”。
+ underflow
+ - 如果实际传输的字节数小于该字段值,LLD应将
+ DID_ERROR << 16赋值给“result”。并非所有
+ LLD都实现此项检查,部分LLD仅将错误信息输出
+ 到日志,而未真正报告DID_ERROR。更推荐
+ 的做法是由LLD实现“resid_len”的支持。
+
+建议LLD在从SCSI目标设备进行数据传输时设置“resid_len”字段
+(例如READ操作)。当这些数据传输的感知码是MEDIUM ERROR或
+HARDWARE ERROR(有时也包括RECOVERED ERROR)时设置
+resid_len尤为重要。在这种情况下,如果LLD无法确定接收了多
+少数据,那么最安全的做法是表示没有接收到任何数据。例如:
+为了表明没有接收到任何有效数据,LLD可以使用如下辅助函数::
+
+ scsi_set_resid(SCpnt, scsi_bufflen(SCpnt));
+
+其中SCpnt是一个指向scsi_cmnd对象的指针。如果表示仅接收到
+三个512字节的数据块,可以这样设置resid_len::
+
+ scsi_set_resid(SCpnt, scsi_bufflen(SCpnt) - (3 * 512));
+
+scsi_cmnd结构体定义在 include/scsi/scsi_cmnd.h文件中。
+
+
+锁
+===
+每个struct Scsi_Host实例都有一个名为default_lock
+的自旋锁(spin_lock),它在scsi_host_alloc()函数
+中初始化(该函数定义在hosts.c文件中)。在同一个函数
+中,struct Scsi_Host::host_lock指针会被初始化为指
+向default_lock。此后,SCSI中间层执行的加
+锁和解锁操作都会使用host_lock指针。过去,驱动程序可
+以重写host_lock指针,但现在不再允许这样做。
+
+
+自动感知
+========
+自动感知(Autosense或auto-sense)在SAM-2规范中被定
+义为:当SCSI命令完成状态为CHECK CONDITION时,“自动
+将感知数据(sense data)返回给应用程序客户端”。底层
+驱动(LLD)应当执行自动感知。当LLD检测到
+CHECK CONDITION状态时,可通过以下任一方式完成:
+
+ a) 要求SCSI协议(例如SCSI并行接口(SPI))在此
+ 类响应中执行一次额外的数据传输
+
+ b) 或由LLD主动发起REQUEST SENSE命令获取感知数据
+
+无论采用哪种方式,当检测到CHECK CONDITION状态时,中
+间层通过检查结构体scsi_cmnd::sense_buffer[0]的值来
+判断LLD是否已执行自动感知。若该字节的高半字节为7
+(或 0xf),则认为已执行自动感知;若该字节为其他值
+(且此字节在每条命令执行前会被初始化为0),则中间层将
+主动发起REQUEST SENSE命令。
+
+在存在命令队列的场景下,保存失败命令感知数据的“nexus”
+可能会在等待REQUEST SENSE命令期间变得不同步。因此,
+最佳实践是由LLD执行自动感知。
+
+
+自Linux内核2.4以来的变更
+========================
+io_request_lock已被多个细粒度锁替代。与底层驱动
+(LLD)相关的锁是struct Scsi_Host::host_lock,且每
+个SCSI主机都独立拥有一个该锁。
+
+旧的错误处理机制已经被移除。这意味着LLD的接口函数
+abort()与reset()已经被删除。
+struct scsi_host_template::use_new_eh_code标志
+也已经被移除。
+
+在Linux内核2.4中,SCSI子系统的配置描述与其他Linux子系
+统的配置描述集中存放在Documentation/Configure.help
+文件中。在Linux内核2.6中,SCSI子系统拥有独立的配置文
+件drivers/scsi/Kconfig(体积更小),同时包含配置信息
+与帮助信息。
+
+struct SHT已重命名为struct scsi_host_template。
+
+新增“热插拔初始化模型”以及许多用于支持该功能的额外函数。
+
+
+致谢
+====
+以下人员对本文档做出了贡献:
+
+ - Mike Anderson <andmike at us dot ibm dot com>
+ - James Bottomley <James dot Bottomley at hansenpartnership dot com>
+ - Patrick Mansfield <patmans at us dot ibm dot com>
+ - Christoph Hellwig <hch at infradead dot org>
+ - Doug Ledford <dledford at redhat dot com>
+ - Andries Brouwer <Andries dot Brouwer at cwi dot nl>
+ - Randy Dunlap <rdunlap at xenotime dot net>
+ - Alan Stern <stern at rowland dot harvard dot edu>
+
+
+Douglas Gilbert
+dgilbert at interlog dot com
+
+2004年9月21日
diff --git a/Documentation/translations/zh_CN/scsi/sd-parameters.rst b/Documentation/translations/zh_CN/scsi/sd-parameters.rst
new file mode 100644
index 000000000000..b3d0445dc9f3
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/sd-parameters.rst
@@ -0,0 +1,38 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/sd-parameters.rst
+
+:翻译:
+
+ 郝栋栋 doubled <doubled@leap-io-kernel.com>
+
+:校译:
+
+
+
+============================
+Linux SCSI磁盘驱动(sd)参数
+============================
+
+缓存类型(读/写)
+------------------
+启用/禁用驱动器读写缓存。
+
+=========================== ===== ===== ======= =======
+ 缓存类型字符串 WCE RCD 写缓存 读缓存
+=========================== ===== ===== ======= =======
+ write through 0 0 关闭 开启
+ none 0 1 关闭 关闭
+ write back 1 0 开启 开启
+ write back, no read (daft) 1 1 开启 关闭
+=========================== ===== ===== ======= =======
+
+将缓存类型设置为“write back”并将该设置保存到驱动器::
+
+ # echo "write back" > cache_type
+
+如果要修改缓存模式但不使更改持久化,可在缓存类型字符串前
+添加“temporary ”。例如::
+
+ # echo "temporary write back" > cache_type
diff --git a/Documentation/translations/zh_CN/scsi/wd719x.rst b/Documentation/translations/zh_CN/scsi/wd719x.rst
new file mode 100644
index 000000000000..79757c42032b
--- /dev/null
+++ b/Documentation/translations/zh_CN/scsi/wd719x.rst
@@ -0,0 +1,35 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/scsi/libsas.rst
+
+:翻译:
+
+ 张钰杰 Yujie Zhang <yjzhang@leap-io-kernel.com>
+
+:校译:
+
+====================================================
+Western Digital WD7193, WD7197 和 WD7296 SCSI 卡驱动
+====================================================
+
+这些卡需要加载固件。固件可从 WD 提供下载的 Windows NT 驱动程
+序中提取。地址如下:
+
+http://support.wdc.com/product/download.asp?groupid=801&sid=27&lang=en
+
+该文件或网页上都未包含任何许可声明,因此该固件可能无法被收录到
+linux-firmware 项目中。
+
+提供的脚本可用于下载并提取固件,生成 wd719x-risc.bin 和
+wd719x-wcs.bin 文件。请将它们放置在 /lib/firmware/ 目录下。
+脚本内容如下:
+
+ #!/bin/sh
+ wget http://support.wdc.com/download/archive/pciscsi.exe
+ lha xi pciscsi.exe pci-scsi.exe
+ lha xi pci-scsi.exe nt/wd7296a.sys
+ rm pci-scsi.exe
+ dd if=wd7296a.sys of=wd719x-risc.bin bs=1 skip=5760 count=14336
+ dd if=wd7296a.sys of=wd719x-wcs.bin bs=1 skip=20096 count=514
+ rm wd7296a.sys
diff --git a/Documentation/translations/zh_CN/security/SCTP.rst b/Documentation/translations/zh_CN/security/SCTP.rst
new file mode 100644
index 000000000000..f2774b0d66b5
--- /dev/null
+++ b/Documentation/translations/zh_CN/security/SCTP.rst
@@ -0,0 +1,317 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/security/SCTP.rst
+
+:翻译:
+ 赵硕 Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
+
+====
+SCTP
+====
+
+SCTP的LSM支持
+=============
+
+安全钩子
+--------
+
+对于安全模块支持,已经实现了三个特定于SCTP的钩子::
+
+ security_sctp_assoc_request()
+ security_sctp_bind_connect()
+ security_sctp_sk_clone()
+ security_sctp_assoc_established()
+
+这些钩子的用法在下面的 `SCTP的SELinux支持`_ 一章中描述SELinux的实现。
+
+
+security_sctp_assoc_request()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+将关联INIT数据包的 ``@asoc`` 和 ``@chunk->skb`` 传递给安全模块。
+成功时返回 0,失败时返回错误。
+::
+
+ @asoc - 指向sctp关联结构的指针。
+ @skb - 指向包含关联数据包skbuff的指针。
+
+
+security_sctp_bind_connect()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+将一个或多个IPv4/IPv6地址传递给安全模块进行基于 ``@optname`` 的验证,
+这将导致是绑定还是连接服务,如下面的权限检查表所示。成功时返回 0,失败
+时返回错误。
+::
+
+ @sk - 指向sock结构的指针。
+ @optname - 需要验证的选项名称。
+ @address - 一个或多个IPv4 / IPv6地址。
+ @addrlen - 地址的总长度。使用sizeof(struct sockaddr_in)或
+ sizeof(struct sockaddr_in6)来计算每个ipv4或ipv6地址。
+
+ ------------------------------------------------------------------
+ | BIND 类型检查 |
+ | @optname | @address contains |
+ |----------------------------|-----------------------------------|
+ | SCTP_SOCKOPT_BINDX_ADD | 一个或多个 ipv4 / ipv6 地址 |
+ | SCTP_PRIMARY_ADDR | 单个 ipv4 or ipv6 地址 |
+ | SCTP_SET_PEER_PRIMARY_ADDR | 单个 ipv4 or ipv6 地址 |
+ ------------------------------------------------------------------
+
+ ------------------------------------------------------------------
+ | CONNECT 类型检查 |
+ | @optname | @address contains |
+ |----------------------------|-----------------------------------|
+ | SCTP_SOCKOPT_CONNECTX | 一个或多个 ipv4 / ipv6 地址 |
+ | SCTP_PARAM_ADD_IP | 一个或多个 ipv4 / ipv6 地址 |
+ | SCTP_SENDMSG_CONNECT | 单个 ipv4 or ipv6 地址 |
+ | SCTP_PARAM_SET_PRIMARY | 单个 ipv4 or ipv6 地址 |
+ ------------------------------------------------------------------
+
+条目 ``@optname`` 的摘要如下::
+
+ SCTP_SOCKOPT_BINDX_ADD - 允许在(可选地)调用 bind(3) 后,关联额外
+ 的绑定地址。
+ sctp_bindx(3) 用于在套接字上添加一组绑定地址。
+
+ SCTP_SOCKOPT_CONNECTX - 允许分配多个地址以连接到对端(多宿主)。
+ sctp_connectx(3) 使用多个目标地址在SCTP
+ 套接字上发起连接。
+
+ SCTP_SENDMSG_CONNECT - 通过sendmsg(2)或sctp_sendmsg(3)在新关联上
+ 发起连接。
+
+ SCTP_PRIMARY_ADDR - 设置本地主地址。
+
+ SCTP_SET_PEER_PRIMARY_ADDR - 请求远程对端将某个地址设置为其主地址。
+
+ SCTP_PARAM_ADD_IP - 在启用动态地址重配置时使用。
+ SCTP_PARAM_SET_PRIMARY - 如下所述,启用重新配置功能。
+
+
+为了支持动态地址重新配置,必须在两个端点上启用以下
+参数(或使用适当的 **setsockopt**\(2))::
+
+ /proc/sys/net/sctp/addip_enable
+ /proc/sys/net/sctp/addip_noauth_enable
+
+当相应的 ``@optname`` 存在时,以下的 *_PARAM_* 参数会
+通过ASCONF块发送到对端::
+
+ @optname ASCONF Parameter
+ ---------- ------------------
+ SCTP_SOCKOPT_BINDX_ADD -> SCTP_PARAM_ADD_IP
+ SCTP_SET_PEER_PRIMARY_ADDR -> SCTP_PARAM_SET_PRIMARY
+
+
+security_sctp_sk_clone()
+~~~~~~~~~~~~~~~~~~~~~~~~
+每当通过 **accept**\(2)创建一个新的套接字(即TCP类型的套接字),或者当
+一个套接字被‘剥离’时如用户空间调用 **sctp_peeloff**\(3),会调用此函数。
+::
+
+ @asoc - 指向当前sctp关联结构的指针。
+ @sk - 指向当前套接字结构的指针。
+ @newsk - 指向新的套接字结构的指针。
+
+
+security_sctp_assoc_established()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+当收到COOKIE ACK时调用,对于客户端,对端的secid将被保存
+到 ``@asoc->peer_secid`` 中::
+
+ @asoc - 指向sctp关联结构的指针。
+ @skb - 指向COOKIE ACK数据包的skbuff指针。
+
+
+用于关联建立的安全钩子
+----------------------
+
+下图展示了在建立关联时 ``security_sctp_bind_connect()``、 ``security_sctp_assoc_request()``
+和 ``security_sctp_assoc_established()`` 的使用。
+::
+
+ SCTP 端点 "A" SCTP 端点 "Z"
+ ============= =============
+ sctp_sf_do_prm_asoc()
+ 关联的设置可以通过connect(2),
+ sctp_connectx(3),sendmsg(2)
+ or sctp_sendmsg(3)来发起。
+ 这将导致调用security_sctp_bind_connect()
+ 发起与SCTP对端端点"Z"的关联。
+ INIT --------------------------------------------->
+ sctp_sf_do_5_1B_init()
+ 响应一个INIT数据块。
+ SCTP对端端点"A"正在请求一个临时关联。
+ 如果是首次关联,调用security_sctp_assoc_request()
+ 来设置对等方标签。
+ 如果不是首次关联,检查是否被允许。
+ 如果允许,则发送:
+ <----------------------------------------------- INIT ACK
+ |
+ | 否则,生成审计事件并默默丢弃该数据包。
+ |
+ COOKIE ECHO ------------------------------------------>
+ sctp_sf_do_5_1D_ce()
+ 响应一个COOKIE ECHO数据块。
+ 确认该cookie并创建一个永久关联。
+ 调用security_sctp_assoc_request()
+ 执行与INIT数据块响应相同的操作。
+ <------------------------------------------- COOKIE ACK
+ | |
+ sctp_sf_do_5_1E_ca |
+ 调用security_sctp_assoc_established() |
+ 来设置对方标签 |
+ | |
+ | 如果是SCTP_SOCKET_TCP或是剥离的套接
+ | 字,会调用 security_sctp_sk_clone()
+ | 来克隆新的套接字。
+ | |
+ 建立 建立
+ | |
+ ------------------------------------------------------------------
+ | 关联建立 |
+ ------------------------------------------------------------------
+
+
+SCTP的SELinux支持
+=================
+
+安全钩子
+--------
+
+上面的 `SCTP的LSM支持`_ 章节描述了以下SCTP安全钩子,SELinux的细节
+说明如下::
+
+ security_sctp_assoc_request()
+ security_sctp_bind_connect()
+ security_sctp_sk_clone()
+ security_sctp_assoc_established()
+
+
+security_sctp_assoc_request()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+将关联INIT数据包的 ``@asoc`` 和 ``@chunk->skb`` 传递给安全模块。
+成功时返回 0,失败时返回错误。
+::
+
+ @asoc - 指向sctp关联结构的指针。
+ @skb - 指向关联数据包skbuff的指针。
+
+安全模块执行以下操作:
+ 如果这是 ``@asoc->base.sk`` 上的首次关联,则将对端的sid设置
+ 为 ``@skb`` 中的值。这将确保只有一个对端sid分配给可能支持多个
+ 关联的 ``@asoc->base.sk``。
+
+ 否则验证 ``@asoc->base.sk peer sid`` 是否与 ``@skb peer sid``
+ 匹配,以确定该关联是否应被允许或拒绝。
+
+ 将sctp的 ``@asoc sid`` 设置为套接字的sid(来自 ``asoc->base.sk``)
+ 并从 ``@skb peer sid`` 中提取MLS部分。这将在SCTP的TCP类型套接字及
+ 剥离连接中使用,因为它们会导致生成一个新的套接字。
+
+ 如果配置了IP安全选项(CIPSO/CALIPSO),则会在套接字上设置IP选项。
+
+
+security_sctp_bind_connect()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+根据 ``@optname`` 检查ipv4/ipv6地址所需的权限,具体如下::
+
+ ------------------------------------------------------------------
+ | BIND 权限检查 |
+ | @optname | @address contains |
+ |----------------------------|-----------------------------------|
+ | SCTP_SOCKOPT_BINDX_ADD | 一个或多个 ipv4 / ipv6 地址 |
+ | SCTP_PRIMARY_ADDR | 单个 ipv4 or ipv6 地址 |
+ | SCTP_SET_PEER_PRIMARY_ADDR | 单个 ipv4 or ipv6 地址 |
+ ------------------------------------------------------------------
+
+ ------------------------------------------------------------------
+ | CONNECT 权限检查 |
+ | @optname | @address contains |
+ |----------------------------|-----------------------------------|
+ | SCTP_SOCKOPT_CONNECTX | 一个或多个 ipv4 / ipv6 地址 |
+ | SCTP_PARAM_ADD_IP | 一个或多个 ipv4 / ipv6 地址 |
+ | SCTP_SENDMSG_CONNECT | 单个 ipv4 or ipv6 地址 |
+ | SCTP_PARAM_SET_PRIMARY | 单个 ipv4 or ipv6 地址 |
+ ------------------------------------------------------------------
+
+
+`SCTP的LSM支持`_ 提供了 ``@optname`` 摘要,并且还描述了当启用动态地址重新
+配置时,ASCONF块的处理过程。
+
+
+security_sctp_sk_clone()
+~~~~~~~~~~~~~~~~~~~~~~~~
+每当通过 **accept**\(2)(即TCP类型的套接字)创建一个新的套接字,或者
+当一个套接字被“剥离”如用户空间调用 **sctp_peeloff**\(3)时,
+``security_sctp_sk_clone()`` 将会分别将新套接字的sid和对端sid设置为
+``@asoc sid`` 和 ``@asoc peer sid`` 中包含的值。
+::
+
+ @asoc - 指向当前sctp关联结构的指针。
+ @sk - 指向当前sock结构的指针。
+ @newsk - 指向新sock结构的指针。
+
+
+security_sctp_assoc_established()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+当接收到COOKIE ACK时调用,它将连接的对端sid设置为 ``@skb`` 中的值::
+
+ @asoc - 指向sctp关联结构的指针。
+ @skb - 指向COOKIE ACK包skbuff的指针。
+
+
+策略声明
+--------
+以下支持SCTP的类和权限在内核中是可用的::
+
+ class sctp_socket inherits socket { node_bind }
+
+当启用以下策略功能时::
+
+ policycap extended_socket_class;
+
+SELinux对SCTP的支持添加了用于连接特定端口类型 ``name_connect`` 权限
+以及在下面的章节中进行解释的 ``association`` 权限。
+
+如果用户空间工具已更新,SCTP将支持如下所示的 ``portcon`` 声明::
+
+ portcon sctp 1024-1036 system_u:object_r:sctp_ports_t:s0
+
+
+SCTP对端标签
+------------
+每个SCTP套接字仅分配一个对端标签。这个标签将在建立第一个关联时分配。
+任何后续在该套接字上的关联都会将它们的数据包对端标签与套接字的对端标
+签进行比较,只有在它们不同的情况下 ``association`` 权限才会被验证。
+这是通过检查套接字的对端sid与接收到的数据包中的对端sid来验证的,以决
+定是否允许或拒绝该关联。
+
+注:
+ 1) 如果对端标签未启用,则对端上下文将始终是 ``SECINITSID_UNLABELED``
+ (在策略声明中为 ``unlabeled_t`` )。
+
+ 2) 由于SCTP可以在单个套接字上支持每个端点(多宿主)的多个传输地址,因此
+ 可以配置策略和NetLabel为每个端点提供不同的对端标签。由于套接字的对端
+ 标签是由第一个关联的传输地址决定的,因此建议所有的对端标签保持一致。
+
+ 3) 用户空间可以使用 **getpeercon**\(3) 来检索套接字的对端上下文。
+
+ 4) 虽然这不是SCTP特有的,但在使用NetLabel时要注意,如果标签分配给特定的接
+ 口,而该接口‘goes down’,则NetLabel服务会移除该条目。因此,请确保网络启
+ 动脚本调用 **netlabelctl**\(8) 来设置所需的标签(详细信息,
+ 请参阅 **netlabel-config**\(8) 辅助脚本)。
+
+ 5) NetLabel SCTP对端标签规则应用如下所述标签为“netlabel”的一组帖子:
+ https://www.paul-moore.com/blog/t.
+
+ 6) CIPSO仅支持IPv4地址: ``socket(AF_INET, ...)``
+ CALIPSO仅支持IPv6地址: ``socket(AF_INET6, ...)``
+
+ 测试CIPSO/CALIPSO时请注意以下事项:
+ a) 如果SCTP数据包由于无效标签无法送达,CIPSO会发送一个ICMP包。
+ b) CALIPSO不会发送ICMP包,只会默默丢弃数据包。
+
+ 7) RFC 3554不支持IPSEC —— SCTP/IPSEC支持尚未在用户空间实现(**racoon**\(8)
+ 或 **ipsec_pluto**\(8)),尽管内核支持 SCTP/IPSEC。
diff --git a/Documentation/translations/zh_CN/security/index.rst b/Documentation/translations/zh_CN/security/index.rst
index 78d9d4b36dca..d33b107405c7 100644
--- a/Documentation/translations/zh_CN/security/index.rst
+++ b/Documentation/translations/zh_CN/security/index.rst
@@ -18,7 +18,9 @@
credentials
snp-tdx-threat-model
lsm
+ lsm-development
sak
+ SCTP
self-protection
siphash
tpm/index
@@ -28,7 +30,5 @@
TODOLIST:
* IMA-templates
* keys/index
-* lsm-development
-* SCTP
* secrets/index
* ipe
diff --git a/Documentation/translations/zh_CN/security/ipe.rst b/Documentation/translations/zh_CN/security/ipe.rst
new file mode 100644
index 000000000000..55968f0c7ae3
--- /dev/null
+++ b/Documentation/translations/zh_CN/security/ipe.rst
@@ -0,0 +1,398 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/security/sak.rst
+
+:翻译:
+ 赵硕 Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
+
+完整性策略执行(IPE)-内核文档
+==============================
+
+.. NOTE::
+
+ 这是针对开发人员而不是管理员的文档。如果您正在
+ 寻找有关IPE使用的文档,请参阅 :doc:`IPE admin
+ guide </admin-guide/LSM/ipe>`。
+
+历史背景
+--------
+
+最初促使IPE实施的原因,是需要创建一个锁定式系统。该系统将
+从一开始就具备安全性,并且在可执行代码和系统功能关键的特定
+数据文件上,提供强有力的完整性保障。只有当这些特定数据文件
+符合完整性策略时,它们才可以被读取。系统中还将存在强制访问
+控制机制,因此扩展属性(xattrs)也必须受到保护。这就引出了
+需要选择能够提供完整性保证的机制。当时,有两种主要机制被考
+虑,用以在满足这些要求的前提下保证系统完整性:
+
+ 1. IMA + EVM Signatures
+ 2. DM-Verity
+
+这两个选项都经过了仔细考虑,然而在原始的IPE使用场景
+中,最终选择DM-Verity而非IMA+EVM作为完整性机制,主
+要有三个原因:
+
+ 1. 防护额外的攻击途径
+
+ * 使用IMA+EVM时,如果没有加密解决方案,系统很容易受到
+ 离线攻击,特别是针对上述特定数据文件的攻击。
+
+ 与可执行文件不同,读取操作(如对受保护数据文件的读
+ 取操作)无法强制性进行全局完整性验证。这意味着必须
+ 有一种选择机制来决定是否应对某个读取操作实施完整性
+ 策略。
+
+ 在当时,这是通过强制访问控制标签来实现的,IMA策略会
+ 指定哪些标签需要进行完整性验证,这带来了一个问题:
+ EVM虽然可以保护标签,但如果攻击者离线修改文件系统,
+ 那么攻击者就可以清除所有的扩展属性(xattrs)——包括
+ 用于确定文件是否应受完整性策略约束的SELinux标签。
+
+ 使用DM-Verity,由于xattrs被保存为Merkel树的一部分,
+ 如果对由dm-verity保护的文件系统进行了离线挂载,校验
+ 和将不在匹配,文件将无法读取。
+
+ * 由于用户空间的二进制文件在Linux中是分页加载的,dm-
+ verity同样提供了对抗恶意块设备的额外保护。在这样的
+ 攻击中,块设备最初报告适当的内容以供IMA哈希计算,通
+ 过所需的完整性检查。然后,在访问真实数据时发生的页面
+ 错误将报告攻击者的有效载荷。由于dm-verity会在页面错
+ 误发生时检查数据(以及磁盘访问),因此这种攻击得到了
+ 缓解。
+
+ 2. 性能:
+
+ * dm-verity在块被读取时按需提供完整性验证,而不需要将整
+ 个文件读入内存进行验证。
+
+ 3. 签名的简化性:
+
+ * 不需要两个签名(IMA 然后是 EVM):一个签名可以覆盖整个
+ 块设备。
+ * 签名可以存储在文件系统元数据之外。
+ * 该签名支持基于 x.509 的签名基础设施。
+
+下一步是选择一个策略来执行完整性验证机制,该策略的最低
+要求是:
+
+ 1. 策略本身必须经过完整性验证(防止针对它的简单攻击)。
+ 2. 策略本身必须抵抗回滚攻击。
+ 3. 策略执行必须具有类似宽松模式的功能。
+ 4. 策略必须能够在不重启的情况下,完整地进行更新。
+ 5. 策略更新必须是原子性的。
+ 6. 策略必须支持撤销先前创建的组件。
+ 7. 策略必须在任何时间点都能进行审计。
+
+当时,IMA作为唯一的完整性策略机制,被用来与这些要求进行对比,
+但未能满足所有最低要求。尽管考虑过扩展IMA以涵盖这些要求,但
+最终因两个原因被放弃:
+
+ 1. 回归风险;这其中许多变更将导致对已经存在于内核的IMA进行
+ 重大代码更改,因此可能会影响用户。
+
+ 2. IMA在该系统中用于测量和证明;将测量策略与本地完整性策略
+ 的执行分离被认为是有利的。
+
+由于这些原因,决定创建一个新的LSM,其职责是仅限于本地完整性
+策略的执行。
+
+职责和范围
+----------
+
+IPE顾名思义,本质上是一种完整性策略执行解决方案;IPE并不强制规定
+如何提供完整性保障,而是将这一决策权留给系统管理员,管理员根据自身
+需求,选择符合的机制来设定安全标准。存在几种不同的完整性解决方案,
+它们提供了不同程度的安全保障;而IPE允许系统管理员理论上为所有这些
+解决方案制定策略。
+
+IPE自身没有内置确保完整性的固有机制。相反,在构建具备完整性保障能力
+的系统时,存在更高效的分层方案可供使用。需要重点注意的是,用于证明完
+整性的机制,与用于执行完整性声明的策略是相互独立的。
+
+因此,IPE依据以下方面进行设计:
+
+ 1. 便于与完整性提供机制集成。
+ 2. 便于平台管理员/系统管理员使用。
+
+设计理由:
+---------
+
+IPE是在评估其他操作系统和环境中的现有完整性策略解决方案后设计的。
+在对其他实现的调查中,发现了一些缺陷:
+
+ 1. 策略不易为人们读取,通常需要二进制中间格式。
+ 2. 默认情况下会隐式采取单一的、不可定制的操作。
+ 3. 调试策略需要手动来确定违反了哪个规则。
+ 4. 编写策略需要对更大系统或操作系统有深入的了解。
+
+IPE尝试避免所有这些缺陷。
+
+策略
+~~~~
+
+纯文本
+^^^^^^
+
+IPE的策略是纯文本格式的。相较于其他Linux安全模块(LSM),
+策略文件体积略大,但能解决其他平台上部分完整性策略方案存在
+的两个核心问题。
+
+第一个问题是代码维护和冗余的问题。为了编写策略,策略必须是
+以某种形式的字符串形式呈现(无论是 XML、JSON、YAML 等结构化
+格式,还是其他形式),以便策略编写者能够理解所写内容。在假设
+的二进制策略设计中,需要一个序列化器将策略将可读的形式转换为
+二进制形式,同时还需要一个反序列化器来将二进制形式转换为内核
+中的数据结构。
+
+最终,还需要另一个反序列化器将是必要的,用于将二进制形式转换
+为人类可读的形式,并尽可能保存所有信息,这是因为使用此访问控
+制系统的用户必须维护一个校验表和原始文件,才能理解哪些策略已
+经部署在该系统上,哪些没有。对于单个用户来说,这可能没问题,
+因为旧的策略可以在更新生效后很快被丢弃。但对于管理成千上万、
+甚至数十万台计算机的用户,且这些计算机有不同的操作系统和不同
+的操作需求,这很快就成了一个问题,因为数年前的过时策略可能仍然
+存在,从而导致需要快速恢复策略或投资大量基础设施来跟踪每个策略
+的内容。
+
+有了这三个独立的序列化器/反序列化器,维护成本非常昂贵。如果策略
+避免使用二进制格式,则只需要一个序列化器;将人类可读的形式转换
+为内核中的数据结构。从而节省了代码维护成本,并保持了可操作性。
+
+第二个关于二进制格式的问题是透明性,由于IPE根据系统资源的可信度
+来控制访问,因此其策略也必须可信,以便可以被更改。这是通过签名来
+完成的,这就需要签名过程。签名过程通常具有很高的安全标准,因为
+任何被签名的内容都可以被用来攻击完整性执行系统。签署时,签署者
+必须知道他们在签署什么,二进制策略可能会导致这一点的模糊化;签署
+者看到的只是一个不透明的二进制数据块。另一方面,对于纯文本策略中,
+签署者看到的则是实际提交的策略。
+
+启动策略
+~~~~~~~~
+
+如果配置得当,IPE能够在内核启动并进入用户模式时立即执行策略。
+这意味着需要在用户模式开始的那一刻就存储一定的策略。通常,这种
+存储可以通过一下三种方式之一来处理:
+
+ 1. 策略文件存储在磁盘上,内核在进入可能需要做出执行决策的代码
+ 路径之前,先加载该策略。
+ 2. 策略文件由引导加载程序传递给内核,内核解析这些策略。
+ 3. 将一个策略文件编译到内核中,内核在初始化过程中对其进行解析并
+ 执行。
+
+第一种方式存在问题:内核从用户空间读取文件通常是不推荐的,并且在
+内核中极为罕见。
+
+第二种选项同样存在问题:Linux在其整个生态系统中支持多种引导加载程序,
+所有引导加载程序都必须支持这种新方法,或者需要有一个独立的来源,这
+可能会导致内核启动过程发生不必要的重大变化。
+
+第三种选项是最佳选择,但需要注意的是,编译进内核的策略会占用磁盘空间。
+重要的是要使这一策略足够通用,以便用户空间能够加载新的、更复杂的策略,
+同时也要足够严格,以防止过度授权并避免引发安全问题。
+
+initramfs提供了一种建立此启动路径的方法。内核启动时以最小化的策略启动,
+该策略仅信任initramfs。在initramfs内,当真实的根文件系统已挂载且尚未
+切换时,它会部署并激活一个信任新根文件系统的策略。这种方法防止了在任何
+步骤中出现过度授权,并保持内核策略的最小化。
+
+启动
+^^^^
+
+然而,并不是每个系统都以initramfs启动,因此编译进内核的启动策略需要具备
+一定的灵活性,以明确如何为启动的下一个阶段建立信任。为此,如果我们将编译
+进内核的策略设计为一个完整的IPE策略,这样系统构建者便能合理定义第一阶段启
+动的需求。
+
+可更新、无需重启的策略
+~~~~~~~~~~~~~~~~~~~~~~
+
+随着时间的推移,系统需求发生变化(例如,之前信任的应用程序中发现漏洞、秘钥
+轮换等)。更新内核以满足这些安全目标并非始终是一个合适的选择,因为内核更新并
+非完全无风险的,而搁置安全更新会使系统处于脆弱状态。这意味着IPE需要一个可以
+完全更新的策略(允许撤销现有的策略),并且这个更新来源必须是内核外部的(允许
+再不更新内核的情况下更新策略)。
+
+此外,由于内核在调用之间是无状态的,并且从内核空间读取磁盘上的策略文件不是一
+个好主意,因此策略更新必须能够在不重启的情况下完成。
+
+为了允许从外部来源进行更新,考虑到外部来源可能是恶意的,因此该策略需要具备可被
+识别为可信的机制。这一机制通过签名链实现:策略的签名需与内核中的某个信任源相
+关联。通常,这个信任源是 ``SYSTEM_TRUSTED_KEYRING`` ,这是一个在内核编译时就被
+初始化填充的密钥环,因为这符合上述编译进来策略的制作者与能够部署策略更新的实体
+相同的预期。
+
+防回滚 / 防重放
+~~~~~~~~~~~~~~~
+
+随着时间的推移,系统可能会发现漏洞,曾经受信任的资源可能不再可信,IPE的
+策略也不例外。可能会出现的情况是,策略制作者误部署了一个不安全的策略,
+随后再用一个安全的策略进行修正。
+
+假设一旦不安全的策略被部署,攻击者获取了这个不安全的策略,IPE需要有一种
+方式来防止从安全的策略更新回滚到不安全的策略。
+
+最初,IPE的策略可以包含一个policy_version字段,声明系统上所有可激活策略
+所需的最低版本号。这将在系统运行期间防止回滚。
+
+.. WARNING::
+
+ 然而,由于内核每次启动都是无状态的,因此该策略版本将在下次
+ 启动时被重置为0.0.0。系统构建者需要意识到这一点,并确保在启
+ 动后尽快部署新的安全策略,以确保攻击者部署不安全的策略的几
+ 率最小化。
+
+隐式操作:
+~~~~~~~~~
+
+隐式操作的问题只有在考虑系统中多个操作具有不同级别时才会显现出来。
+例如,考虑一个系统,该系统对可执行代码和系统中对其功能至关重要的
+特定数据提供强大的完整性保障。在这个系统中,可能存在三种类型的
+策略:
+
+ 1. 一种策略,在这种策略中,如果操作未能匹配到任何规则,则该操
+ 作将被拒绝。
+ 2. 一种策略,在这种策略中,如果操作未能匹配到任何规则,则该操
+ 作将被允许。
+ 3. 一种策略,在这种策略中,如果操作未能匹配到任何规则,则执行
+ 操作由策略作者指定。
+
+第一种类型的策略示例如下::
+
+ op=EXECUTE integrity_verified=YES action=ALLOW
+
+在示例系统中,这对于可执行文件来说效果很好,因为所有可执行文件
+都应该拥有完整性保障。但问题出现在第二个要求上,即关于特定数据
+文件的要求。这将导致如下策略(假设策略按行依次执行)::
+
+ op=EXECUTE integrity_verified=YES action=ALLOW
+
+ op=READ integrity_verified=NO label=critical_t action=DENY
+ op=READ action=ALLOW
+
+若阅读过文档,了解策略按顺序执行且默认动作是拒绝,那么这个策略的
+逻辑还算清晰;但最后一行规则实际上将读取操作的默认动作改成了允许。
+这种设计是必要的,因为在实际系统中,存在一些无需验证的读取操作(例
+如向日志文件追加内容时的读取操作)。
+
+第二种策略类型(未匹配任何规则时默认允许)在管控特定数据文件时逻辑
+更清晰,其策略可简化为::
+
+ op=READ integrity_verified=NO label=critical_t action=DENY
+
+但与第一种策略类似,这种默认允许的策略在管控执行操作时会存在缺陷,
+因此仍需显式覆盖默认动作::
+
+ op=EXECUTE integrity_verified=YES action=ALLOW
+ op=EXECUTE action=DENY
+
+ op=READ integrity_verified=NO label=critical_t action=DENY
+
+这就引出了第三种策略类型(自定义默认动作)。该类型无需让用户绞尽脑汁
+通过空规则覆盖默认动作,而是强制用户根据自身场景思考合适的默认动作是
+什么,并显式声明::
+
+ DEFAULT op=EXECUTE action=DENY
+ op=EXECUTE integrity_verified=YES action=ALLOW
+
+ DEFAULT op=READ action=ALLOW
+ op=READ integrity_verified=NO label=critical_t action=DENY
+
+策略调试:
+~~~~~~~~~
+
+在开发策略时,知道策略违反了哪一行有助于减少调试成本;可以
+将调查的范围缩小到导致该行为的确切行。有些完整性策略系统并
+不提供这一信息,而是提供评估过程中使用的信息。这随后需要将
+这些信息和策略进行关联,以分析哪里了问题。
+
+相反,IPE只会输出匹配到的规则。这将调查范围限制到确切到策略行
+(在特定规则的情况下)或部分(在DEFAULT规则的情况下)。当在
+评估策略时观察到策略失败时,这可以减少迭代和调查的时间。
+
+IPE的策略引擎还被设计成让人类容易理解如何调查策略失败。每一
+行都会按编写顺序进行评估,因此算法非常简单,便于人类重现步
+骤并找出可能导致失败的原因。而在调查其他的系统中,加载策略
+时会进行优化(例如对规则排序)。在这些系统中,调试需要多个
+步骤,而且没有先阅读代码的情况下,终端用户可能无法完全理解
+该算法的原理。
+
+简化策略:
+~~~~~~~~~
+
+最后,IPE的策略是为系统管理员设计的,而不是内核开发人员。
+IPE不涉及单独的LSM钩子(或系统调用),而是涵盖操作。这
+意味着,系统管理员不需要知道像 ``mmap`` 、 ``mprotect`` 、
+``execve`` 和 ``uselib`` 这些系统调用必须有规则进行保护,
+而只需要知道他们想要限制代码执行。这减少了由于缺乏对底层
+系统的了解而可能导致的绕过情况;而IPE的维护者作为内核开发
+人员,可以做出正确的选择,确定某些操作是否与这些操作匹配,
+以及在什么条件下匹配。
+
+实现说明
+--------
+
+匿名内存
+~~~~~~~~
+
+在IPE中,匿名内存的处理方式与其他任何类型的访问没有区别。当匿
+名内存使用 ``+X`` 映射时,它仍然会进入 ``file_mmp`` 或
+``file_mprotect`` 钩子,但此时会带有一个 ``NULL`` 文件对象
+这会像其他文件一样提交进行评估。然而,所有当前的信任属性都会
+评估为假,因为它们都是基于文件的,而此次操作并不与任何文件相关联。
+
+.. WARNING::
+
+ 这也适用于 ``kernel_load_data`` 钩子,当内核从一个没有文件
+ 支持的用户空间缓冲区加载数据时。在这种情况下,所有当前的信任
+ 属性也将评估为false。
+
+Securityfs接口
+~~~~~~~~~~~~~~
+
+每个策略的对应的securityfs树是有些独特的。例如,对于一个标准的
+securityfs策略树::
+
+ MyPolicy
+ |- active
+ |- delete
+ |- name
+ |- pkcs7
+ |- policy
+ |- update
+ |- version
+
+策略存储在MyPolicy对应节点的 ``->i_private`` 数据中。
+
+测试
+----
+
+IPE为策略解析器提供了KUnit测试。推荐kunitconfig::
+
+ CONFIG_KUNIT=y
+ CONFIG_SECURITY=y
+ CONFIG_SECURITYFS=y
+ CONFIG_PKCS7_MESSAGE_PARSER=y
+ CONFIG_SYSTEM_DATA_VERIFICATION=y
+ CONFIG_FS_VERITY=y
+ CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y
+ CONFIG_BLOCK=y
+ CONFIG_MD=y
+ CONFIG_BLK_DEV_DM=y
+ CONFIG_DM_VERITY=y
+ CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
+ CONFIG_NET=y
+ CONFIG_AUDIT=y
+ CONFIG_AUDITSYSCALL=y
+ CONFIG_BLK_DEV_INITRD=y
+
+ CONFIG_SECURITY_IPE=y
+ CONFIG_IPE_PROP_DM_VERITY=y
+ CONFIG_IPE_PROP_DM_VERITY_SIGNATURE=y
+ CONFIG_IPE_PROP_FS_VERITY=y
+ CONFIG_IPE_PROP_FS_VERITY_BUILTIN_SIG=y
+ CONFIG_SECURITY_IPE_KUNIT_TEST=y
+
+此外,IPE 具有一个基于 Python 的集成
+`测试套件 <https://github.com/microsoft/ipe/tree/test-suite>`_
+可以测试用户界面和强制执行功能。
diff --git a/Documentation/translations/zh_CN/security/lsm-development.rst b/Documentation/translations/zh_CN/security/lsm-development.rst
new file mode 100644
index 000000000000..7ed3719a9d07
--- /dev/null
+++ b/Documentation/translations/zh_CN/security/lsm-development.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/security/lsm-development.rst
+
+:翻译:
+ 赵硕 Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
+
+=================
+Linux安全模块开发
+=================
+
+基于https://lore.kernel.org/r/20071026073721.618b4778@laptopd505.fenrus.org,
+当一种新的LSM的意图(它试图防范什么,以及在哪些情况下人们会期望使用它)在
+``Documentation/admin-guide/LSM/`` 中适当记录下来后,就会被接受进入内核。
+这使得LSM的代码可以很轻松的与其目标进行对比,从而让最终用户和发行版可以更
+明智地决定那些LSM适合他们的需求。
+
+有关可用的 LSM 钩子接口的详细文档,请参阅 ``security/security.c`` 及相关结构。
diff --git a/Documentation/translations/zh_CN/security/secrets/coco.rst b/Documentation/translations/zh_CN/security/secrets/coco.rst
new file mode 100644
index 000000000000..a27bc1acdb7c
--- /dev/null
+++ b/Documentation/translations/zh_CN/security/secrets/coco.rst
@@ -0,0 +1,96 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../../disclaimer-zh_CN.rst
+
+:Original: Documentation/security/secrets/coco.rst
+
+:翻译:
+
+ 赵硕 Shuo Zhao <zhaoshuo@cqsoftware.com.cn>
+
+============
+机密计算密钥
+============
+
+本文档介绍了在EFI驱动程序和efi_secret内核模块中,机密计算密钥从固件
+到操作系统的注入处理流程。
+
+简介
+====
+
+机密计算硬件(如AMD SEV,Secure Encrypted Virtualization)允许虚拟机
+所有者将密钥注入虚拟机(VM)内存,且主机/虚拟机监控程序无法读取这些密
+钥。在SEV中,密钥注入需在虚拟机启动流程的早期阶段(客户机开始运行前)
+执行。
+
+efi_secret内核模块允许用户空间应用程序通过securityfs(安全文件系统)访
+问这些密钥。
+
+密钥数据流
+==========
+
+客户机固件可能会为密钥注入预留一块指定的内存区域,并将该区域的位置(基准
+客户机物理地址GPA和长度)在EFI配置表中,通过 ``LINUX_EFI_COCO_SECRET_AREA_GUID``
+条目(对应的GUID值为 ``adf956ad-e98c-484c-ae11-b51c7d336447`` )的形式发布。
+固件应将此内存区域标记为 ``EFI_RESERVED_TYPE`` ,因此内核不应将其用于自身用途。
+
+虚拟机启动过程中,虚拟机管理器可向该区域注入密钥。在AMD SEV和SEV-ES中,此
+操作通过 ``KVM_SEV_LAUNCH_SECRET`` 命令执行(参见 [sev_CN]_ )。注入的“客户机
+所有者密钥数据”应采用带GUID的密钥值表结构,其二进制格式在 ``drivers/virt/
+coco/efi_secret/efi_secret.c`` 文件的EFI密钥区域结构部分中有详细描述。
+
+内核启动时,内核的EFI驱动程序将保存密钥区域位置(来自EFI配置表)到 ``efi.coco_secret``
+字段。随后,它会检查密钥区域是否已填充:映射该区域并检查其内容是否以
+``EFI_SECRET_TABLE_HEADER_GUID`` (对应的GUID为 ``1e74f542-71dd-4d66-963e-ef4287ff173b`` )
+开头。如果密钥区域已填充,EFI驱动程序将自动加载efi_secret内核模块,并通过securityfs将密钥
+暴露给用户空间应用程序。efi_secret文件系统接口的详细信息请参考 [secrets-coco-abi_CN]_ 。
+
+
+应用使用示例
+============
+
+假设客户机需要对加密文件进行计算处理。客户机所有者通过密钥注入机制提供解密密钥
+(即密钥)。客户机应用程序从efi_secret文件系统读取该密钥,然后将文件解密到内存中,
+接着对内容进行需要的计算。
+
+在此示例中,主机无法从磁盘镜像中读取文件,因为文件是加密的;主机无法读取解密密钥,
+因为它是通过密钥注入机制(即安全通道)传递的;主机也无法读取内存中的解密内容,因为
+这是一个机密型(内存加密)客户机。
+
+以下是一个简单的示例,展示了在客户机中使用efi_secret模块的过程,在启动时注入了
+一个包含4个密钥的EFI密钥区域::
+
+ # ls -la /sys/kernel/security/secrets/coco
+ total 0
+ drwxr-xr-x 2 root root 0 Jun 28 11:54 .
+ drwxr-xr-x 3 root root 0 Jun 28 11:54 ..
+ -r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
+ -r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
+ -r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
+ -r--r----- 1 root root 0 Jun 28 11:54 e6f5a162-d67f-4750-a67c-5d065f2a9910
+
+ # hd /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
+ 00000000 74 68 65 73 65 2d 61 72 65 2d 74 68 65 2d 6b 61 |these-are-the-ka|
+ 00000010 74 61 2d 73 65 63 72 65 74 73 00 01 02 03 04 05 |ta-secrets......|
+ 00000020 06 07 |..|
+ 00000022
+
+ # rm /sys/kernel/security/secrets/coco/e6f5a162-d67f-4750-a67c-5d065f2a9910
+
+ # ls -la /sys/kernel/security/secrets/coco
+ total 0
+ drwxr-xr-x 2 root root 0 Jun 28 11:55 .
+ drwxr-xr-x 3 root root 0 Jun 28 11:54 ..
+ -r--r----- 1 root root 0 Jun 28 11:54 736870e5-84f0-4973-92ec-06879ce3da0b
+ -r--r----- 1 root root 0 Jun 28 11:54 83c83f7f-1356-4975-8b7e-d3a0b54312c6
+ -r--r----- 1 root root 0 Jun 28 11:54 9553f55d-3da2-43ee-ab5d-ff17f78864d2
+
+
+参考文献
+========
+
+请参见 [sev-api-spec_CN]_ 以获取有关SEV ``LAUNCH_SECRET`` 操作的更多信息。
+
+.. [sev_CN] Documentation/virt/kvm/x86/amd-memory-encryption.rst
+.. [secrets-coco-abi_CN] Documentation/ABI/testing/securityfs-secrets-coco
+.. [sev-api-spec_CN] https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf
+
diff --git a/Documentation/translations/zh_CN/security/secrets/index.rst b/Documentation/translations/zh_CN/security/secrets/index.rst
index 5ea78713f10e..38464dcb2c3c 100644
--- a/Documentation/translations/zh_CN/security/secrets/index.rst
+++ b/Documentation/translations/zh_CN/security/secrets/index.rst
@@ -5,13 +5,10 @@
:翻译:
-=====================
+========
密钥文档
-=====================
+========
.. toctree::
-
-TODOLIST:
-
-* coco
+ coco
diff --git a/Documentation/translations/zh_CN/subsystem-apis.rst b/Documentation/translations/zh_CN/subsystem-apis.rst
index 8b646c1010be..830217140fb6 100644
--- a/Documentation/translations/zh_CN/subsystem-apis.rst
+++ b/Documentation/translations/zh_CN/subsystem-apis.rst
@@ -71,12 +71,11 @@ TODOList:
:maxdepth: 1
filesystems/index
+ scsi/index
TODOList:
-* block/index
* cdrom/index
-* scsi/index
* target/index
**Fixme**: 这里还需要更多的分类组织工作。
diff --git a/Documentation/translations/zh_TW/admin-guide/README.rst b/Documentation/translations/zh_TW/admin-guide/README.rst
index 0b038074d9d1..c8b7ccfaa656 100644
--- a/Documentation/translations/zh_TW/admin-guide/README.rst
+++ b/Documentation/translations/zh_TW/admin-guide/README.rst
@@ -291,5 +291,5 @@ Documentation/translations/zh_CN/admin-guide/bug-hunting.rst 。
更多用GDB調試內核的信息,請參閱:
Documentation/translations/zh_CN/dev-tools/gdb-kernel-debugging.rst
-和 Documentation/dev-tools/kgdb.rst 。
+和 Documentation/process/debugging/kgdb.rst 。
diff --git a/Documentation/translations/zh_TW/dev-tools/gdb-kernel-debugging.rst b/Documentation/translations/zh_TW/dev-tools/gdb-kernel-debugging.rst
index b595af59ba78..4fd1757c3036 100644
--- a/Documentation/translations/zh_TW/dev-tools/gdb-kernel-debugging.rst
+++ b/Documentation/translations/zh_TW/dev-tools/gdb-kernel-debugging.rst
@@ -2,7 +2,7 @@
.. include:: ../disclaimer-zh_TW.rst
-:Original: Documentation/dev-tools/gdb-kernel-debugging.rst
+:Original: Documentation/process/debugging/gdb-kernel-debugging.rst
:Translator: 高超 gao chao <gaochao49@huawei.com>
通過gdb調試內核和模塊
diff --git a/Documentation/userspace-api/netlink/intro-specs.rst b/Documentation/userspace-api/netlink/intro-specs.rst
index a4435ae4628d..e5ebc617754a 100644
--- a/Documentation/userspace-api/netlink/intro-specs.rst
+++ b/Documentation/userspace-api/netlink/intro-specs.rst
@@ -13,10 +13,10 @@ Simple CLI
Kernel comes with a simple CLI tool which should be useful when
developing Netlink related code. The tool is implemented in Python
and can use a YAML specification to issue Netlink requests
-to the kernel. Only Generic Netlink is supported.
+to the kernel.
The tool is located at ``tools/net/ynl/pyynl/cli.py``. It accepts
-a handul of arguments, the most important ones are:
+a handful of arguments, the most important ones are:
- ``--spec`` - point to the spec file
- ``--do $name`` / ``--dump $name`` - issue request ``$name``
diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst
index 5e8ed9eef9aa..ca89151fc0a8 100644
--- a/Documentation/userspace-api/spec_ctrl.rst
+++ b/Documentation/userspace-api/spec_ctrl.rst
@@ -26,7 +26,8 @@ PR_GET_SPECULATION_CTRL
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bits 0-3 with
-the following meaning:
+the following meaning (with the caveat that PR_SPEC_L1D_FLUSH has less obvious
+semantics, see documentation for that specific control below):
==== ====================== ==================================================
Bit Define Description
@@ -110,6 +111,9 @@ Speculation misfeature controls
- PR_SPEC_L1D_FLUSH: Flush L1D Cache on context switch out of the task
(works only when tasks run on non SMT cores)
+For this control, PR_SPEC_ENABLE means that the **mitigation** is enabled (L1D
+is flushed), PR_SPEC_DISABLE means it is disabled.
+
Invocations:
* prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE, 0, 0);
diff --git a/Documentation/w1/w1-netlink.rst b/Documentation/w1/w1-netlink.rst
index be4f7b82dcb4..ff281713e626 100644
--- a/Documentation/w1/w1-netlink.rst
+++ b/Documentation/w1/w1-netlink.rst
@@ -196,7 +196,7 @@ Additional documentation, source code examples
==============================================
1. Documentation/driver-api/connector.rst
-2. http://www.ioremap.net/archive/w1
+2. https://github.com/bioothod/w1
This archive includes userspace application w1d.c which uses
read/write/search commands for all master/slave devices found on the bus.
diff --git a/Documentation/wmi/driver-development-guide.rst b/Documentation/wmi/driver-development-guide.rst
index 99ef21fc1c1e..5680303ae314 100644
--- a/Documentation/wmi/driver-development-guide.rst
+++ b/Documentation/wmi/driver-development-guide.rst
@@ -54,6 +54,7 @@ to matching WMI devices using a struct wmi_device_id table:
::
static const struct wmi_device_id foo_id_table[] = {
+ /* Only use uppercase letters! */
{ "936DA01F-9ABD-4D9D-80C7-02AF85C822A8", NULL },
{ }
};