Pull KVM updates from Paolo Bonzini:
"ARM:
- Support for userspace handling of synchronous external aborts
(SEAs), allowing the VMM to potentially handle the abort in a
non-fatal manner
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers
in hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU
- Allow page table destruction to reschedule, fixing long
need_resched latencies observed when destroying a large VM
- Minor fixes to KVM and selftests
LoongArch:
- Get VM PMU capability from HW GCFG register
- Add AVEC basic support
- Use 64-bit register definition for EIOINTC
- Add KVM timer test cases for tools/selftests
RISC-V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of exits
(and worse performance) on z10 and earlier processors; ESCA was
introduced by z114/z196 in 2010
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
SPTE caching is disabled, as there can't be any relevant SPTEs to
zap
- Relocate a misplaced export
- Fix an async #PF bug where KVM would clear the completion queue
when the guest transitioned in and out of paging mode, e.g. when
handling an SMI and then returning to paged mode via RSM
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
keeping the notifier registered is ok; the kernel does not use the
MSRs and the callback will run cleanly and restore host MSRs if the
CPU manages to return to userspace before the system goes down
- Use the checked version of {get,put}_user()
- Fix a long-lurking bug where KVM's lack of catch-up logic for
periodic APIC timers can result in a hard lockup in the host
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
latter behind CONFIG_CPU_MITIGATIONS
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
path; the only reason they were handled in the fast path was to
paper over a bug in the core #MC code, and that has long since been
fixed
- Add emulator support for AVX MOV instructions, to play nice with
emulated devices whose guest drivers like to access PCI BARs with
large multi-byte instructions
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs
- Fix the worst of KVM's lack of EFER.LMSLE emulation
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode
- Fix incorrect handling of selective CR0 writes when checking
intercepts during emulation of L2 instructions
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
on VMRUN and #VMEXIT
- Fix a bug where KVM could corrupt the guest code stream when re-injecting
a soft interrupt if the guest patched the underlying code after the
VM-Exit, e.g. when Linux patches code with a temporary INT3
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
to userspace, and extend KVM "support" to all policy bits that
don't require any actual support from KVM
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of
the current vCPU state, partly as worthwhile cleanup, but mostly to
pave the way for tracking per-root TLB flushes, and elide EPT
flushes on pCPU migration if the root is clean from a previous
flush
- Add a few missing nested consistency checks
- Rip out support for doing "early" consistency checks via hardware
as the functionality hasn't been used in years and is no longer
useful in general; replace it with an off-by-default module param
to WARN if hardware fails a check that KVM does not perform
- Fix a currently-benign bug where KVM would drop the guest's
SPEC_CTRL[63:32] on VM-Enter
- Misc cleanups
- Overhaul the TDX code to address systemic races where KVM (acting
on behalf of userspace) could inadvertently trigger lock contention
in the TDX-Module; KVM was either working around these in weird,
ugly ways, or was simply oblivious to them (though even Yan's
devilish selftests could only break individual VMs, not the host
kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
TDX vCPU, if creating said vCPU failed partway through
- Fix a few sparse warnings (bad annotation, 0 != NULL)
- Use struct_size() to simplify copying TDX capabilities to userspace
- Fix a bug where TDX would effectively corrupt user-return MSR
values if the TDX Module rejects VP.ENTER and thus doesn't clobber
host MSRs as expected
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU
system/VM
- Forcefully override ARCH from x86_64 to x86 to play nice with
specifying ARCH=x86_64 on the command line
- Extend a bunch of nested VMX tests to validate nested SVM as well
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test
to verify KVM can save/restore nested VMX state when L1 is using
5-level paging, but L2 is not
- Clean up the guest paging code in anticipation of sharing the core
logic for nested EPT and nested NPT
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety
of rough edges in guest_memfd along the way
- Define a CLASS to automatically handle get+put when grabbing a
guest_memfd from a memslot to make it harder to leak references
- Enhance KVM selftests to make it easier to develop and debug
selftests like those added for guest_memfd NUMA support, e.g. where
test and/or KVM bugs often result in hard-to-debug SIGBUS errors
- Misc cleanups
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU
workqueue for irqfd cleanup
- Fix a goof in the dirty ring documentation
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration
of hotplug memory, both with and without memmap_on_memory
support. This makes the way memory hotplug is handled on s390 much
more similar to other architectures
- Remove compat support. There shouldn't be any compat user space
around anymore, therefore get rid of a lot of code which also doesn't
need to be tested anymore
- Add stackprotector support. GCC 16 will get new compiler options,
which allow generating code required for kernel stackprotector
support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This
removes a lot of duplicated code. The new driver is also extendable
and allows supporting new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area
and the kernel image are within the same 4GB area. This eliminates
the need for weak per-CPU variables. Get rid of
ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
s390/entry: Use lay instead of aghik
s390/vdso: Get rid of -m64 flag handling
s390/vdso: Rename vdso64 to vdso
s390: Rename head64.S to head.S
s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
s390: Add stackprotector support
s390/modules: Simplify module_finalize() slightly
s390: Remove KMSG_COMPONENT macro
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
s390/ap: Restrict driver_override versus apmask and aqmask use
s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
s390/ap: Support driver_override for AP queue devices
s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
s390/debug: Update description of resize operation
s390/syscalls: Switch to generic system call table generation
s390/syscalls: Remove system call table pointer from thread_struct
s390/uapi: Remove 31 bit support from uapi header files
s390: Remove compat support
tools: Remove s390 compat support
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- SCA rework
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scoped user access updates from Thomas Gleixner:
"Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n.
This makes it possible to access a 64bit value in generic code with
[unsafe_]get_user(). All other architectures and ARM variants
provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access
helpers always goes through a local C scope label indirection
inside the helpers.
This is required because compilers do not support an ASM GOTO
target leaving an auto cleanup scope. GCC silently fails to emit
the cleanup invocation and clang fails the build.
[ Editor's note: gcc-16 will have fixed the code generation issue
in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
goto jumps [PR122835]"). But we obviously have to deal with clang
and older versions of gcc, so.. - Linus ]
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but
if it has to be done with user controlled pointers, the access is
shielded with a speculation barrier, so that the CPU cannot
speculate around the address range check. Those speculation
barriers impact performance quite significantly.
This cost can be avoided by "masking" the provided pointer so it is
guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This
has to be done without branches so it creates an address dependency
for the access, which the CPU cannot speculate ahead.
This results in repeating and error prone programming patterns:

      if (can_do_masked_user_access())
              from = masked_user_read_access_begin((from));
      else if (!user_read_access_begin(from, sizeof(*from)))
              return -EFAULT;
      unsafe_get_user(val, from, Efault);
      user_read_access_end();
      return 0;
  Efault:
      user_read_access_end();
      return -EFAULT;

which can be replaced with scopes and automatic cleanup:

      scoped_user_read_access(from, Efault)
              unsafe_get_user(val, from, Efault);
      return 0;
  Efault:
      return -EFAULT;
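For illustration, a minimal sketch of a complete helper built on the new
scope; the function name read_u32_from_user() is hypothetical and only
restates the pattern shown above:

      static int read_u32_from_user(u32 __user *from, u32 *val)
      {
              /* The scope handles masking/begin/end and jumps to Efault on a fault */
              scoped_user_read_access(from, Efault)
                      unsafe_get_user(*val, from, Efault);
              return 0;
      Efault:
              return -EFAULT;
      }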
- Convert code which implements the above pattern over to
scoped_user_*_access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most
architectures, but prevent PowerPC from implementing the masking
optimization.
- Add a missing speculation barrier in copy_from_user_iter()"
* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
scm: Convert put_cmsg() to scoped user access
iov_iter: Add missing speculation barrier to copy_from_user_iter()
iov_iter: Convert copy_from_user_iter() to masked user access
select: Convert to scoped user access
x86/futex: Convert to scoped user access
futex: Convert to get/put_user_inline()
uaccess: Provide put/get_user_inline()
uaccess: Provide scoped user access regions
arm64: uaccess: Use unsafe wrappers for ASM GOTO
s390/uaccess: Use unsafe wrappers for ASM GOTO
riscv/uaccess: Use unsafe wrappers for ASM GOTO
powerpc/uaccess: Use unsafe wrappers for ASM GOTO
x86/uaccess: Use unsafe wrappers for ASM GOTO
uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
ARM: uaccess: Implement missing __get_user_asm_dword()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull bug handling infrastructure updates from Ingo Molnar:
"Core updates:
- Improve WARN(), which has vararg printf like arguments, to work
with the x86 #UD based WARN-optimizing infrastructure by hiding the
format in the bug_table and replacing this first argument with the
address of the bug-table entry, while making the actual function
that's called a UD1 instruction (Peter Zijlstra)
- Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
Molnar, s390 support by Heiko Carstens)
Fixes and cleanups:
- bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)
- <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
Zijlstra)"
* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
x86/bug: Fix BUG_FORMAT vs KASLR
x86_64/bug: Inline the UD1
x86/bug: Implement WARN_ONCE()
x86_64/bug: Implement __WARN_printf()
x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
x86/bug: Add BUG_FORMAT basics
bug: Allow architectures to provide __WARN_printf()
bug: Implement WARN_ON() using __WARN_FLAGS()
bug: Add report_bug_entry()
bug: Add BUG_FORMAT_ARGS infrastructure
bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
bug: Add BUG_FORMAT infrastructure
x86: Rework __bug_table helpers
bugs/s390: Remove private WARN_ON() implementation
bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- klp-build livepatch module generation (Josh Poimboeuf)
Introduce new objtool features and a klp-build script to generate
livepatch modules using a source .patch as input.
This builds on concepts from the longstanding out-of-tree kpatch
project which began in 2012 and has been used for many years to
generate livepatch modules for production kernels. However, this is a
complete rewrite which incorporates hard-earned lessons from 12+
years of maintaining kpatch.
Key improvements compared to kpatch-build:
- Integrated with objtool: Leverages objtool's existing control-flow
graph analysis to help detect changed functions.
- Works on vmlinux.o: Supports late-linked objects, making it
compatible with LTO, IBT, and similar.
- Simplified code base: ~3k fewer lines of code.
- Upstream: No more out-of-tree #ifdef hacks, far less cruft.
- Cleaner internals: Vastly simplified logic for
symbol/section/reloc inclusion and special section extraction.
- Robust __LINE__ macro handling: Avoids false positive binary diffs
caused by the __LINE__ macro by introducing a fix-patch-lines
script which injects #line directives into the source .patch to
preserve the original line numbers at compile time.
- Disassemble code with libopcodes instead of running objdump
(Alexandre Chartre)
- Disassemble support (-d option to objtool) by Alexandre Chartre,
which supports the decoding of various Linux kernel code generation
specials such as alternatives:
      17ef:  sched_balance_find_dst_group+0x62f  mov 0x34(%r9),%edx
      17f3:  sched_balance_find_dst_group+0x633  | <alternative.17f3> | X86_FEATURE_POPCNT
      17f3:  sched_balance_find_dst_group+0x633  | call 0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
      17f8:  sched_balance_find_dst_group+0x638  cmp %eax,%edx

... jump table alternatives:

      1895:  sched_use_asym_prio+0x5   test $0x8,%ch
      1898:  sched_use_asym_prio+0x8   je 0x18a9 <sched_use_asym_prio+0x19>
      189a:  sched_use_asym_prio+0xa   | <jump_table.189a> | JUMP
      189a:  sched_use_asym_prio+0xa   | jmp 0x18ae <sched_use_asym_prio+0x1e> | nop2
      189c:  sched_use_asym_prio+0xc   mov $0x1,%eax
      18a1:  sched_use_asym_prio+0x11  and $0x80,%ecx

... exception table alternatives:

      native_read_msr:
      5b80:  native_read_msr+0x0  mov %edi,%ecx
      5b82:  native_read_msr+0x2  | <ex_table.5b82> | EXCEPTION
      5b82:  native_read_msr+0x2  | rdmsr | resume at 0x5b84 <native_read_msr+0x4>
      5b84:  native_read_msr+0x4  shl $0x20,%rdx

... x86 feature flag decoding (also see the X86_FEATURE_POPCNT
example in sched_balance_find_dst_group() above):

      2faaf: start_thread_common.constprop.0+0x1f  jne 0x2fba4 <start_thread_common.constprop.0+0x114>
      2fab5: start_thread_common.constprop.0+0x25  | <alternative.2fab5> | X86_FEATURE_ALWAYS | X86_BUG_NULL_SEG
      2fab5: start_thread_common.constprop.0+0x25  | jmp 0x2faba <.altinstr_aux+0x2f4> | jmp 0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
      2faba: start_thread_common.constprop.0+0x2a  mov $0x2b,%eax

... NOP sequence shortening:

      1048e2: snapshot_write_finalize+0xc2  je 0x104917 <snapshot_write_finalize+0xf7>
      1048e4: snapshot_write_finalize+0xc4  nop6
      1048ea: snapshot_write_finalize+0xca  nop11
      1048f5: snapshot_write_finalize+0xd5  nop11
      104900: snapshot_write_finalize+0xe0  mov %rax,%rcx
      104903: snapshot_write_finalize+0xe3  mov 0x10(%rdx),%rax
... and much more.
- Function validation tracing support (Alexandre Chartre)
- Various -ffunction-sections fixes (Josh Poimboeuf)
- Clang AutoFDO (Automated Feedback-Directed Optimizations) support
(Josh Poimboeuf)
- Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo
Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra,
Thorsten Blum)
* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
objtool: Fix segfault on unknown alternatives
objtool: Build with disassembly can fail when including bdf.h
objtool: Trim trailing NOPs in alternative
objtool: Add wide output for disassembly
objtool: Compact output for alternatives with one instruction
objtool: Improve naming of group alternatives
objtool: Add Function to get the name of a CPU feature
objtool: Provide access to feature and flags of group alternatives
objtool: Fix address references in alternatives
objtool: Disassemble jump table alternatives
objtool: Disassemble exception table alternatives
objtool: Print addresses with alternative instructions
objtool: Disassemble group alternatives
objtool: Print headers for alternatives
objtool: Preserve alternatives order
objtool: Add the --disas=<function-pattern> action
objtool: Do not validate IBT for .return_sites and .call_sites
objtool: Improve tracing of alternative instructions
objtool: Add functions to better name alternatives
objtool: Identify the different types of alternatives
...
|
|
Move enabling and disabling of interrupts around the SIE instruction to
entry code. Enabling interrupts only after the __TI_sie flag has been set
guarantees that the SIE instruction is not executed if an interrupt happens
between enabling interrupts and the execution of the SIE instruction.
Interrupt handlers and the machine check handler forward the PSW to the
sie_exit label in such cases.
This is a prerequisite for VIRT_XFER_TO_GUEST_WORK to prevent guest
context from being entered when e.g. a scheduler IPI, indicating that a reschedule
is required, happens right before the SIE instruction, which could lead to
long delays.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
Add a signal_exits counter for s390, as exists on arm64, loongarch, mips,
powerpc, riscv and x86.
This is used by kvm_handle_signal_exit(), which we will use when we
later enable CONFIG_VIRT_XFER_TO_GUEST_WORK.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
Since compat is gone there is only a 64 bit vdso left.
Remove the superfluous "64" suffix everywhere.
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Stackprotector support was previously unavailable on s390 because by
default compilers generate code which is not suitable for the kernel:
the canary value is accessed via thread local storage, where the address
of thread local storage is held in access registers 0 and 1.
Using those registers also for the kernel would come with a significant
performance impact and more complicated kernel entry/exit code, since
access register contents would have to be exchanged on every kernel entry
and exit.
With the upcoming gcc 16 release new compiler options will become available
which allow generating code suitable for the kernel. [1]
Compiler option -mstack-protector-guard=global instructs gcc to generate
stackprotector code that refers to a global stackprotector canary value via
symbol __stack_chk_guard. Access to this value is guaranteed to occur via
larl and lgrl instructions.
Furthermore, compiler option -mstack-protector-guard-record generates a
section containing all code addresses that reference the canary value.
To allow for per task canary values the instructions which load the address
of __stack_chk_guard are patched so they access a lowcore field instead: a
per task canary value is available within the task_struct of each task, and
is written to the per-cpu lowcore location on each context switch.
Also add sanity checks and a debugging option to be consistent with other
kernel code patching mechanisms.
Full debugging output can be enabled with the following kernel command line
options:

      debug_stackprotector
      bootdebug
      ignore_loglevel
      earlyprintk
      dyndbg="file stackprotector.c +p"

Example debug output:

      stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240

where "<insn address>: <old insn> -> <new insn>".
[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since the rework of the kernel virtual address space [1] the module area
and the kernel image are within the same 4GB area. Therefore there is no
need for the weak per cpu workaround for modules anymore. Remove it.
[1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation
exceptions to user space. This also includes the 0x0000 instructions
managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants
to emulate instructions which do not (yet) have an opcode.
While we're at it, refine the documentation for
KVM_CAP_S390_USER_INSTR0.
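As a hedged VMM-side sketch (the capability name comes from this change,
the helper name is made up; the ioctl usage follows the standard
KVM_ENABLE_CAP pattern):

      #include <err.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      /* Ask KVM to forward all operation exceptions of this VM to user space */
      static void enable_user_operexec(int vm_fd)
      {
              struct kvm_enable_cap cap = { .cap = KVM_CAP_S390_USER_OPEREXEC };

              if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
                      err(1, "KVM_ENABLE_CAP(KVM_CAP_S390_USER_OPEREXEC)");
      }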
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
The s390 syscall.tbl format differs slightly from most others, and
therefore requires an s390 specific system call table generation
script.
With compat support gone, use the opportunity to switch to generic
system call table generation. The ABI for all 64 bit system calls is
now common, since there is no longer a need to specify whether system
call entry points are 64 bit only.
Furthermore create the system call table in C instead of assembler
code in order to get type checking for all system call functions
contained within the table.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
With compat support gone there is only one system call table
left. Therefore remove the sys_call_table pointer from
thread_struct and use the sys_call_table directly.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
There shouldn't be any 31 bit code around anymore that matters.
Remove the compat layer support required to run 31 bit code.
Reason for removal is code simplification and reduced test effort.
Note that this comes without any deprecation warnings added to config
options, or kernel messages, since most likely those would be ignored
anyway.
If it turns out there is still a reason to keep the compat layer this
can be reverted at any time in the future.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
All system call wrappers should match the sys_call_ptr_t type. This is not
the case for system calls without parameters. Add the missing pt_regs
parameter there too.
Note: this is currently not a problem, since the parameter is unused.
However it prevents creating a correctly typed system call table in
C. With the current assembler implementation this works only because
type checking is missing.
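As a hedged before/after illustration (the wrapper name below is made up;
only the shape of the change matters):

      /* before: a parameterless system call wrapper without pt_regs */
      long __s390x_sys_example(void);

      /* after: every wrapper takes pt_regs and matches sys_call_ptr_t */
      long __s390x_sys_example(struct pt_regs *regs);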
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use a standard "_t" suffix for psw_t32 and rename it to psw32_t.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
When a zero ASCE is passed to the __ptep_rdp() inline assembly, the
generated instruction should have the R3 field of the instruction set to
zero. However the inline assembly is written incorrectly: for such cases a
zero is loaded into a register allocated by the compiler and this register
is then used by the instruction.
This means that selected TLB entries may not be flushed since the specified
ASCE does not match the one which was used when the selected TLB entries
were created.
Fix this by removing the asce and opt parameters of __ptep_rdp(), since
all callers always pass zero, and use a hard-coded register zero for
the R3 field.
Fixes: 0807b856521f ("s390/mm: add support for RDP (Reset DAT-Protection)")
Cc: stable@vger.kernel.org
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
In case of a kernel crash caused by a protection exception, print the
unmodified PSW address as reported by the CPU. The protection exception
handler modifies the PSW address in order to keep fault handling easy,
however that leads to misleading call traces.
Therefore restore the original PSW address before printing it.
Before this change the output in case of a protection exception looks like
this:
   Oops: 0004 ilc:2 [#1]SMP
   Krnl PSW : 0704c00180000000 000003ffe0b40d78 (sysrq_handle_crash+0x28/0x40)
              R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
   ...
   Krnl Code: 000003ffe0b40d66: e3e0f0980024  stg   %r14,152(%r15)
              000003ffe0b40d6c: c010fffffff2  larl  %r1,000003ffe0b40d50
             #000003ffe0b40d72: c0200046b6bc  larl  %r2,000003ffe1417aea
             >000003ffe0b40d78: 92021000      mvi   0(%r1),2
              000003ffe0b40d7c: c0e5ffae03d6  brasl %r14,000003ffe0101528

With this change it looks like this:

   Oops: 0004 ilc:2 [#1]SMP
   Krnl PSW : 0704c00180000000 000003ffe0b40dfc (sysrq_handle_crash+0x2c/0x40)
              R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
   ...
   Krnl Code: 000003ffe0b40dec: c010fffffff2  larl  %r1,000003ffe0b40dd0
              000003ffe0b40df2: c0200046b67c  larl  %r2,000003ffe1417aea
             *000003ffe0b40df8: 92021000      mvi   0(%r1),2
             >000003ffe0b40dfc: c0e5ffae03b6  brasl %r14,000003ffe0101568
              000003ffe0b40e02: 0707          bcr   0,%r7
Note that with this change the PSW address points to the instruction behind
the instruction which caused the exception like it is expected for
protection exceptions.
This also replaces the '#' marker in the disassembly with '*', which makes
it possible to distinguish between new and old behavior.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Similar to __rewind_psw() add the counterpart __forward_psw(). This
helps to make code more readable if a PSW address has to be forwarded,
since it is more natural to write

      addr = __forward_psw(psw, ilen);

instead of

      addr = __rewind_psw(psw, -ilen);

This also renames the ilc parameter of __rewind_psw() to ilen, since
the parameter reflects an instruction length, and not an instruction
length code. Also change the type of ilen from unsigned long to long
so it reflects that lengths can be negative or positive.
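A minimal sketch of how the new helper could be expressed in terms of the
existing one (illustrative; the actual implementation may differ):

      static inline unsigned long __forward_psw(psw_t psw, long ilen)
      {
              /* Forwarding is just rewinding by a negative length */
              return __rewind_psw(psw, -ilen);
      }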
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
A false-positive kmsan report is detected when running the ping command.
The inline assembly instruction 'vstl' can write a varying number of bytes
depending on the value of the 'index' argument. If 'index' > 0, 'vstl'
writes at least 2 bytes.
clang generates the kmsan write helper call based on the inline assembly
constraints. Constraints are evaluated at compile time, but the value of
the 'index' argument is known only at runtime.
clang currently generates a call to __msan_instrument_asm_store with a size
of 1 byte. Manually call the kmsan function to indicate the correct number
of bytes written and fix the false-positive report.
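A hedged sketch of the idea (the helper name is made up;
kmsan_unpoison_memory() is the generic KMSAN API, and the actual call site
in the s390 checksum code may look different):

      #include <linux/kmsan-checks.h>

      /* vstl stores index + 1 bytes; tell KMSAN explicitly how much was
       * written, since the constraint-based instrumentation assumes 1 byte. */
      static inline void note_vstl_store(void *to, unsigned long index)
      {
              kmsan_unpoison_memory(to, index + 1);
      }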
This change fixes the following kmsan reports:
[ 36.563119] =====================================================
[ 36.563594] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[ 36.563852] virtqueue_add+0x35c6/0x7c70
[ 36.564016] virtqueue_add_outbuf+0xa0/0xb0
[ 36.564266] start_xmit+0x288c/0x4a20
[ 36.564460] dev_hard_start_xmit+0x302/0x900
[ 36.564649] sch_direct_xmit+0x340/0xea0
[ 36.564894] __dev_queue_xmit+0x2e94/0x59b0
[ 36.565058] neigh_resolve_output+0x936/0xb40
[ 36.565278] __neigh_update+0x2f66/0x3a60
[ 36.565499] neigh_update+0x52/0x60
[ 36.565683] arp_process+0x1588/0x2de0
[ 36.565916] NF_HOOK+0x1da/0x240
[ 36.566087] arp_rcv+0x3e4/0x6e0
[ 36.566306] __netif_receive_skb_list_core+0x1374/0x15a0
[ 36.566527] netif_receive_skb_list_internal+0x1116/0x17d0
[ 36.566710] napi_complete_done+0x376/0x740
[ 36.566918] virtnet_poll+0x1bae/0x2910
[ 36.567130] __napi_poll+0xf4/0x830
[ 36.567294] net_rx_action+0x97c/0x1ed0
[ 36.567556] handle_softirqs+0x306/0xe10
[ 36.567731] irq_exit_rcu+0x14c/0x2e0
[ 36.567910] do_io_irq+0xd4/0x120
[ 36.568139] io_int_handler+0xc2/0xe8
[ 36.568299] arch_cpu_idle+0xb0/0xc0
[ 36.568540] arch_cpu_idle+0x76/0xc0
[ 36.568726] default_idle_call+0x40/0x70
[ 36.568953] do_idle+0x1d6/0x390
[ 36.569486] cpu_startup_entry+0x9a/0xb0
[ 36.569745] rest_init+0x1ea/0x290
[ 36.570029] start_kernel+0x95e/0xb90
[ 36.570348] startup_continue+0x2e/0x40
[ 36.570703]
[ 36.570798] Uninit was created at:
[ 36.571002] kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[ 36.571261] kmalloc_reserve+0x12a/0x470
[ 36.571553] __alloc_skb+0x310/0x860
[ 36.571844] __ip_append_data+0x483e/0x6a30
[ 36.572170] ip_append_data+0x11c/0x1e0
[ 36.572477] raw_sendmsg+0x1c8c/0x2180
[ 36.572818] inet_sendmsg+0xe6/0x190
[ 36.573142] __sys_sendto+0x55e/0x8e0
[ 36.573392] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 36.573571] __do_syscall+0x12e/0x240
[ 36.573823] system_call+0x6e/0x90
[ 36.573976]
[ 36.574017] Byte 35 of 98 is uninitialized
[ 36.574082] Memory access of size 98 starts at 0000000007aa0012
[ 36.574218]
[ 36.574325] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G B N 6.17.0-dirty #16 NONE
[ 36.574541] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 36.574617] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[ 36.574755] =====================================================
[ 63.532541] =====================================================
[ 63.533639] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[ 63.533989] virtqueue_add+0x35c6/0x7c70
[ 63.534940] virtqueue_add_outbuf+0xa0/0xb0
[ 63.535861] start_xmit+0x288c/0x4a20
[ 63.536708] dev_hard_start_xmit+0x302/0x900
[ 63.537020] sch_direct_xmit+0x340/0xea0
[ 63.537997] __dev_queue_xmit+0x2e94/0x59b0
[ 63.538819] neigh_resolve_output+0x936/0xb40
[ 63.539793] ip_finish_output2+0x1ee2/0x2200
[ 63.540784] __ip_finish_output+0x272/0x7a0
[ 63.541765] ip_finish_output+0x4e/0x5e0
[ 63.542791] ip_output+0x166/0x410
[ 63.543771] ip_push_pending_frames+0x1a2/0x470
[ 63.544753] raw_sendmsg+0x1f06/0x2180
[ 63.545033] inet_sendmsg+0xe6/0x190
[ 63.546006] __sys_sendto+0x55e/0x8e0
[ 63.546859] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 63.547730] __do_syscall+0x12e/0x240
[ 63.548019] system_call+0x6e/0x90
[ 63.548989]
[ 63.549779] Uninit was created at:
[ 63.550691] kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[ 63.550975] kmalloc_reserve+0x12a/0x470
[ 63.551969] __alloc_skb+0x310/0x860
[ 63.552949] __ip_append_data+0x483e/0x6a30
[ 63.553902] ip_append_data+0x11c/0x1e0
[ 63.554912] raw_sendmsg+0x1c8c/0x2180
[ 63.556719] inet_sendmsg+0xe6/0x190
[ 63.557534] __sys_sendto+0x55e/0x8e0
[ 63.557875] __s390x_sys_socketcall+0x19ae/0x2ba0
[ 63.558869] __do_syscall+0x12e/0x240
[ 63.559832] system_call+0x6e/0x90
[ 63.560780]
[ 63.560972] Byte 35 of 98 is uninitialized
[ 63.561741] Memory access of size 98 starts at 0000000005704312
[ 63.561950]
[ 63.562824] CPU: 3 UID: 0 PID: 192 Comm: ping Tainted: G B N 6.17.0-dirty #16 NONE
[ 63.563868] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 63.564751] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[ 63.564986] =====================================================
Fixes: dcd3e1de9d17 ("s390/checksum: provide csum_partial_copy_nocheck()")
Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
flush_tlb() exists for historic reasons and was never used. Remove it.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rework PAI crypto event initialization. Add a common
function for event initialization. It uses the PAI characteristics
stored in the pai_pmu table instead of hardcoded values.
Enlarge pai_event_valid() to check all event validation aspects.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
pcpu_delegate() never returns to its caller. If the target CPU is the
current CPU, it calls __pcpu_delegate(), whose delegate function is not
supposed to return. In any case, even if __pcpu_delegate() unexpectedly
returns, pcpu_delegate() sends SIGP_STOP to the current CPU and waits
in an infinite loop. Annotate pcpu_delegate() with the __noreturn
attribute to improve compiler optimizations.
Also annotate smp_call_ipl_cpu() accordingly since it always calls
pcpu_delegate().
[hca: Merge two patches from Thorsten Blum]
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Heiko Carstens says:
====================
Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.
Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows to change the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.
Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires to change valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.
The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this
the architecture level set shouldn't cause any problems.
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The CSPG instruction is part of the Dat-Enhancement facility 1, which
is always available. Given that it can be used everywhere the CSP
instruction can be used, replace CSP with CSPG everywhere.
This allows removing the csp() inline assembly. Also remove the
unused gmap_pmdp_csp() function.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Remove cpu_has_idte(). The IDTE instruction is part of the
Dat-Enhancement facility 1, which is always available.
Therefore remove the helper and now superfluous code.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:

      bool foo(u32 __user *p, u32 val)
      {
              scoped_guard(pagefault)
                      unsafe_put_user(val, p, efault);
              return true;
      efault:
              return false;
      }
It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.
S390 is not affected for unsafe_*_user() as it uses its own local label
already, but __get/put_kernel_nofault() lack that.
Rename them to arch_*_kernel_nofault(), which makes the generic uaccess
header wrap them with a local label so that both compilers emit correct
code.
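A hedged sketch of the wrapping idea (illustrative only; the real macro in
the generic uaccess header may differ in detail):

      #define __get_kernel_nofault(dst, src, type, err_label)           \
      do {                                                              \
              __label__ local_fault;                                    \
              /* asm goto target is the local label inside this scope */\
              arch_get_kernel_nofault(dst, src, type, local_fault);     \
              break;                                                    \
      local_fault:                                                      \
              goto err_label;                                           \
      } while (0)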
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://patch.msgid.link/20251027083745.483079889@linutronix.de
|
|
The psw_bits() macro makes use of typecheck() without typecheck.h
being included. Add the missing include to avoid potential future compile
problems.
[hca@linux.ibm.com: change commit message]
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Commit c1e18c17bda6 ("s390/pci: add zpci_set_irq()/zpci_clear_irq()")
introduced zpci_set_irq() and zpci_clear_irq(), to be used while
resetting a zPCI device.
Commit da995d538d3a ("s390/pci: implement reset_slot for hotplug
slot"), mentions zpci_clear_irq() being called in the path for
zpci_hot_reset_device(). But that is not the case anymore and these
functions are not called outside of this file. Instead
zpci_hot_reset_device() relies on zpci_disable_device() also clearing
the IRQs, but fails to reset the zdev->irqs_registered flag.
However after a CLP disable/enable reset, the device's IRQs are
unregistered, but the flag zdev->irqs_registered does not get cleared. This
creates an inconsistent state, so arch_restore_msi_irqs() doesn't
correctly restore the device's IRQs. This becomes a problem when a PCI
driver tries to restore the state of the device through
pci_restore_state(). Restore the IRQs unconditionally for the device and
remove the irqs_registered flag as it is redundant.
Fixes: c1e18c17bda6 ("s390/pci: add zpci_set_irq()/zpci_clear_irq()")
Cc: stable@vger.kernel.org
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Introduce two new AP bus related tracepoint events:
- The s390_ap_nqap event fires immediately after a request
  has been pushed into the AP firmware queue with the NQAP AP command.
- The s390_ap_dqap event fires immediately after a
  reply has been pulled out of the AP firmware queue via the DQAP AP
  command.
Both events are triggered unconditionally and may need filtering.
Filtering can be done based on the status value which is part of
the nqap and dqap trace. So for example

      echo "!(status & 0x00ff0000)" > .../s390_ap_dqap/filter

filters out all trace events which have a response_code != 0,
leaving just the successful nqap and dqap invocations.
The idea behind these two trace events is performance: to measure
the runtime of a crypto request/reply as close as possible to the
firmware level. In combination with the two zcrypt tracepoints (see
the zcrypt.h trace event definition file) this gives measurement data
about the runtime of a request/reply within the zcrypt and AP bus
layer. However, with the status of these AP commands in hand,
other uses may be possible as well.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Sometimes a different view of the AP status word is needed. So here is
a slight rework of struct ap_queue_status to open up the possibility of
accessing the AP status bits and fields in different ways.
The new struct ap_queue_status

      struct ap_queue_status {
              union {
                      unsigned int value : 32;
                      struct {
                              unsigned int status_bits : 8;
                              unsigned int rc          : 8;
                              unsigned int             : 16;
                      };
                      struct {
                              unsigned int queue_empty     : 1;
                              unsigned int replies_waiting : 1;
                              unsigned int queue_full      : 1;
                              unsigned int                 : 3;
                              unsigned int async           : 1;
                              unsigned int irq_enabled     : 1;
                              unsigned int response_code   : 8;
                              unsigned int                 : 16;
                      };
              };
      };

comprises the old struct ap_queue_status but extends it so that the
whole status word is also accessible as an unsigned int, as required
for example for a simple print or trace of the whole value.
Note that this rework is fully backward compatible with the
existing code using struct ap_queue_status.
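As a hedged illustration of what the unsigned int view enables (the helper
below is made up; only the field names from the struct above are assumed):

      static void ap_dbg_status(struct ap_queue_status status)
      {
              /* The union makes the whole status word printable in one go */
              pr_debug("AP status 0x%08x (rc=%u empty=%u irq=%u)\n",
                       status.value, status.response_code,
                       status.queue_empty, status.irq_enabled);
      }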
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This is a slight rework of the s390_zcrypt_req and s390_zcrypt_rep
trace events:
- the psmid has been added to the s390_zcrypt_rep
- "dev" renamed to "card"
- "domain" renamed to "dom"
The motivation of these changes is to align these traces more closely
with the upcoming AP bus related trace events.
Additionally the psmid is needed to match the reply (and thus,
indirectly, the request) to AP bus related trace events, where only
the psmid uniquely identifies AP messages.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The tape device driver uses a single idal_buffer for I/O. While the
buffer itself can be arbitrarily big, the limit for data transfer for a
single Channel-Command Word is 65535 bytes (64K-1), since the count
field specifying the amount of data designated by the CCW is a 16-bit
unsigned value.
Provide functionality that allocates an array of multiple IDAL buffers
with the limitation mentioned above in mind.
A call to idal_buffer_array_alloc() allocates an array with a number of
IDAL buffers determined by the total size @size. Each individual buffer
is limited to a size of CCW_MAX_BYTE_COUNT
(65535 bytes).
Add helper functions that determine the size (# of elements) and the
total data size covered by the array as well.
Current users of the single IDAL buffer are adapted to use the new
functions with one buffer to allocate.
The single IDAL buffer is removed from the tape_char_data struct.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Since we are no longer switching from a BSCA to an ESCA we can completely
get rid of the sca_lock. The write lock was only taken for that
conversion.
After removal of the lock some local code cleanups are possible.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Suggested-by: Janosch Frank <frankja@linux.ibm.com>
[frankja@linux.ibm.com: Added suggested-by tag as discussed on list]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
All modern IBM Z and LinuxONE machines offer support for the
Extended System Control Area (ESCA). The ESCA is available since the
z114/z196 released in 2010.
KVM needs to allocate and manage the SCA for guest VMs. Prior to this
change the SCA was setup as Basic SCA only supporting a maximum of 64
vCPUs when initializing the VM. With the addition of the 65th vCPU the
SCA needed to be converted to an ESCA.
Instead of allocating a BSCA and upgrading it for PV or when adding the
65th cpu we can always allocate the ESCA directly upon VM creation
simplifying the code in multiple places as well as completely removing
the need to convert an existing SCA.
In cases where the ESCA is not supported (z10 and earlier) the use of
SCA entries, and with that SIGP interpretation, is disabled for VMs.
This increases the number of exits from the VM in multiprocessor
scenarios and thus decreases performance.
The same is true for VSIE where SIGP is currently disabled and thus no
SCA entries are used.
The only downside of the change is that we will always allocate 4 pages
for a 248 cpu ESCA instead of a single page for the BSCA per VM.
In return we can delete a bunch of checks and special handling depending
on the SCA type as well as the whole BSCA to ESCA conversion.
With that behavior change we are no longer referencing a bsca_block in
kvm->arch.sca. This will always be esca_block instead.
By specifying the type of the sca as esca_block we can simplify access
to the sca and get rid of some helpers while making the code clearer.
KVM_MAX_VCPUS is also moved to kvm_host_types to allow using this in
future type definitions.
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
The s390 indirect thunks are placed in the .text.__s390_indirect_jump_*
sections.
Certain config options which enable -ffunction-sections have a custom
version of the TEXT_TEXT macro:
.text.[0-9a-zA-Z_]*
That unintentionally matches the thunk sections, causing them to get
grouped with normal text rather than being handled by their intended
rule later in the script:
*(.text.*_indirect_*)
Fix that by adding another period to the thunk section names, following
the kernel's general convention for distinguishing code-generated text
sections from compiler-generated ones.
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- Compile the decompressor with -Wno-pointer-sign flag to avoid a clang
warning
- Fix incomplete conversion to flag output macros in __xsch(), to avoid
an always-zero return value instead of the expected condition code
- Remove superfluous newlines from inline assemblies to improve
compiler inlining decisions
- Expose firmware provided UID Checking state in sysfs regardless of
the device presence or state
- CIO does not unregister subchannels when the attached device is
invalid or unavailable. Update the purge function to remove I/O
subchannels if the device number is found on the cio_ignore list
- Consolidate PAI crypto allocation and cleanup paths
- The uv_get_secret_metadata() function was removed a few months ago;
also remove the mention of it in a comment
* tag 's390-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uv: Fix comment of uv_find_secret() function
s390/pai_crypto: Consolidate PAI crypto allocation and cleanup paths
s390/cio: Update purge function to unregister the unused subchannels
s390/pci: Expose firmware provided UID Checking state in sysfs
s390: Remove superfluous newlines from inline assemblies
s390/cio/ioasm: Fix __xsch() condition code handling
s390: Add -Wno-pointer-sign to KBUILD_CFLAGS_DECOMPRESSOR
|
|
Pull x86 kvm updates from Paolo Bonzini:
"Generic:
- Rework almost all of KVM's exports to expose symbols only to KVM's
x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko)
x86:
- Rework almost all of KVM x86's exports to expose symbols only to
KVM's vendor modules, i.e. to kvm-{amd,intel}.ko
- Add support for virtualizing Control-flow Enforcement Technology
(CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
(Shadow Stacks).
It is worth noting that while SHSTK and IBT can be enabled
separately in CPUID, it is not really possible to virtualize them
separately. Therefore, Intel processors will really allow both
SHSTK and IBT under the hood if either is made visible in the
guest's CPUID. The alternative would be to intercept
XSAVES/XRSTORS, which is not feasible for performance reasons
- Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
when completing userspace I/O. KVM has already committed to
allowing L2 to perform I/O at that point
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
MSR is supposed to exist for v2 PMUs
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full
emulator support (KVM should never need to emulate the MSRs outside
of forced emulation and other contrived testing scenarios)
- Clean up the MSR APIs in preparation for CET and FRED
virtualization, as well as mediated vPMU support
- Clean up a pile of PMU code in anticipation of adding support for
mediated vPMUs
- Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
vmexits needed to faithfully emulate an I/O APIC for such guests
- Many cleanups and minor fixes
- Recover possible NX huge pages within the TDP MMU under read lock
to reduce guest jitter when restoring NX huge pages
- Return -EAGAIN during prefault if userspace concurrently
deletes/moves the relevant memslot, to fix an issue where
prefaulting could deadlock with the memslot update
x86 (AMD):
- Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
supported
- Require a minimum GHCB version of 2 when starting SEV-SNP guests
via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
errors instead of latent guest failures
- Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
prevents unauthorized CPU accesses from reading the ciphertext of
SNP guest private memory, e.g. to attempt an offline attack. This
feature splits the shared SEV-ES/SEV-SNP ASID space into separate
ranges for SEV-ES and SEV-SNP guests, therefore a new module
parameter is needed to control the number of ASIDs that can be used
for VMs with CipherText Hiding vs. how many can be used to run
SEV-ES guests
- Add support for Secure TSC for SEV-SNP guests, which prevents the
untrusted host from tampering with the guest's TSC frequency, while
still allowing the VMM to configure the guest's TSC frequency
prior to launch
- Validate the XCR0 provided by the guest (via the GHCB) to avoid
bugs resulting from bogus XCR0 values
- Save an SEV guest's policy if and only if LAUNCH_START fully
succeeds to avoid leaving behind stale state (thankfully not
consumed in KVM)
- Explicitly reject non-positive effective lengths during SNP's
LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
them
- Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
host's desired TSC_AUX, to fix a bug where KVM was keeping a
different vCPU's TSC_AUX in the host MSR until return to userspace
KVM (Intel):
- Preparation for FRED support
- Don't retry in TDX's anti-zero-step mitigation if the target
memslot is invalid, i.e. is being deleted or moved, to fix a
deadlock scenario similar to the aforementioned prefaulting case
- Misc bugfixes and minor cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: x86: Export KVM-internal symbols for sub-modules only
KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
KVM: Export KVM-internal symbols for sub-modules only
KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
KVM: VMX: Make CR4.CET a guest owned bit
KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
KVM: selftests: Add coverage for KVM-defined registers in MSRs test
KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
KVM: selftests: Extend MSRs test to validate vCPUs without supported features
KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
KVM: selftests: Add an MSR test to exercise guest/host and read/write
KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
KVM: x86: Define Control Protection Exception (#CP) vector
KVM: x86: Add human friendly formatting for #XM, and #VE
KVM: SVM: Enable shadow stack virtualization for SVM
KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
KVM: SVM: Pass through shadow stack MSRs as appropriate
KVM: SVM: Update dump_vmcb with shadow stack save area additions
KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
...
|
|
The sysfs file /sys/bus/pci/devices/<device_id>/uid_is_unique provides
the UID Checking state as a per device attribute, highlighting its
effect of providing the guarantee that a device's UID is unique.
As a device attribute, this parameter is however unavailable if no
device is configured.
Expose the UID Checking state as:
- A platform-level parameter
- Available regardless of device presence or state
Signed-off-by: Ramesh Errabolu <ramesh@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Pull kvm updates from Paolo Bonzini:
"This excludes the bulk of the x86 changes, which I will send
separately. They have two not complex but relatively unusual conflicts
so I will wait for other dust to settle.
guest_memfd:
- Add support for host userspace mapping of guest_memfd-backed memory
for VM types that do NOT use KVM_MEMORY_ATTRIBUTE_PRIVATE
(which isn't precisely the same thing as CoCo VMs, since x86's
SEV-MEM and SEV-ES have no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the
kernel direct map, as well as for limited mmap() for
guest_memfd-backed memory.
For more information see:
- commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD")
- guest_memfd in Firecracker:
https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
- direct map removal:
https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
- mmap support:
https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
ARM:
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This
will hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then
allows a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
- Detect page table walk feature on new hardware
- Add sign extension with kernel MMIO/IOCSR emulation
- Improve in-kernel IPI emulation
- Improve in-kernel PCH-PIC emulation
- Move kvm_iocsr tracepoint out of generic code
RISC-V:
- Added SBI FWFT extension for Guest/VM with misaligned delegation
and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V
- Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
- Improve the interrupt wakeup CPU selection, in particular the
heuristic used to decide which vCPU a floating interrupt is
delivered to.
- Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
- Add #DE coverage in the fastops test (the only exception that's
guest-triggerable in fastop-emulated instructions).
- Fix PMU selftests errors encountered on Granite Rapids (GNR),
Sierra Forest (SRF) and Clearwater Forest (CWF).
- Minor cleanups and improvements
x86 (guest side):
- Force the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRRs for TDX/SNP, to fix an issue where ACPI
auto-mapping could map devices as WB and prevent the device drivers
from mapping their devices with UC/UC-.
- Make kvm_async_pf_task_wake() a local static helper and remove its
export.
- Use native qspinlocks when running in a VM with dedicated
vCPU=>pCPU bindings even when PV_UNHALT is unsupported.
Generic:
- Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
__GFP_NOWARN is now included in GFP_NOWAIT.
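Returning to the guest_memfd item above: a hedged userspace sketch of
creating a mappable guest_memfd and mmap()ing it. The flag name
GUEST_MEMFD_FLAG_MMAP and the exact UAPI shape are assumptions based on
the series linked above, not a statement of the final ABI.
  #include <stddef.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/kvm.h>

  /* Hedged sketch; GUEST_MEMFD_FLAG_MMAP is an assumed flag name. */
  static void *map_guest_memfd(int vm_fd, size_t size)
  {
          struct kvm_create_guest_memfd gmem = {
                  .size  = size,
                  .flags = GUEST_MEMFD_FLAG_MMAP,   /* assumption */
          };
          int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

          if (gmem_fd < 0)
                  return MAP_FAILED;

          /* With mmap support, host userspace can map the fd directly. */
          return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      gmem_fd, 0);
  }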
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
KVM: s390: Fix to clear PTE when discarding a swapped page
KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
KVM: arm64: selftests: Add basic test for running in VHE EL2
KVM: arm64: selftests: Enable EL2 by default
KVM: arm64: selftests: Initialize HCR_EL2
KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
KVM: arm64: selftests: Select SMCCC conduit based on current EL
KVM: arm64: selftests: Provide helper for getting default vCPU target
KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
KVM: arm64: selftests: Add helper to check for VGICv3 support
KVM: arm64: selftests: Initialize VGICv3 only once
KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
KVM: selftests: Add ex_str() to print human friendly name of exception vectors
selftests/kvm: remove stale TODO in xapic_state_test
KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song improves
performance and reduces the failure rate of swap cluster allocation
- "support large align and nid in Rust allocators" from Vitaly Wool
permits Rust allocators to set NUMA node and large alignment when
perforning slub and vmalloc reallocs
- "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
DAMOS_STAT's handling of the DAMON operations sets for virtual
address spaces for ops-level DAMOS filters
- "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
Baghdasaryan reduces mmap_lock contention during reads of
/proc/pid/maps
- "mm/mincore: minor clean up for swap cache checking" from Kairui Song
performs some cleanup in the swap code
- "mm: vm_normal_page*() improvements" from David Hildenbrand provides
code cleanup in the pagemap code
- "add persistent huge zero folio support" from Pankaj Raghav provides
a block layer speedup by optionalls making the
huge_zero_pagepersistent, instead of releasing it when its refcount
falls to zero
- "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
the recently added Kexec Handover feature
- "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
Stoakes turns mm_struct.flags into a bitmap. To end the constant
struggle with space shortage on 32-bit conflicting with 64-bit's
needs
- "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
code
- "selftests/mm: Fix false positives and skip unsupported tests" from
Donet Tom fixes a few things in our selftests code
- "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
from David Hildenbrand "allows individual processes to opt-out of
THP=always into THP=madvise, without affecting other workloads on the
system".
It's a long story - the [1/N] changelog spells out the considerations
- "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and
https://blogs.oracle.com/linux/post/introducing-memdesc
- "Tiny optimization for large read operations" from Chi Zhiling
improves the efficiency of the pagecache read path
- "Better split_huge_page_test result check" from Zi Yan improves our
folio splitting selftest code
- "test that rmap behaves as expected" from Wei Yang adds some rmap
selftests
- "remove write_cache_pages()" from Christoph Hellwig removes that
function and converts its two remaining callers
- "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
selftests issues
- "introduce kernel file mapped folios" from Boris Burkov introduces
the concept of "kernel file pages". Using these permits btrfs to
account its metadata pages to the root cgroup, rather than to the
cgroups of random inappropriate tasks
- "mm/pageblock: improve readability of some pageblock handling" from
Wei Yang provides some readability improvements to the page allocator
code
- "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
to understand arm32 highmem
- "tools: testing: Use existing atomic.h for vma/maple tests" from
Brendan Jackman performs some code cleanups and deduplication under
tools/testing/
- "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
a couple of 32-bit issues in tools/testing/radix-tree.c
- "kasan: unify kasan_enabled() and remove arch-specific
implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
initialization code into a common arch-neutral implementation
- "mm: remove zpool" from Johannes Weiner removes zspool - an
indirection layer which now only redirects to a single thing
(zsmalloc)
- "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
couple of cleanups in the fork code
- "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
adjustments at various nth_page() callsites, eventually permitting
the removal of that undesirable helper function
- "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
creates a KASAN read-only mode for ARM, using that architecture's
memory tagging feature. It is felt that a read-only mode KASAN is
suitable for use in production systems rather than debug-only
- "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
some tidying in the hugetlb folio allocation code
- "mm: establish const-correctness for pointer parameters" from Max
Kellermann makes quite a number of the MM API functions more accurate
about the constness of their arguments. This was getting in the way
of subsystems (in this case CEPH) when they attempt to improve
their own const/non-const accuracy
- "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
code sites which were confused over when to use free_pages() vs
__free_pages()
- "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
mapletree code accessible to Rust. Required by nouveau and by its
forthcoming successor: the new Rust Nova driver
- "selftests/mm: split_huge_page_test: split_pte_mapped_thp
improvements" from David Hildenbrand adds a fix and some cleanups to
the thp selftesting code
- "mm, swap: introduce swap table as swap cache (phase I)" from Chris
Li and Kairui Song is the first step along the path to implementing
"swap tables" - a new approach to swap allocation and state tracking
which is expected to yield speed and space improvements. This
patchset itself yields a 5-20% performance benefit in some situations
- "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
layer to clean up the ptdesc code a little
- "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
issues in our 5-level pagetable selftesting code
- "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
addresses a couple of minor issues in the relatively new memory
allocation profiling feature
- "Small cleanups" from Matthew Wilcox has a few cleanups in
preparation for more memdesc work
- "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
Quanmin Yan makes some changes to DAMON in furtherance of supporting
arm highmem
- "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
Anjum adds that compiler check to selftests code and fixes the
fallout, by removing dead code
- "Improvements to Victim Process Thawing and OOM Reaper Traversal
Order" from zhongjinji makes a number of improvements in the OOM
killer: mainly thawing a more appropriate group of victim threads so
they can release resources
- "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
is a bunch of small and unrelated fixups for DAMON
- "mm/damon: define and use DAMON initialization check function" from
SeongJae Park implements reliability and maintainability improvements
to a recently-added bug fix
- "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
SeongJae Park provides additional transparency to userspace clients
of the DAMON_STAT information
- "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
some constraints on khugepaged's collapsing of anon VMAs. It also
increases the success rate of MADV_COLLAPSE against an anon vma
- "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
from Lorenzo Stoakes moves us further towards removal of
file_operations.mmap(). This patchset concentrates upon clearing up
the treatment of stacked filesystems
- "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
provides some fixes and improvements to mlock's tracking of large
folios. /proc/meminfo's "Mlocked" field became more accurate
- "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
Donet Tom fixes several user-visible KSM stats inaccuracies across
forks and adds selftest code to verify these counters
- "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
some potential but presently benign issues in KSM's mm_slot handling
* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
mm: swap: check for stable address space before operating on the VMA
mm: convert folio_page() back to a macro
mm/khugepaged: use start_addr/addr for improved readability
hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
alloc_tag: fix boot failure due to NULL pointer dereference
mm: silence data-race in update_hiwater_rss
mm/memory-failure: don't select MEMORY_ISOLATION
mm/khugepaged: remove definition of struct khugepaged_mm_slot
mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
hugetlb: increase number of reserving hugepages via cmdline
selftests/mm: add fork inheritance test for ksm_merging_pages counter
mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
drivers/base/node: fix double free in register_one_node()
mm: remove PMD alignment constraint in execmem_vmalloc()
mm/memory_hotplug: fix typo 'esecially' -> 'especially'
mm/rmap: improve mlock tracking for large folios
mm/filemap: map entire large folio faultaround
mm/fault: try to map the entire file folio in finish_fault()
mm/rmap: mlock large folios in try_to_unmap_one()
mm/rmap: fix a mlock race condition in folio_referenced_one()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull TIF bit unification updates from Thomas Gleixner:
"A set of changes to consolidate the generic TIF (thread info flag)
bits across architectures.
All architectures define the same set of generic TIF bits. This makes
it pointlessly hard to add a new generic TIF bit or to change an
existing one.
Provide a generic variant and convert the architectures which utilize
the generic entry code over to use it. The TIF space is divided into
16 generic bits and 16 architecture specific bits, which turned out to
provide enough space on both sides"
* tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
LoongArch: Fix bitflag conflict for TIF_FIXADE
riscv: Use generic TIF bits
loongarch: Use generic TIF bits
s390/entry: Remove unused TIF flags
s390: Use generic TIF bits
x86: Use generic TIF bits
asm-generic: Provide generic TIF infrastructure
|
|
Use kvm_is_gpa_in_memslot() to check the validity of the notification
indicator byte address instead of open coding equivalent logic in the VFIO
AP driver.
Opportunistically use a dedicated wrapper that exists and is exported
expressly for the VFIO AP module. kvm_is_gpa_in_memslot() is generally
unsuitable for use outside of KVM; other drivers typically shouldn't rely
on KVM's memslots, and using the API requires kvm->srcu (or slots_lock) to
be held for the entire duration of the usage, e.g. to avoid TOCTOU bugs.
handle_pqap() is a bit of a special case, as it's explicitly invoked from
KVM with kvm->srcu already held, and the VFIO AP driver is in many ways an
extension of KVM that happens to live in a separate module.
Providing a dedicated API for the VFIO AP driver will allow restricting
the vast majority of generic KVM's exports to KVM submodules (e.g. to x86's
kvm-{amd,intel}.ko vendor modules).
No functional change intended.
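A hedged sketch of the calling convention described above, not the
actual vfio_ap change; the helper name is invented, and the point is
only that the memslot check runs in a context where KVM already holds
kvm->srcu.
  /* Hedged sketch; validate_nib_gpa() is a made-up helper name. */
  static int validate_nib_gpa(struct kvm *kvm, gpa_t nib_gpa)
  {
          /* handle_pqap() is invoked from KVM with kvm->srcu held, so
           * the memslot lookup cannot race with memslot updates here. */
          WARN_ON_ONCE(!srcu_read_lock_held(&kvm->srcu));

          if (!kvm_is_gpa_in_memslot(kvm, nib_gpa))
                  return -EINVAL;

          return 0;
  }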
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250919003303.1355064-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.18
1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: A bugfix and a performance improvement
* Improve the interrupt wakeup CPU selection by changing the heuristic
used to decide which vCPU a floating interrupt is delivered to.
* Clear the pte when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
|
|
KVM run fails when a guest with the 'cmm' CPU feature and the host are
under memory pressure and use swap heavily. This is because npages
becomes -ENOMEM (out of memory) in hva_to_pfn_slow(), which in turn
propagates as EFAULT to QEMU. Clearing the page table entry when
discarding an address that maps to a swap entry resolves the issue.
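A hedged, generic sketch of the idea only (not the actual s390 gmap
fix): when the discarded address maps to a swap entry, drop the swap
reference and clear the PTE so a later fault does not trip over the
stale entry.
  /* Illustrative sketch, not the real gmap helper. */
  static void zap_swapped_pte(struct mm_struct *mm, unsigned long addr,
                              pte_t *ptep)
  {
          pte_t pte = ptep_get(ptep);

          /* Present PTEs are handled elsewhere; only swap entries here. */
          if (pte_none(pte) || pte_present(pte))
                  return;

          free_swap_and_cache(pte_to_swp_entry(pte));
          pte_clear(mm, addr, ptep);
  }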
Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Cc: stable@vger.kernel.org
Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Gautam Gala <ggala@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Refactor SCLP memory hotplug code
- Introduce a common boot_panic() decompressor helper macro and use it
to get rid of a few nearly identical implementations
- Take additional key generation flags into account and forward them to
the ep11 implementation. With that, allow users to modify the key
generation process, e.g. provide valid combinations of XCP_BLOB_*
flags
- Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
debug facility and HMC driver
- Add DAX support for DCSS memory block devices
- Make the compiler statement attribute "assume" available with a new
__assume macro
- Rework the ffs() and fls() family of bitops functions, including
source code improvements and generated code optimizations, using the
newly introduced __assume macro (a short sketch of the idea follows
this list)
- Enable additional network features in default configurations
- Use __GFP_ACCOUNT flag for user page table allocations to add missing
kmemcg accounting
- Add WQ_PERCPU flag to explicitly request the use of the per-CPU
workqueue for 3590 tape driver
- Switch power reading to the per-CPU workqueue and Hiperdispatch to
the default workqueue
- Add memory allocation profiling hooks to allow better profiling data
and the /proc/allocinfo output similar to other architectures
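A hedged sketch of the idea behind the __assume-based bitops rework
mentioned above; the macro body and the helper are illustrative, not
the kernel's actual definitions.
  #ifndef __has_attribute
  #define __has_attribute(x) 0
  #endif

  #if __has_attribute(__assume__)
  #define __assume(expr)  __attribute__((__assume__(expr)))
  #else
  #define __assume(expr)  do { } while (0)
  #endif

  /* Example: promise the compiler the result is in [1, 64] so callers
   * can drop redundant range checks (hypothetical helper). */
  static inline unsigned int my_fls64(unsigned long word)
  {
          unsigned int bits;

          if (!word)
                  return 0;
          bits = 64 - (unsigned int)__builtin_clzl(word);
          __assume(bits >= 1 && bits <= 64);
          return bits;
  }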
* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
s390/mm: Add memory allocation profiling hooks
s390: Replace use of system_wq with system_dfl_wq
s390/diag324: Replace use of system_wq with system_percpu_wq
s390/tape: Add WQ_PERCPU to alloc_workqueue users
s390/bitops: Switch to generic ffs() if supported by compiler
s390/bitops: Switch to generic fls(), fls64(), etc.
s390/mm: Use __GFP_ACCOUNT for user page table allocations
s390/configs: Enable additional network features
s390/bitops: Cleanup __flogr()
s390/bitops: Use __assume() for __flogr() inline assembly return value
compiler_types: Add __assume macro
s390/bitops: Limit return value range of __flogr()
s390/dcssblk: Add DAX support
s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/pkey: Forward keygenflags to ep11_unwrapkey
s390/boot: Add common boot_panic() code
s390/bitops: Optimize inlining
s390/bitops: Slightly optimize ffs() and fls64()
s390/sclp: Move memory hotplug code for better modularity
...
|