diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-10-04 08:52:16 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-10-04 08:52:16 -0700 |
| commit | f3826aa9962b4572d01083c84ac0f8345f121168 (patch) | |
| tree | a6639912ff01992b62b37c7c4fc49d612cec19f7 /arch/x86/kernel/kvm.c | |
| parent | bf897d2626abe4559953342e2f7dda05d034c8c7 (diff) | |
| parent | 99cab80208809cb918d6e579e6165279096f058a (diff) | |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This excludes the bulk of the x86 changes, which I will send
separately. They have two not complex but relatively unusual conflicts
so I will wait for other dust to settle.
guest_memfd:
- Add support for host userspace mapping of guest_memfd-backed memory
for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE
(which isn't precisely the same thing as CoCo VMs, since x86's
SEV-MEM and SEV-ES have no way to detect private vs. shared).
This lays the groundwork for removal of guest memory from the
kernel direct map, as well as for limited mmap() for
guest_memfd-backed memory.
For more information see:
- commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD")
- guest_memfd in Firecracker:
https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
- direct map removal:
https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/
- mmap support:
https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/
ARM:
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This
will hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then
allows a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
LoongArch:
- Detect page table walk feature on new hardware
- Add sign extension with kernel MMIO/IOCSR emulation
- Improve in-kernel IPI emulation
- Improve in-kernel PCH-PIC emulation
- Move kvm_iocsr tracepoint out of generic code
RISC-V:
- Added SBI FWFT extension for Guest/VM with misaligned delegation
and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V
- Added SBI v3.0 PMU enhancements in KVM and perf driver
s390:
- Improve interrupt cpu for wakeup, in particular the heuristic to
decide which vCPU to deliver a floating interrupt to.
- Clear the PTE when discarding a swapped page because of CMMA; this
bug was introduced in 6.16 when refactoring gmap code.
x86 selftests:
- Add #DE coverage in the fastops test (the only exception that's
guest- triggerable in fastop-emulated instructions).
- Fix PMU selftests errors encountered on Granite Rapids (GNR),
Sierra Forest (SRF) and Clearwater Forest (CWF).
- Minor cleanups and improvements
x86 (guest side):
- For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when
overriding guest MTRR for TDX/SNP to fix an issue where ACPI
auto-mapping could map devices as WB and prevent the device drivers
from mapping their devices with UC/UC-.
- Make kvm_async_pf_task_wake() a local static helper and remove its
export.
- Use native qspinlocks when running in a VM with dedicated
vCPU=>pCPU bindings even when PV_UNHALT is unsupported.
Generic:
- Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as
__GFP_NOWARN is now included in GFP_NOWAIT.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits)
KVM: s390: Fix to clear PTE when discarding a swapped page
KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs
KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs
KVM: arm64: selftests: Cope with arch silliness in EL2 selftest
KVM: arm64: selftests: Add basic test for running in VHE EL2
KVM: arm64: selftests: Enable EL2 by default
KVM: arm64: selftests: Initialize HCR_EL2
KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters
KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2
KVM: arm64: selftests: Select SMCCC conduit based on current EL
KVM: arm64: selftests: Provide helper for getting default vCPU target
KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts
KVM: arm64: selftests: Create a VGICv3 for 'default' VMs
KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation
KVM: arm64: selftests: Add helper to check for VGICv3 support
KVM: arm64: selftests: Initialize VGICv3 only once
KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code
KVM: selftests: Add ex_str() to print human friendly name of exception vectors
selftests/kvm: remove stale TODO in xapic_state_test
KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount
...
Diffstat (limited to 'arch/x86/kernel/kvm.c')
| -rw-r--r-- | arch/x86/kernel/kvm.c | 44 |
1 files changed, 30 insertions, 14 deletions
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 8ae750cde0c6..b67d7c59dca0 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -190,7 +190,7 @@ static void apf_task_wake_all(void) } } -void kvm_async_pf_task_wake(u32 token) +static void kvm_async_pf_task_wake(u32 token) { u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS); struct kvm_task_sleep_head *b = &async_pf_sleepers[key]; @@ -241,7 +241,6 @@ again: /* A dummy token might be allocated and ultimately not used. */ kfree(dummy); } -EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake); noinstr u32 kvm_read_and_reset_apf_flags(void) { @@ -933,6 +932,19 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc) static void __init kvm_init_platform(void) { + u64 tolud = PFN_PHYS(e820__end_of_low_ram_pfn()); + /* + * Note, hardware requires variable MTRR ranges to be power-of-2 sized + * and naturally aligned. But when forcing guest MTRR state, Linux + * doesn't program the forced ranges into hardware. Don't bother doing + * the math to generate a technically-legal range. + */ + struct mtrr_var_range pci_hole = { + .base_lo = tolud | X86_MEMTYPE_UC, + .mask_lo = (u32)(~(SZ_4G - tolud - 1)) | MTRR_PHYSMASK_V, + .mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32, + }; + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) { unsigned long nr_pages; @@ -982,8 +994,12 @@ static void __init kvm_init_platform(void) kvmclock_init(); x86_platform.apic_post_init = kvm_apic_init; - /* Set WB as the default cache mode for SEV-SNP and TDX */ - guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK); + /* + * Set WB as the default cache mode for SEV-SNP and TDX, with a single + * UC range for the legacy PCI hole, e.g. so that devices that expect + * to get UC/WC mappings don't get surprised with WB. + */ + guest_force_mtrr_state(&pci_hole, 1, MTRR_TYPE_WRBACK); } #if defined(CONFIG_AMD_MEM_ENCRYPT) @@ -1073,16 +1089,6 @@ static void kvm_wait(u8 *ptr, u8 val) void __init kvm_spinlock_init(void) { /* - * In case host doesn't support KVM_FEATURE_PV_UNHALT there is still an - * advantage of keeping virt_spin_lock_key enabled: virt_spin_lock() is - * preferred over native qspinlock when vCPU is preempted. - */ - if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) { - pr_info("PV spinlocks disabled, no host support\n"); - return; - } - - /* * Disable PV spinlocks and use native qspinlock when dedicated pCPUs * are available. */ @@ -1101,6 +1107,16 @@ void __init kvm_spinlock_init(void) goto out; } + /* + * In case host doesn't support KVM_FEATURE_PV_UNHALT there is still an + * advantage of keeping virt_spin_lock_key enabled: virt_spin_lock() is + * preferred over native qspinlock when vCPU is preempted. + */ + if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) { + pr_info("PV spinlocks disabled, no host support\n"); + return; + } + pr_info("PV spinlocks enabled\n"); __pv_init_lock_hash(); |