summaryrefslogtreecommitdiff
path: root/arch/riscv/kvm
AgeCommit message (Collapse)Author
9 daysMerge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.19 - SBI MPXY support for KVM guest - New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores
2025-11-26Merge tag 'kvm-x86-tdx-6.19' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM TDX changes for 6.19: - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module, which KVM was either working around in weird, ugly ways, or was simply oblivious to (as proven by Yan tripping several KVM_BUG_ON()s with clever selftests). - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a vCPU if creating said vCPU failed partway through. - Fix a few sparse warnings (bad annotation, 0 != NULL). - Use struct_size() to simplify copying capabilities to userspace.
2025-11-24RISC-V: KVM: Flush VS-stage TLB after VCPU migration for Andes coresHui Min Mina Chou
Most implementations cache the combined result of two-stage translation, but some, like Andes cores, use split TLBs that store VS-stage and G-stage entries separately. On such systems, when a VCPU migrates to another CPU, an additional HFENCE.VVMA is required to avoid using stale VS-stage entries, which could otherwise cause guest faults. Introduce a static key to identify CPUs with split two-stage TLBs. When enabled, KVM issues an extra HFENCE.VVMA on VCPU migration to prevent stale VS-stage mappings. Signed-off-by: Hui Min Mina Chou <minachou@andestech.com> Signed-off-by: Ben Zong-You Xie <ben717@andestech.com> Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20251117084555.157642-1-minachou@andestech.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24RISC-V: KVM: Fix guest page fault within HLV* instructionsFangyu Yu
When executing HLV* instructions at the HS mode, a guest page fault may occur when a g-stage page table migration between triggering the virtual instruction exception and executing the HLV* instruction. This may be a corner case, and one simpler way to handle this is to re-execute the instruction where the virtual instruction exception occurred, and the guest page fault will be automatically handled. Fixes: b91f0e4cb8a3 ("RISC-V: KVM: Factor-out instruction emulation into separate sources") Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251121133543.46822-1-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24KVM: riscv: Support enabling dirty log gradually in small chunksDong Yang
There is already support of enabling dirty log gradually in small chunks for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in small chunks") and c862626 ("KVM: arm64: Support enabling dirty log gradually in small chunks"). This adds support for riscv. x86 and arm64 writes protect both huge pages and normal pages now, so riscv protect also protects both huge pages and normal pages. On a nested virtualization setup (RISC-V KVM running inside a QEMU VM on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux VM using different backing page sizes. The time taken for memory_global_dirty_log_start in the L2 QEMU is listed below: Page Size Before After Optimization 4K 4490.23ms 31.94ms 2M 48.97ms 45.46ms 1G 28.40ms 30.93ms Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Signed-off-by: Dong Yang <dayss1224@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251103062825.9084-1-dayss1224@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24RISC-V: KVM: Introduce KVM_EXIT_FAIL_ENTRY_NO_VSFILEBillXiang
Currently, we return CSR_HSTATUS as hardware_entry_failure_reason when kvm_riscv_aia_alloc_hgei failed in KVM_DEV_RISCV_AIA_MODE_HWACCEL mode, which is vague so it is better to return a well defined value KVM_EXIT_FAIL_ENTRY_NO_VSFILE provided via uapi/asm/kvm.h. Signed-off-by: BillXiang <xiangwencheng@lanxincomputing.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250923053851.32863-1-xiangwencheng@lanxincomputing.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24RISC-V: KVM: Add SBI MPXY extension support for GuestAnup Patel
The SBI MPXY extension is a platform-level functionality so KVM only needs to forward SBI MPXY calls to KVM user-space. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251017155925.361560-4-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24RISC-V: KVM: Add separate source for forwarded SBI extensionsAnup Patel
Add a separate source vcpu_sbi_forward.c for SBI extensions which are entirely forwarded to KVM user-space. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251017155925.361560-3-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-24RISC-V: KVM: Convert kvm_riscv_vcpu_sbi_forward() into extension handlerAnup Patel
All uses of kvm_riscv_vcpu_sbi_forward() also updates retdata->uexit so to further reduce code duplication move retdata->uexit assignment to kvm_riscv_vcpu_sbi_forward() and convert it into SBI extension handler. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251017155925.361560-2-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-11-05KVM: Rename kvm_arch_vcpu_async_ioctl() to kvm_arch_vcpu_unlocked_ioctl()Sean Christopherson
Rename the "async" ioctl API to "unlocked" so that upcoming usage in x86's TDX code doesn't result in a massive misnomer. To avoid having to retry SEAMCALLs, TDX needs to acquire kvm->lock *and* all vcpu->mutex locks, and acquiring all of those locks after/inside the current vCPU's mutex is a non-starter. However, TDX also needs to acquire the vCPU's mutex and load the vCPU, i.e. the handling is very much not async to the vCPU. No functional change intended. Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-05KVM: Make support for kvm_arch_vcpu_async_ioctl() mandatorySean Christopherson
Implement kvm_arch_vcpu_async_ioctl() "natively" in x86 and arm64 instead of relying on an #ifdef'd stub, and drop HAVE_KVM_VCPU_ASYNC_IOCTL in anticipation of using the API on x86. Once x86 uses the API, providing a stub for one architecture and having all other architectures opt-in requires more code than simply implementing the API in the lone holdout. Eliminating the Kconfig will also reduce churn if the API is renamed in the future (spoiler alert). No functional change intended. Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251030200951.3402865-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-24RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAPFangyu Yu
As of commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"), vm_pgoff may no longer guaranteed to hold the PFN for VM_PFNMAP regions. Using vma->vm_pgoff to derive the HPA here may therefore produce incorrect mappings. Instead, I/O mappings for such regions can be established on-demand during g-stage page faults, making the upfront ioremap in this path is unnecessary. Fixes: aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()") Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Tested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251021142131.78796-1-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-10-17RISC-V: KVM: Read HGEIP CSR on the correct cpuFangyu Yu
When executing kvm_riscv_vcpu_aia_has_interrupts, the vCPU may have migrated and the IMSIC VS-file have not been updated yet, currently the HGEIP CSR should be read from the imsic->vsfile_cpu ( the pCPU before migration ) via on_each_cpu_mask, but this will trigger an IPI call and repeated IPI within a period of time is expensive in a many-core systems. Just let the vCPU execute and update the correct IMSIC VS-file via kvm_riscv_vcpu_aia_imsic_update may be a simple solution. Fixes: 4cec89db80ba ("RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualization") Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251016012659.82998-1-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-10-16RISC-V: KVM: Fix check for local interrupts on riscv32Samuel Holland
To set all 64 bits in the mask on a 32-bit system, the constant must have type `unsigned long long`. Fixes: 6b1e8ba4bac4 ("RISC-V: KVM: Use bitmap for irqs_pending and irqs_pending_mask") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20251016001714.3889380-1-samuel.holland@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-10-07Merge tag 'hyperv-next-signed-20251006' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...
2025-09-30entry: Rename "kvm" entry code assets to "virt" to genericize APIsSean Christopherson
Rename the "kvm" entry code files and Kconfigs to use generic "virt" nomenclature so that the code can be reused by other hypervisors (or rather, their root/dom0 partition drivers), without incorrectly suggesting the code somehow relies on and/or involves KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-30entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM properSean Christopherson
Move KVM's morphing of pending signals into userspace exits into KVM proper, and drop the @vcpu param from xfer_to_guest_mode_handle_work(). How KVM responds to -EINTR is a detail that really belongs in KVM itself, and invoking kvm_handle_signal_exit() from kernel code creates an inverted module dependency. E.g. attempting to move kvm_handle_signal_exit() into kvm_main.c would generate an linker error when building kvm.ko as a module. Dropping KVM details will also converting the KVM "entry" code into a more generic virtualization framework so that it can be used when running as a Hyper-V root partition. Lastly, eliminating usage of "struct kvm_vcpu" outside of KVM is also nice to have for KVM x86 developers, as keeping the details of kvm_vcpu purely within KVM allows changing the layout of the structure without having to boot into a new kernel, e.g. allows rebuilding and reloading kvm.ko with a modified kvm_vcpu structure as part of debug/development. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-29Merge tag 'riscv-for-linus-6.18-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley - Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other architectures have already merged this type of cleanup) - The introduction of ioremap_wc() for RISC-V - Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than open code - A RISC-V kprobes unit test - An architecture-specific endianness swap macro set implementation, leveraging some dedicated RISC-V instructions for this purpose if they are available - The ability to identity and communicate to userspace the presence of a MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE implementation in cpu_relax() - Several other miscellaneous cleanups * tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits) riscv: errata: Fix the PAUSE Opcode for MIPS P8700 riscv: hwprobe: Document MIPS xmipsexectl vendor extension riscv: hwprobe: Add MIPS vendor extension probing riscv: Add xmipsexectl instructions riscv: Add xmipsexectl as a vendor extension dt-bindings: riscv: Add xmipsexectl ISA extension description riscv: cpufeature: add validation for zfa, zfh and zfhmin perf: riscv: skip empty batches in counter start selftests: riscv: Add README for RISC-V KSelfTest riscv: sbi: Switch to new sys-off handler API riscv: Move vendor errata definitions to new header RISC-V: ACPI: enable parsing the BGRT table riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG riscv: pi: use 'targets' instead of extra-y in Makefile riscv: introduce asm/swab.h riscv: mmap(): use unsigned offset type in riscv_sys_mmap drivers/perf: riscv: Remove redundant ternary operators riscv: mm: Use mmu-type from FDT to limit SATP mode riscv: mm: Return intended SATP mode for noXlvl options riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM ...
2025-09-16riscv: Move all duplicate insn parsing macros into asm/insn.hAlexandre Ghiti
kernel/traps_misaligned.c and kvm/vcpu_insn.c define the same macros to extract information from the instructions. Let's move the definitions into asm/insn.h to avoid this duplication. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-3-d865dc9ad180@rivosinc.com [pjw@kernel.org: updated to apply] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16riscv: Strengthen duplicate and inconsistent definition of RV_X()Alexandre Ghiti
RV_X() macro is defined in two different ways which is error prone. So harmonize its first definition and add another macro RV_X_MASK() for the second one. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250620-dev-alex-insn_duplicate_v5_manual-v5-2-d865dc9ad180@rivosinc.com [pjw@kernel.org: upcase the macro name to conform with previous practice] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16RISC-V: KVM: Implement get event info functionAtish Patra
The new get_event_info funciton allows the guest to query the presence of multiple events with single SBI call. Currently, the perf driver in linux guest invokes it for all the standard SBI PMU events. Support the SBI function implementation in KVM as well. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-7-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: No need of explicit writable slot checkAtish Patra
There is not much value in checking if a memslot is writable explicitly before a write as it may change underneath after the check. Rather, return invalid address error when write_guest fails as it checks if the slot is writable anyways. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-6-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Add support for Raw event v2Atish Patra
SBI v3.0 introduced a new raw event type v2 for wider mhpmeventX programming. Add the support in kvm for that. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Acked-by: Paul Walmsley <pjw@kernel.org> Link: https://lore.kernel.org/r/20250909-pmu_event_info-v6-3-d8f80cacb884@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Implement ONE_REG interface for SBI FWFT stateAnup Patel
The KVM user-space needs a way to save/restore the state of SBI FWFT features so implement SBI extension ONE_REG callbacks. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250823155947.1354229-6-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Move copy_sbi_ext_reg_indices() to SBI implementationAnup Patel
The ONE_REG handling of SBI extension enable/disable registers and SBI extension state registers is already under SBI implementation. On similar lines, let's move copy_sbi_ext_reg_indices() under SBI implementation. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250823155947.1354229-5-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Introduce optional ONE_REG callbacks for SBI extensionsAnup Patel
SBI extensions can have per-VCPU state which needs to be saved/restored through ONE_REG interface for Guest/VM migration. Introduce optional ONE_REG callbacks for SBI extensions so that ONE_REG implementation for an SBI extenion is part of the extension sources. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250823155947.1354229-4-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Introduce feature specific reset for SBI FWFTAnup Patel
The SBI FWFT feature values must be reset upon VCPU reset so introduce feature specific reset callback for this purpose. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20250823155947.1354229-3-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Set initial value of hedeleg in kvm_arch_vcpu_create()Anup Patel
The hedeleg may be updated by ONE_REG interface before the VCPU is run at least once hence set the initial value of hedeleg in kvm_arch_vcpu_create() instead of kvm_riscv_vcpu_setup_config(). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20250823155947.1354229-2-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Prevent HGATP_MODE_BARE passedGuo Ren (Alibaba DAMO Academy)
Current kvm_riscv_gstage_mode_detect() assumes H-extension must have HGATP_MODE_SV39X4/SV32X4 at least, but the spec allows H-extension with HGATP_MODE_BARE alone. The KVM depends on !HGATP_MODE_BARE at least, so enhance the gstage-mode-detect to block HGATP_MODE_BARE. Move gstage-mode-check closer to gstage-mode-detect to prevent unnecessary init. Reviewed-by: Troy Mitchell <troy.mitchell@linux.dev> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Reviewed-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Link: https://lore.kernel.org/r/20250821142542.2472079-4-guoren@kernel.org Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Remove unnecessary HGATP csr_readGuo Ren (Alibaba DAMO Academy)
The HGATP has been set to zero in gstage_mode_detect(), so there is no need to save the old context. Unify the code convention with gstage_mode_detect(). Reviewed-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20250821142542.2472079-3-guoren@kernel.org Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Write hgatp register with valid mode bitsFangyu Yu
According to the RISC-V Privileged Architecture Spec, when MODE=Bare is selected,software must write zero to the remaining fields of hgatp. We have detected the valid mode supported by the HW before, So using a valid mode to detect how many vmid bits are supported. Fixes: fd7bb4a251df ("RISC-V: KVM: Implement VMID allocator") Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Reviewed-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Link: https://lore.kernel.org/r/20250821142542.2472079-2-guoren@kernel.org Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Allow bfloat16 extension for Guest/VMQuan Zhou
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zfbfmin/Zvfbfmin/Zvfbfwma extension for Guest/VM. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/f846cecd330ab9fc88211c55bc73126f903f8713.1754646071.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Allow Zicbop extension for Guest/VMQuan Zhou
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zicbop extension for Guest/VM. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/db4a9b679cc653bb6f5f5574e4196de7a980e458.1754646071.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Provide UAPI for Zicbop block sizeQuan Zhou
We're about to allow guests to use the Zicbop extension. KVM userspace needs to know the cache block size in order to properly advertise it to the guest. Provide a virtual config register for userspace to get it with the GET_ONE_REG API, but setting it cannot be supported, so disallow SET_ONE_REG. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/befd8403cd76d7adb97231ac993eaeb86bf2582c.1754646071.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Change zicbom/zicboz block size to depend on the host isaQuan Zhou
The zicbom/zicboz block size registers should depend on the host's isa, the reason is that we otherwise create an ioctl order dependency on the VMM. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviwed-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/fef5907425455ecd41b224e0093f1b6bc4067138.1754646071.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16RISC-V: KVM: Add support for SBI_FWFT_POINTER_MASKING_PMLENSamuel Holland
Pointer masking is controlled through a WARL field in henvcfg. Expose the feature only if at least one PMLEN value is supported for VS-mode. Allow the VMM to block access to the feature by disabling the Smnpm ISA extension in the guest. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250111004702.2813013-3-samuel.holland@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-08RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEGClément Léger
SBI_FWFT_MISALIGNED_DELEG needs hedeleg to be modified to delegate misaligned load/store exceptions. Save and restore it during CPU load/put. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250523101932.1594077-15-cleger@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-08RISC-V: KVM: add support for FWFT SBI extensionClément Léger
Add basic infrastructure to support the FWFT extension in KVM. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250523101932.1594077-14-cleger@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-08-25RISC-V: KVM: fix stack overrun when loading vlenbRadim Krčmář
The userspace load can put up to 2048 bits into an xlen bit stack buffer. We want only xlen bits, so check the size beforehand. Fixes: 2fa290372dfe ("RISC-V: KVM: add 'vlenb' Vector CSR") Cc: stable@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Link: https://lore.kernel.org/r/20250805104418.196023-4-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-08-25RISC-V: KVM: Correct kvm_riscv_check_vcpu_requests() commentQuan Zhou
Correct `check_vcpu_requests` to `kvm_riscv_check_vcpu_requests` in comments. Fixes: f55ffaf89636 ("RISC-V: KVM: Enable ring-based dirty memory tracking") Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/49680363098c45516ec4b305283d662d26fa9386.1754326285.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-08-25RISC-V: KVM: Fix pte settings within kvm_riscv_gstage_ioremap()Fangyu Yu
Currently, kvm_riscv_gstage_ioremap() is used to map IMSIC gpa to the spa of IMSIC guest interrupt file. The PAGE_KERNEL_IO property includes global setting whereas it does not include user mode settings, so when accessing the IMSIC address in the virtual machine, a guest page fault will occur, this is not expected. According to the RISC-V Privileged Architecture Spec, for G-stage address translation, all memory accesses are considered to be user-level accesses as though executed in U-mode. Fixes: 659ad6d82c31 ("RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()") Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20250807070729.89701-1-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()Quan Zhou
The caller has already passed in the memslot, and there are two instances `{kvm_faultin_pfn/mark_page_dirty}` of retrieving the memslot again in `kvm_riscv_gstage_map`, we can replace them with `{__kvm_faultin_pfn/mark_page_dirty_in_slot}`. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/50989f0a02790f9d7dc804c2ade6387c4e7fbdbc.1749634392.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAsQuan Zhou
There is already a helper function find_vma_intersection() in KVM for searching intersecting VMAs, use it directly. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/230d6c8c8b8dd83081fcfd8d83a4d17c8245fa2f.1731552790.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Enable ring-based dirty memory trackingQuan Zhou
Enable ring-based dirty memory tracking on riscv: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as riscv is weakly ordered. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add a check to kvm_vcpu_kvm_riscv_check_vcpu_requests for checking whether the dirty ring is soft full. To handle vCPU requests that cause exits to userspace, modified the `kvm_riscv_check_vcpu_requests` to return a value (currently only returns 0 or 1). Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20e116efb1f7aff211dd8e3cf8990c5521ed5f34.1749810735.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmapSamuel Holland
The Smnpm extension requires special handling because the guest ISA extension maps to a different extension (Ssnpm) on the host side. commit 1851e7836212 ("RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests") missed that the vcpu->arch.isa bit is based only on the host extension, so currently both KVM_RISCV_ISA_EXT_{SMNPM,SSNPM} map to vcpu->arch.isa[RISCV_ISA_EXT_SSNPM]. This does not cause any problems for the guest, because both extensions are force-enabled anyway when the host supports Ssnpm, but prevents checking for (guest) Smnpm in the SBI FWFT logic. Redefine kvm_isa_ext_arr to look up the guest extension, since only the guest -> host mapping is unambiguous. Factor out the logic for checking for host support of an extension, so this special case only needs to be handled in one place, and be explicit about which variables hold a host vs a guest ISA extension. Fixes: 1851e7836212 ("RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250111004702.2813013-2-samuel.holland@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIsAnup Patel
Currently, all kvm_riscv_hfence_xyz() APIs assume VMID to be the host VMID of the Guest/VM which resticts use of these APIs only for host TLB maintenance. Let's allow passing VMID as a parameter to all kvm_riscv_hfence_xyz() APIs so that they can be re-used for nested virtualization related TLB maintenance. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250618113532.471448-13-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Factor-out g-stage page table managementAnup Patel
The upcoming nested virtualization can share g-stage page table management with the current host g-stage implementation hence factor-out g-stage page table management as separate sources and also use "kvm_riscv_mmu_" prefix for host g-stage functions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250618113532.471448-12-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Add vmid field to struct kvm_riscv_hfenceAnup Patel
Currently, the struct kvm_riscv_hfence does not have vmid field and various hfence processing functions always pick vmid assigned to the guest/VM. This prevents us from doing hfence operation on arbitrary vmid hence add vmid field to struct kvm_riscv_hfence and use it wherever applicable. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250618113532.471448-11-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Introduce struct kvm_gstage_mappingAnup Patel
Introduce struct kvm_gstage_mapping which represents a g-stage mapping at a particular g-stage page table level. Also, update the kvm_riscv_gstage_map() to return the g-stage mapping upon success. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250618113532.471448-10-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>