iommu/vt-d: Use the generic iommu page table

Replace the VT-d iommu_domain implementation of the VT-d second stage and first stage page tables with the iommupt VTDSS and x86_64 pagetables. x86_64 is shared with the AMD driver. There are a couple notable things in VT-d: - Like AMD the second stage format is not sign extended, unlike AMD it cannot decode a full 64 bits. The first stage format is a normal sign extended x86 page table - The HW caps can indicate how many levels, how many address bits and what leaf page sizes are supported in HW. As before the highest number of levels that can translate the entire supported address width is used. The supported page sizes are adjusted directly from the dedicated first/second stage cap bits. - VTD requires flushing 'write buffers'. This logic is left unchanged, the write buffer flushes on any gather flush or through iotlb_sync_map. - Like ARM, VTD has an optional non-coherent page table walker that requires cache flushing. This is supported through PT_FEAT_DMA_INCOHERENT the same as ARM, however x86 can't use the DMA API for flush, it must call the arch function clflush_cache_range() - The PT_FEAT_DYNAMIC_TOP can probably be supported on VT-d someday for the second stage when it uses 128 bit atomic stores for the HW context structures. - PT_FEAT_VTDSS_FORCE_WRITEABLE is used to work around ERRATA_772415_SPR17 - A kernel command line parameter "sp_off" disables all page sizes except 4k Remove all the unused iommu_domain page table code. The debugfs paths have their own independent page table walker that is left alone for now. This corrects a race with the non-coherent walker that the ARM implementations have fixed: CPU 0 CPU 1 pfn_to_dma_pte() pfn_to_dma_pte() pte = &parent[offset]; if (!dma_pte_present(pte)) { try_cmpxchg64(&pte->val) pte = &parent[offset]; .. dma_pte_present(pte) .. [...] // iommu_map() completes // Device does DMA domain_flush_cache(pte) The CPU 1 mapping operation shares a page table level with the CPU 0 mapping operation. CPU 0 installed a new page table level but has not flushed it yet. CPU1 returns from iommu_map() and the device does DMA. The non coherent walker fails to see the new table level installed by CPU 0 and fails the DMA with non-present. The iommupt PT_FEAT_DMA_INCOHERENT implementation uses the ARM design of storing a flag when CPU 0 completes the flush. If the flag is not set CPU 1 will also flush to ensure the HW can fully walk to the PTE being installed. Cc: Tina Zhang <tina.zhang@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
author: Jason Gunthorpe <jgg@nvidia.com> 2025-10-23 15:22:36 -0300
committer: Joerg Roedel <joerg.roedel@amd.com> 2025-11-05 09:50:19 +0100
commit: d373449d8e97891434db0c64afca79d903c1194e (patch)
tree: 12442349b34bf117bbebcd390137eebaea31a17f /drivers/iommu/intel/nested.c
parent: ef7bfe5bbffdcfa033beeeb068c6317f71730679 (diff)
1 files changed, 0 insertions, 5 deletions
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 760d7aa2ade8..a3fb8c193ca6 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -29,11 +29,6 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
 
 	device_block_translation(dev);
 
-	if (iommu->agaw < dmar_domain->s2_domain->agaw) {
-		dev_err_ratelimited(dev, "Adjusted guest address width not compatible\n");
-		return -ENODEV;
-	}
-
 	/*
 	 * Stage-1 domain cannot work alone, it is nested on a s2_domain.
 	 * The s2_domain will be used in nested translation, hence needs
author	Jason Gunthorpe <jgg@nvidia.com>	2025-10-23 15:22:36 -0300
committer	Joerg Roedel <joerg.roedel@amd.com>	2025-11-05 09:50:19 +0100
commit	d373449d8e97891434db0c64afca79d903c1194e (patch)
tree	12442349b34bf117bbebcd390137eebaea31a17f /drivers/iommu/intel/nested.c
parent	ef7bfe5bbffdcfa033beeeb068c6317f71730679 (diff)