summaryrefslogtreecommitdiff
path: root/drivers/xen
AgeCommit message (Collapse)Author
3 daysMerge tag 'soc-drivers-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...
3 daysMerge tag 'pull-persistency' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
5 daysMerge tag 'net-next-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Replace busylock at the Tx queuing layer with a lockless list. Resulting in a 300% (4x) improvement on heavy TX workloads, sending twice the number of packets per second, for half the cpu cycles. - Allow constantly busy flows to migrate to a more suitable CPU/NIC queue. Normally we perform queue re-selection when flow comes out of idle, but under extreme circumstances the flows may be constantly busy. Add sysctl to allow periodic rehashing even if it'd risk packet reordering. - Optimize the NAPI skb cache, make it larger, use it in more paths. - Attempt returning Tx skbs to the originating CPU (like we already did for Rx skbs). - Various data structure layout and prefetch optimizations from Eric. - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly quite expensive on recent AMD machines. - Extend threaded NAPI polling to allow the kthread busy poll for packets. - Make MPTCP use Rx backlog processing. This lowers the lock pressure, improving the Rx performance. - Support memcg accounting of MPTCP socket memory. - Allow admin to opt sockets out of global protocol memory accounting (using a sysctl or BPF-based policy). The global limits are a poor fit for modern container workloads, where limits are imposed using cgroups. - Improve heuristics for when to kick off AF_UNIX garbage collection. - Allow users to control TCP SACK compression, and default to 33% of RTT. - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily aggressive rcvbuf growth and overshot when the connection RTT is low. - Preserve skb metadata space across skb_push / skb_pull operations. - Support for IPIP encapsulation in the nftables flowtable offload. - Support appending IP interface information to ICMP messages (RFC 5837). - Support setting max record size in TLS (RFC 8449). - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL. - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock. - Let users configure the number of write buffers in SMC. - Add new struct sockaddr_unsized for sockaddr of unknown length, from Kees. - Some conversions away from the crypto_ahash API, from Eric Biggers. - Some preparations for slimming down struct page. - YAML Netlink protocol spec for WireGuard. - Add a tool on top of YAML Netlink specs/lib for reporting commonly computed derived statistics and summarized system state. Driver API: - Add CAN XL support to the CAN Netlink interface. - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as defined by the OPEN Alliance's "Advanced diagnostic features for 100BASE-T1 automotive Ethernet PHYs" specification. - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x). - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec and performs RSS. - Add info to devlink params whether the current setting is the default or a user override. Allow resetting back to default. - Add standard device stats for PSP crypto offload. - Leverage DSA frame broadcast to implement simple HSR frame duplication for a lot of switches without dedicated HSR offload. - Add uAPI defines for 1.6Tbps link modes. Device drivers: - Add Motorcomm YT921x gigabit Ethernet switch support. - Add MUCSE driver for N500/N210 1GbE NIC series. - Convert drivers to support dedicated ops for timestamping control, and away from the direct IOCTL handling. While at it support GET operations for PHY timestamping. - Add (and convert most drivers to) a dedicated ethtool callback for reading the Rx ring count. - Significant refactoring efforts in the STMMAC driver, which supports Synopsys turn-key MAC IP integrated into a ton of SoCs. - Ethernet high-speed NICs: - Broadcom (bnxt): - support PPS in/out on all pins - Intel (100G, ice, idpf): - ice: implement standard ethtool and timestamping stats - i40e: support setting the max number of MAC addresses per VF - iavf: support RSS of GTP tunnels for 5G and LTE deployments - nVidia/Mellanox (mlx5): - reduce downtime on interface reconfiguration - disable being an XDP redirect target by default (same as other drivers) to avoid wasting resources if feature is unused - Meta (fbnic): - add support for Linux-managed PCS on 25G, 50G, and 100G links - Wangxun: - support Rx descriptor merge, and Tx head writeback - support Rx coalescing offload - support 25G SPF and 40G QSFP modules - Ethernet virtual: - Google (gve): - allow ethtool to configure rx_buf_len - implement XDP HW RX Timestamping support for DQ descriptor format - Microsoft vNIC (mana): - support HW link state events - handle hardware recovery events when probing the device - Ethernet NICs consumer, and embedded: - usbnet: add support for Byte Queue Limits (BQL) - AMD (amd-xgbe): - add device selftests - NXP (enetc): - add i.MX94 support - Broadcom integrated MACs (bcmgenet, bcmasp): - bcmasp: add support for PHY-based Wake-on-LAN - Broadcom switches (b53): - support port isolation - support BCM5389/97/98 and BCM63XX ARL formats - Lantiq/MaxLinear switches: - support bridge FDB entries on the CPU port - use regmap for register access - allow user to enable/disable learning - support Energy Efficient Ethernet - support configuring RMII clock delays - add tagging driver for MaxLinear GSW1xx switches - Synopsys (stmmac): - support using the HW clock in free running mode - add Eswin EIC7700 support - add Rockchip RK3506 support - add Altera Agilex5 support - Cadence (macb): - cleanup and consolidate descriptor and DMA address handling - add EyeQ5 support - TI: - icssg-prueth: support AF_XDP - Airoha access points: - add missing Ethernet stats and link state callback - add AN7583 support - support out-of-order Tx completion processing - Power over Ethernet: - pd692x0: preserve PSE configuration across reboots - add support for TPS23881B devices - Ethernet PHYs: - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support - Support 50G SerDes and 100G interfaces in Linux-managed PHYs - micrel: - support for non PTP SKUs of lan8814 - enable in-band auto-negotiation on lan8814 - realtek: - cable testing support on RTL8224 - interrupt support on RTL8221B - motorcomm: support for PHY LEDs on YT853 - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag - mscc: support for PHY LED control - CAN drivers: - m_can: add support for optional reset and system wake up - remove can_change_mtu() obsoleted by core handling - mcp251xfd: support GPIO controller functionality - Bluetooth: - add initial support for PASTa - WiFi: - split ieee80211.h file, it's way too big - improvements in VHT radiotap reporting, S1G, Channel Switch Announcement handling, rate tracking in mesh networks - improve multi-radio monitor mode support, and add a cfg80211 debugfs interface for it - HT action frame handling on 6 GHz - initial chanctx work towards NAN - MU-MIMO sniffer improvements - WiFi drivers: - RealTek (rtw89): - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - Intel: - iwlwifi: new sniffer API support - MediaTek (mt76): - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - Qualcomm/Atheros: - ath10k: factory test support - ath11k: TX power insertion support - ath12k: BSS color change support - ath12k: statistics improvements - brcmfmac: Acer A1 840 tablet quirk - rtl8xxxu: 40 MHz connection fixes/support" * tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits) net: page_pool: sanitise allocation order net: page pool: xa init with destroy on pp init net/mlx5e: Support XDP target xmit with dummy program net/mlx5e: Update XDP features in switch channels selftests/tc-testing: Test CAKE scheduler when enqueue drops packets net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop wireguard: netlink: generate netlink code wireguard: uapi: generate header with ynl-gen wireguard: uapi: move flag enums wireguard: uapi: move enum wg_cmd wireguard: netlink: add YNL specification selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py selftests: drv-net: introduce Iperf3Runner for measurement use cases selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive() Documentation: net: dsa: mention simple HSR offload helpers Documentation: net: dsa: mention availability of RedBox ...
2025-11-16convert xenfsAl Viro
entirely static tree, populated by simple_fill_super(). Can switch to kill_anon_super() without any other changes. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-14syscore: Pass context data to callbacksThierry Reding
Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-13drivers/xen/xenbus: Fix namespace collision and split() section placement ↵Josh Poimboeuf
with AutoFDO When compiling the kernel with -ffunction-sections enabled, the split() function gets compiled into the .text.split section. In some cases it can even be cloned into .text.split.constprop.0 or .text.split.isra.0. However, .text.split.* is already reserved for use by the Clang -fsplit-machine-functions flag, which is used by AutoFDO. That may place part of a function's code in a .text.split.<func> section. This naming conflict causes the vmlinux linker script to wrongly place split() with other .text.split.* code, rather than where it belongs with regular text. Fix it by renaming split() to split_strings(). Fixes: 6568f14cb5ae ("vmlinux.lds: Exclude .text.startup and .text.exit from TEXT_MAIN") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: live-patching@vger.kernel.org Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://patch.msgid.link/92a194234a0f757765e275b288bb1a7236c2c35c.1762991150.git.jpoimboe@kernel.org
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-03Merge tag 'dma-mapping-6.18-2025-09-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - Refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters This gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works. An advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page Developped by Leon Romanovsky and Jason Gunthorpe - Minor clean-up by Petr Tesarik and Qianfeng Rong * tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: kmsan: fix missed kmsan_handle_dma() signature conversion mm/hmm: properly take MMIO path mm/hmm: migrate to physical address-based DMA mapping API dma-mapping: export new dma_*map_phys() interface xen: swiotlb: Open code map_resource callback dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() kmsan: convert kmsan_handle_dma to use physical addresses dma-mapping: convert dma_direct_*map_page to be phys_addr_t based iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys dma-debug: refactor to use physical addresses for page mapping iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). dma-mapping: introduce new DMA attribute to indicate MMIO memory swiotlb: Remove redundant __GFP_NOWARN dma-direct: clean up the logic in __dma_direct_alloc_pages()
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-09-22xen: take system_transition_mutex on suspendMarek Marczykowski-Górecki
Xen's do_suspend() calls dpm_suspend_start() without taking required system_transition_mutex. Since 12ffc3b1513eb moved the pm_restrict_gfp_mask() call, not taking that mutex results in a WARN. Take the mutex in do_suspend(), and use mutex_trylock() to follow how enter_state() does this. Suggested-by: Jürgen Groß <jgross@suse.com> Fixes: 12ffc3b1513eb "PM: Restrict swap use to later in the suspend sequence" Link: https://lore.kernel.org/xen-devel/aKiBJeqsYx_4Top5@mail-itl/ Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Cc: stable@vger.kernel.org # v6.16+ Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250921162853.223116-1-marmarek@invisiblethingslab.com>
2025-09-13mm: rename vm_ops->find_special_page() to vm_ops->find_normal_page()David Hildenbrand
... and hide it behind a kconfig option. There is really no need for any !xen code to perform this check. The naming is a bit off: we want to find the "normal" page when a PTE was marked "special". So it's really not "finding a special" page. Improve the documentation, and add a comment in the code where XEN ends up performing the pte_mkspecial() through a hypercall. More details can be found in commit 923b2919e2c3 ("xen/gntdev: mark userspace PTEs as special on x86 PV guests"). Link: https://lkml.kernel.org/r/20250811112631.759341-12-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mariano Pache <npache@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-12xen: swiotlb: Open code map_resource callbackLeon Romanovsky
General dma_direct_map_resource() is going to be removed in next patch, so simply open-code it in xen driver. Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/e9c66a92e818f416875441b6711963f9782dbbeb.1757423202.git.leonro@nvidia.com
2025-09-09xen/manage: Fix suspend error pathLukas Wunner
The device power management API has the following asymmetry: * dpm_suspend_start() does not clean up on failure (it requires a call to dpm_resume_end()) * dpm_suspend_end() does clean up on failure (it does not require a call to dpm_resume_start()) The asymmetry was introduced by commit d8f3de0d2412 ("Suspend-related patches for 2.6.27") in June 2008: It removed a call to device_resume() from device_suspend() (which was later renamed to dpm_suspend_start()). When Xen began using the device power management API in May 2008 with commit 0e91398f2a5d ("xen: implement save/restore"), the asymmetry did not yet exist. But since it was introduced, a call to dpm_resume_end() is missing in the error path of dpm_suspend_start(). Fix it. Fixes: d8f3de0d2412 ("Suspend-related patches for 2.6.27") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v2.6.27 Reviewed-by: "Rafael J. Wysocki (Intel)" <rafael@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <22453676d1ddcebbe81641bb68ddf587fee7e21e.1756990799.git.lukas@wunner.de>
2025-09-09xen/events: Update virq_to_irq on migrationJason Andryuk
VIRQs come in 3 flavors, per-VPU, per-domain, and global, and the VIRQs are tracked in per-cpu virq_to_irq arrays. Per-domain and global VIRQs must be bound on CPU 0, and bind_virq_to_irq() sets the per_cpu virq_to_irq at registration time Later, the interrupt can migrate, and info->cpu is updated. When calling __unbind_from_irq(), the per-cpu virq_to_irq is cleared for a different cpu. If bind_virq_to_irq() is called again with CPU 0, the stale irq is returned. There won't be any irq_info for the irq, so things break. Make xen_rebind_evtchn_to_cpu() update the per_cpu virq_to_irq mappings to keep them update to date with the current cpu. This ensures the correct virq_to_irq is cleared in __unbind_from_irq(). Fixes: e46cdb66c8fc ("xen: event channels") Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250828003604.8949-4-jason.andryuk@amd.com>
2025-09-09xen/events: Return -EEXIST for bound VIRQsJason Andryuk
Change find_virq() to return -EEXIST when a VIRQ is bound to a different CPU than the one passed in. With that, remove the BUG_ON() from bind_virq_to_irq() to propogate the error upwards. Some VIRQs are per-cpu, but others are per-domain or global. Those must be bound to CPU0 and can then migrate elsewhere. The lookup for per-domain and global will probably fail when migrated off CPU 0, especially when the current CPU is tracked. This now returns -EEXIST instead of BUG_ON(). A second call to bind a per-domain or global VIRQ is not expected, but make it non-fatal to avoid trying to look up the irq, since we don't know which per_cpu(virq_to_irq) it will be in. Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250828003604.8949-3-jason.andryuk@amd.com>
2025-09-09xen/events: Cleanup find_virq() return codesJason Andryuk
rc is overwritten by the evtchn_status hypercall in each iteration, so the return value will be whatever the last iteration is. This could incorrectly return success even if the event channel was not found. Change to an explicit -ENOENT for an un-found virq and return 0 on a successful match. Fixes: 62cc5fc7b2e0 ("xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports") Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250828003604.8949-2-jason.andryuk@amd.com>
2025-09-08drivers/xen/gntdev: use xen_pv_domain() instead of cached valueJuergen Gross
Eliminate the use_ptemod variable by replacing its use cases with xen_pv_domain(). Instead of passing the xen_pv_domain() return value to gntdev_ioctl_dmabuf_exp_from_refs(), use xen_pv_domain() in that function. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250826145608.10352-4-jgross@suse.com>
2025-09-08xen: replace XENFEAT_auto_translated_physmap with xen_pv_domain()Juergen Gross
Instead of testing the XENFEAT_auto_translated_physmap feature, just use !xen_pv_domain() which is equivalent. This has the advantage that a kernel not built with CONFIG_XEN_PV will be smaller due to dead code elimination. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250826145608.10352-3-jgross@suse.com>
2025-08-20drivers/xen/xenbus: remove quirk for Xen 3.xJuergen Gross
The kernel is not supported to run as a Xen guest on Xen versions older than 4.0. Remove xen_strict_xenbus_quirk() which is testing the Xen version to be at least 4.0. Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250815074052.13792-1-jgross@suse.com>
2025-07-14xen/gntdev: remove struct gntdev_copy_batch from stackJuergen Gross
When compiling the kernel with LLVM, the following warning was issued: drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds limit (1024) in function 'gntdev_ioctl' The main reason is struct gntdev_copy_batch which is located on the stack and has a size of nearly 1kb. For performance reasons it shouldn't by just dynamically allocated instead, so allocate a new instance when needed and instead of freeing it put it into a list of free structs anchored in struct gntdev_priv. Fixes: a4cdb556cae0 ("xen/gntdev: add ioctl for grant copy") Reported-by: Abinash Singh <abinashsinghlalotra@gmail.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250703073259.17356-1-jgross@suse.com>
2025-07-14xen: fix UAF in dmabuf_exp_from_pages()Al Viro
[dma_buf_fd() fixes; no preferences regarding the tree it goes through - up to xen folks] As soon as we'd inserted a file reference into descriptor table, another thread could close it. That's fine for the case when all we are doing is returning that descriptor to userland (it's a race, but it's a userland race and there's nothing the kernel can do about it). However, if we follow fd_install() with any kind of access to objects that would be destroyed on close (be it the struct file itself or anything destroyed by its ->release()), we have a UAF. dma_buf_fd() is a combination of reserving a descriptor and fd_install(). gntdev dmabuf_exp_from_pages() calls it and then proceeds to access the objects destroyed on close - starting with gntdev_dmabuf itself. Fix that by doing reserving descriptor before anything else and do fd_install() only when everything had been set up. Fixes: a240d6e42e28 ("xen/gntdev: Implement dma-buf export functionality") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20250712050916.GY1880847@ZenIV> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-07-14xen: Remove some deadcode (x)Dr. David Alan Gilbert
Remove three uncalled functions: xenbus_mkdir() was added in 2007 by commit 4bac07c993d0 ("xen: add the Xenbus sysfs and virtual device hotplug driver") but has remained unused. xen_get_runstate_snapshot() last use was removed in 2016 by commit 6ba286ad8457 ("xen: support runqueue steal time on xen") which replaces the use by the _cpu version. xen_resume_notifier_unregister() last use was removed in 2017 by commit 1914f0cd203c ("xen/acpi: upload PM state from init-domain to Xen") Remove them. Signed-off-by: "Dr. David Alan Gilbert" <linux@treblig.org> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250713132625.164728-1-linux@treblig.org>
2025-07-14xen-pciback: Replace scnprintf() with sysfs_emit_at()Ryan Chung
This is the third revision (v3) of this patch series. No changes since v2—only adding Reviewed-by lines. Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250708001444.86155-1-seokwoo.chung130@gmail.com>
2025-07-14xen/xenbus: fix W=1 build warning in xenbus_va_dev_error functionPeng Jiang
This patch fixes a W=1 format-string warning reported by GCC 12.3.0 by annotating xenbus_switch_fatal() and xenbus_va_dev_error() with the __printf attribute. The attribute enables compile-time validation of printf-style format strings in these functions. The original warning trace: drivers/xen/xenbus/xenbus_client.c:304:9: warning: function 'xenbus_va_dev_error' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format] Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20250620084104786r5xoR16_AmYZMJLnno3_Q@zte.com.cn> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-05-23xen/x86: fix initial memory balloon targetRoger Pau Monne
When adding extra memory regions as ballooned pages also adjust the balloon target, otherwise when the balloon driver is started it will populate memory to match the target value and consume all the extra memory regions added. This made the usage of the Xen `dom0_mem=,max:` command line parameter for dom0 not work as expected, as the target won't be adjusted and when the balloon is started it will populate memory straight to the 'max:' value. It would equally affect domUs that have memory != maxmem. Kernels built with CONFIG_XEN_UNPOPULATED_ALLOC are not affected, because the extra memory regions are consumed by the unpopulated allocation driver, and then balloon_add_regions() becomes a no-op. Reported-by: John <jw@nuclearfallout.net> Fixes: 87af633689ce ('x86/xen: fix balloon target initialization for PVH dom0') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Message-ID: <20250514080427.28129-1-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-05-23xen: swiotlb: Wire up map_resource callbackJohn Ernberg
When running Xen on iMX8QXP, an Arm SoC without IOMMU, DMA performed via its eDMA v3 DMA engine fail with a mapping error. The eDMA performs DMA between RAM and MMIO space, and it's the MMIO side that cannot be mapped. MMIO->RAM DMA access cannot be bounce buffered if it would straddle a page boundary and on Xen the MMIO space is 1:1 mapped for Arm, and x86 PV Dom0. Cases where MMIO space is not 1:1 mapped, such as x86 PVH Dom0, requires an IOMMU present to deal with the mapping. Considering the above the map_resource callback can just be wired to the existing dma_direct_map_resource() function. There is nothing to do for unmap so the unmap callback is not needed. Signed-off-by: John Ernberg <john.ernberg@actia.se> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-ID: <20250512071440.3726697-1-john.ernberg@actia.se> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-05-07xenbus: Use kref to track req lifetimeJason Andryuk
Marek reported seeing a NULL pointer fault in the xenbus_thread callstack: BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: e030:__wake_up_common+0x4c/0x180 Call Trace: <TASK> __wake_up_common_lock+0x82/0xd0 process_msg+0x18e/0x2f0 xenbus_thread+0x165/0x1c0 process_msg+0x18e is req->cb(req). req->cb is set to xs_wake_up(), a thin wrapper around wake_up(), or xenbus_dev_queue_reply(). It seems like it was xs_wake_up() in this case. It seems like req may have woken up the xs_wait_for_reply(), which kfree()ed the req. When xenbus_thread resumes, it faults on the zero-ed data. Linux Device Drivers 2nd edition states: "Normally, a wake_up call can cause an immediate reschedule to happen, meaning that other processes might run before wake_up returns." ... which would match the behaviour observed. Change to keeping two krefs on each request. One for the caller, and one for xenbus_thread. Each will kref_put() when finished, and the last will free it. This use of kref matches the description in Documentation/core-api/kref.rst Link: https://lore.kernel.org/xen-devel/ZO0WrR5J0xuwDIxW@mail-itl/ Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses") Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250506210935.5607-1-jason.andryuk@amd.com>
2025-05-07xenbus: Allow PVH dom0 a non-local xenstoreJason Andryuk
Make xenbus_init() allow a non-local xenstore for a PVH dom0 - it is currently forced to XS_LOCAL. With Hyperlaunch booting dom0 and a xenstore stubdom, dom0 can be handled as a regular XS_HVM following the late init path. Ideally we'd drop the use of xen_initial_domain() and just check for the event channel instead. However, ARM has a xen,enhanced no-xenstore mode, where the event channel and PFN would both be 0. Retain the xen_initial_domain() check, and use that for an additional check when the event channel is 0. Check the full 64bit HVM_PARAM_STORE_EVTCHN value to catch the off chance that high bits are set for the 32bit event channel. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Change-Id: I5506da42e4c6b8e85079fefb2f193c8de17c7437 Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250506204456.5220-1-jason.andryuk@amd.com>
2025-05-07xen: swiotlb: Use swiotlb bouncing if kmalloc allocation demands itJohn Ernberg
Xen swiotlb support was missed when the patch set starting with 4ab5f8ec7d71 ("mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN") was merged. When running Xen on iMX8QXP, a SoC without IOMMU, the effect was that USB transfers ended up corrupted when there was more than one URB inflight at the same time. Add a call to dma_kmalloc_needs_bounce() to make sure that allocations too small for DMA get bounced via swiotlb. Closes: https://lore.kernel.org/linux-usb/ab2776f0-b838-4cf6-a12a-c208eb6aad59@actia.se/ Fixes: 4ab5f8ec7d71 ("mm/slab: decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN") Cc: stable@kernel.org # v6.5+ Signed-off-by: John Ernberg <john.ernberg@actia.se> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250502114043.1968976-2-john.ernberg@actia.se>
2025-04-07x86/xen: fix balloon target initialization for PVH dom0Roger Pau Monne
PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to dom0 are re-used as scratch memory to map foreign and grant pages. Such logic relies on reporting those unpopulated ranges as RAM to Linux, and mark them as reserved. This way Linux creates the underlying page structures required for metadata management. Such approach works fine on PV because the initial balloon target is calculated using specific Xen data, that doesn't take into account the memory type changes described above. However on HVM and PVH the initial balloon target is calculated using get_num_physpages(), and that function does take into account the unpopulated RAM regions used as scratch space for remote domain mappings. This leads to PVH dom0 having an incorrect initial balloon target, which causes malfunction (excessive memory freeing) of the balloon driver if the dom0 memory target is later adjusted from the toolstack. Fix this by using xen_released_pages to account for any pages that are part of the memory map, but are already unpopulated when the balloon driver is initialized. This accounts for any regions used for scratch remote mappings. Note on x86 xen_released_pages definition is moved to enlighten.c so it's uniformly available for all Xen-enabled builds. Take the opportunity to unify PV with PVH/HVM guests regarding the usage of get_num_physpages(), as that avoids having to add different logic for PV vs PVH in both balloon_add_regions() and arch_xen_unpopulated_init(). Much like a6aa4eb994ee, the code in this changeset should have been part of 38620fc4e893. Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250407082838.65495-1-roger.pau@citrix.com>
2025-04-07xen: Change xen-acpi-processor dom0 dependencyJason Andryuk
xen-acpi-processor functions under a PVH dom0 with only a xen_initial_domain() runtime check. Change the Kconfig dependency from PV dom0 to generic dom0 to reflect that. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250331172913.51240-1-jason.andryuk@amd.com>
2025-04-07xenbus: add module descriptionArnd Bergmann
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/xen/xenbus/xenbus_probe_frontend.o Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250328113302.2632353-1-arnd@kernel.org>
2025-04-01Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...
2025-03-21xen: balloon: update the NR_BALLOON_PAGES stateNico Pache
Update the NR_BALLOON_PAGES counter when pages are added to or removed from the Xen balloon. Link: https://lkml.kernel.org/r/20250314213757.244258-5-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Alexander Atanasov <alexander.atanasov@virtuozzo.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Juegren Gross <jgross@suse.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21xen/pci: Do not register devices with segments >= 0x10000Roger Pau Monne
The current hypercall interface for doing PCI device operations always uses a segment field that has a 16 bit width. However on Linux there are buses like VMD that hook up devices into the PCI hierarchy at segment >= 0x10000, after the maximum possible segment enumerated in ACPI. Attempting to register or manage those devices with Xen would result in errors at best, or overlaps with existing devices living on the truncated equivalent segment values. Note also that the VMD segment numbers are arbitrarily assigned by the OS, and hence there would need to be some negotiation between Xen and the OS to agree on how to enumerate VMD segments and devices behind them. Skip notifying Xen about those devices. Given how VMD bridges can multiplex interrupts on behalf of devices behind them there's no need for Xen to be aware of such devices for them to be usable by Linux. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20250219092059.90850-2-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-14xen/pciback: Remove unused pcistub_get_pci_devDr. David Alan Gilbert
pcistub_get_pci_dev() was added in 2009 as part of: commit 30edc14bf39a ("xen/pciback: xen pci backend driver.") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20250307004736.291229-1-linux@treblig.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-14xenfs/xensyms: respect hypervisor's "next" indicationJan Beulich
The interface specifies the symnum field as an input and output; the hypervisor sets it to the next sequential symbol's index. xensyms_next() incrementing the position explicitly (and xensyms_next_sym() decrementing it to "rewind") is only correct as long as the sequence of symbol indexes is non-sparse. Use the hypervisor-supplied value instead to update the position in xensyms_next(), and use the saved incoming index in xensyms_next_sym(). Cc: stable@kernel.org Fixes: a11f4f0a4e18 ("xen: xensyms support") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <15d5e7fa-ec5d-422f-9319-d28bed916349@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-14xen: Add support for XenServer 6.1 platform deviceFrediano Ziglio
On XenServer on Windows machine a platform device with ID 2 instead of 1 is used. This device is mainly identical to device 1 but due to some Windows update behaviour it was decided to use a device with a different ID. This causes compatibility issues with Linux which expects, if Xen is detected, to find a Xen platform device (5853:0001) otherwise code will crash due to some missing initialization (specifically grant tables). Specifically from dmesg RIP: 0010:gnttab_expand+0x29/0x210 Code: 90 0f 1f 44 00 00 55 31 d2 48 89 e5 41 57 41 56 41 55 41 89 fd 41 54 53 48 83 ec 10 48 8b 05 7e 9a 49 02 44 8b 35 a7 9a 49 02 <8b> 48 04 8d 44 39 ff f7 f1 45 8d 24 06 89 c3 e8 43 fe ff ff 44 39 RSP: 0000:ffffba34c01fbc88 EFLAGS: 00010086 ... The device 2 is presented by Xapi adding device specification to Qemu command line. Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20250227145016.25350-1-frediano.ziglio@cloud.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-14Merge tag 'for-linus-6.14-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Three fixes to xen-swiotlb driver: - two fixes for issues coming up due to another fix in 6.12 - addition of an __init annotation" * tag 'for-linus-6.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: Xen/swiotlb: mark xen_swiotlb_fixup() __init x86/xen: allow larger contiguous memory regions in PV guests xen/swiotlb: relax alignment requirements
2025-02-13Xen/swiotlb: mark xen_swiotlb_fixup() __initJan Beulich
It's sole user (pci_xen_swiotlb_init()) is __init, too. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-ID: <e1198286-99ec-41c1-b5ad-e04e285836c9@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-13xen/swiotlb: relax alignment requirementsJuergen Gross
When mapping a buffer for DMA via .map_page or .map_sg DMA operations, there is no need to check the machine frames to be aligned according to the mapped areas size. All what is needed in these cases is that the buffer is contiguous at machine level. So carve out the alignment check from range_straddles_page_boundary() and move it to a helper called by xen_swiotlb_alloc_coherent() and xen_swiotlb_free_coherent() directly. Fixes: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers") Reported-by: Jan Vejvalka <jan.vejvalka@lfmotol.cuni.cz> Tested-by: Jan Vejvalka <jan.vejvalka@lfmotol.cuni.cz> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-01-29Merge tag 'for-linus-6.14-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Three minor fixes" * tag 'for-linus-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: update pvcalls_front_accept prototype Grab mm lock before grabbing pt lock xen: pcpu: remove unnecessary __ref annotation
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-22xen: update pvcalls_front_accept prototypeStefano Stabellini
While currently there are no in-tree callers of these functions, it is best to keep them up-to-date with the latest network API. In addition, add the missing EXPORT_SYMBOL_GPL to the functions listed in pvcalls-front.h. Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <alpine.DEB.2.22.394.2501201537560.11086@ubuntu-linux-20-04-desktop> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-01-21Merge tag 'irq-core-2025-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: - Consolidate the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. * tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set() genirq/timings: Add kernel-doc for a function parameter genirq: Remove IRQ_MOVE_PCNTXT and related code x86/apic: Convert to IRQCHIP_MOVE_DEFERRED genirq: Provide IRQCHIP_MOVE_DEFERRED hexagon: Remove GENERIC_PENDING_IRQ leftover ARC: Remove GENERIC_PENDING_IRQ genirq: Remove handle_enforce_irqctx() wrapper genirq: Make handle_enforce_irqctx() unconditionally available irqchip/loongarch-avec: Add multi-nodes topology support irqchip/ts4800: Replace seq_printf() by seq_puts() irqchip/ti-sci-inta : Add module build support irqchip/ti-sci-intr: Add module build support irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown kexec: Consolidate machine_kexec_mask_interrupts() implementation genirq: Reuse irq_thread_fn() for forced thread case genirq: Move irq_thread_fn() further up in the code
2025-01-20xen: pcpu: remove unnecessary __ref annotationSergio Miguéns Iglesias
The __ref annotation has been there since many years now, but the reason why it has been added has been removed from the code long ago. [jgross: clarify commit message] Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: xen-devel@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sergio Miguéns Iglesias <sergio@lony.xyz> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20241202085910.5539-1-sergio@lony.xyz> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-01-15x86/apic: Convert to IRQCHIP_MOVE_DEFERREDThomas Gleixner
Instead of marking individual interrupts as safe to be migrated in arbitrary contexts, mark the interrupt chips, which require the interrupt to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED flag. This makes more sense because this is a per interrupt chip property and not restricted to individual interrupts. That flips the logic from the historical opt-out to a opt-in model. This is simpler to handle for other architectures, which default to unrestricted affinity setting. It also allows to cleanup the redundant core logic significantly. All interrupt chips, which belong to a top-level domain sitting directly on top of the x86 vector domain are marked accordingly, unless the related setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01Get rid of 'remove_new' relic from platform driver structLinus Torvalds
The continual trickle of small conversion patches is grating on me, and is really not helping. Just get rid of the 'remove_new' member function, which is just an alias for the plain 'remove', and had a comment to that effect: /* * .remove_new() is a relic from a prototype conversion of .remove(). * New drivers are supposed to implement .remove(). Once all drivers are * converted to not use .remove_new any more, it will be dropped. */ This was just a tree-wide 'sed' script that replaced '.remove_new' with '.remove', with some care taken to turn a subsequent tab into two tabs to make things line up. I did do some minimal manual whitespace adjustment for places that used spaces to line things up. Then I just removed the old (sic) .remove_new member function, and this is the end result. No more unnecessary conversion noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>