sugar/linux - Sugar's Linux Kernel Patch

Age	Commit message (Collapse)	Author
3 days	Merge tag 'driver-core-6.19-rc1' of ↵HEAD mainline	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "Arch Topology: - Move parse_acpi_topology() from arm64 to common code for reuse in RISC-V CPU: - Expose housekeeping CPUs through /sys/devices/system/cpu/housekeeping - Print a newline (or 0x0A) instead of '(null)' reading /sys/devices/system/cpu/nohz_full when nohz_full= is not set debugfs - Remove (broken) 'no-mount' mode - Remove redundant access mode checks in debugfs_get_tree() and debugfs_create_() functions Devres: - Remove unused devm_free_percpu() helper - Move devm_alloc_percpu() from device.h to devres.h Firmware Loader: - Replace simple_strtol() with kstrtoint() - Do not call cancel_store() when no upload is in progress kernfs: - Increase struct super_block::maxbytes to MAX_LFS_FILESIZE - Fix a missing unwind path in __kernfs_new_node() Misc: - Increase the name size in struct auxiliary_device_id to 40 characters - Replace system_unbound_wq with system_dfl_wq and add WQ_PERCPU to alloc_workqueue() Platform: - Replace ERR_PTR() with IOMEM_ERR_PTR() in platform ioremap functions Rust: - Auxiliary: - Unregister auxiliary device on parent device unbind - Move parent() to impl Device; implement device context aware parent() for Device<Bound> - Illustrate how to safely obtain a driver's device private data when calling from an auxiliary driver into the parant device driver - DebugFs: - Implement support for binary large objects - Device: - Let probe() return the driver's device private data as pinned initializer, i.e. impl PinInit<Self, Error> - Implement safe accessor for a driver's device private data for Device<Bound> (returned reference can't out-live driver binding and guarantees the correct private data type) - Implement AsBusDevice trait, to be used by class device abstractions to derive the bus device type of the parent device - DMA: - Store raw pointer of allocation as NonNull - Use start_ptr() and start_ptr_mut() to inherit correct mutability of self - FS: - Add file::Offset type alias - I2C: - Add abstractions for I2C device / driver infrastructure - Implement abstractions for manual I2C device registrations - I/O: - Use "kernel vertical" style for imports - Define ResourceSize as resource_size_t - Move ResourceSize to top-level I/O module - Add type alias for phys_addr_t - Implement Rust version of read_poll_timeout_atomic() - PCI: - Use "kernel vertical" style for imports - Move I/O and IRQ infrastructure to separate files - Add support for PCI interrupt vectors - Implement TryInto<IrqRequest<'a>> for IrqVector<'a> to convert an IrqVector bound to specific pci::Device into an IrqRequest bound to the same pci::Device's parent Device - Leverage pin_init_scope() to get rid of redundant Result in IRQ methods - PinInit: - Add {pin_}init_scope() to execute code before creating an initializer - Platform: - Leverage pin_init_scope() to get rid of redundant Result in IRQ methods - Timekeeping: - Implement abstraction of udelay() - Uaccess: - Implement read_slice_partial() and read_slice_file() for UserSliceReader - Implement write_slice_partial() and write_slice_file() for UserSliceWriter sysfs: - Prepare the constification of struct attribute" tag 'driver-core-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (75 commits) rust: pci: fix build failure when CONFIG_PCI_MSI is disabled debugfs: Fix default access mode config check debugfs: Remove broken no-mount mode debugfs: Remove redundant access mode checks driver core: Check drivers_autoprobe for all added devices driver core: WQ_PERCPU added to alloc_workqueue users driver core: replace use of system_unbound_wq with system_dfl_wq tick/nohz: Expose housekeeping CPUs in sysfs tick/nohz: avoid showing '(null)' if nohz_full= not set sysfs/cpu: Use DEVICE_ATTR_RO for nohz_full attribute kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE mod_devicetable: Bump auxiliary_device_id name size sysfs: simplify attribute definition macros samples/kobject: constify 'struct foo_attribute' samples/kobject: add is_visible() callback to attribute group sysfs: attribute_group: enable const variants of is_visible() sysfs: introduce __SYSFS_FUNCTION_ALTERNATIVE() sysfs: transparently handle const pointers in ATTRIBUTE_GROUPS() sysfs: attribute_group: allow registration of const attribute ...
3 days	nfs/localio: fix regression due to out-of-order __put_cred	Mike Snitzer
	Commit f2060bdc21d7 ("nfs/localio: add refcounting for each iocb IO associated with NFS pgio header") inadvertantly reintroduced the same potential for __put_cred() triggering BUG_ON(cred == current->cred) that commit 992203a1fba5 ("nfs/localio: restore creds before releasing pageio data") fixed. Fix this by saving and restoring the cred around each {read,write}_iter call within the respective for loop of nfs_local_call_{read,write} using scoped_with_creds(). NOTE: this fix started by first reverting the following commits: 94afb627dfc2 ("nfs: use credential guards in nfs_local_call_read()") bff3c841f7bd ("nfs: use credential guards in nfs_local_call_write()") 1d18101a644e ("Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs") followed by narrowly fixing the cred lifetime issue by using scoped_with_creds(). In doing so, this commit's changes appear more extensive than they really are (as evidenced by comparing to v6.18's fs/nfs/localio.c). Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/linux-next/20251205111942.4150b06f@canb.auug.org.au/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 days	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
3 days	Merge tag 'uml-for-linux-6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Johannes Berg: "Apart from the usual small churn, we have - initial SMP support (only kernel) - major vDSO cleanups (and fixes for 32-bit)" * tag 'uml-for-linux-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (33 commits) um: Disable KASAN_INLINE when STATIC_LINK is selected um: Don't rename vmap to kernel_vmap um: drivers: virtio: use string choices helper um: Always set up AT_HWCAP and AT_PLATFORM x86/um: Remove FIXADDR_USER_START and FIXADDR_USE_END um: Remove __access_ok_vsyscall() um: Remove redundant range check from __access_ok_vsyscall() um: Remove fixaddr_user_init() x86/um: Drop gate area handling x86/um: Do not inherit vDSO from host um: Split out default elf_aux_hwcap x86/um: Move ELF_PLATFORM fallback to x86-specific code um: Split out default elf_aux_platform um: Avoid circular dependency on asm-offsets in pgtable.h um: Enable SMP support on x86 asm-generic: percpu: Add assembly guard um: vdso: Remove getcpu support on x86 um: Add initial SMP support um: Define timers on a per-CPU basis um: Determine sleep based on need_resched() ...
3 days	ovl: pass original credentials, not mounter credentials during create	Christian Brauner
	When creating new files the security layer expects the original credentials to be passed. When cleaning up the code this was accidently changed to pass the mounter's credentials by relying on current->cred which is already overriden at this point. Pass the original credentials directly. Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: Paul Moore <paul@paul-moore.com> Fixes: e566bff96322 ("ovl: port ovl_create_or_link() to new ovl_override_creator_creds") Link: https://lore.kernel.org/CAFqZXNvL1ciLXMhHrnoyBmQu1PAApH41LkSWEhrcvzAAbFij8Q@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Tested-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 days	Merge tag 'vfs-6.19-rc1.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a type conversion bug in the ipc subsystem - Fix per-dentry timeout warning in autofs - Drop the fd conversion from sockets - Move assert from iput_not_last() to iput() - Fix reversed check in filesystems_freeze_callback() - Use proper uapi types for new struct delegation definitions * tag 'vfs-6.19-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: use UAPI types for new struct delegation definition mqueue: correct the type of ro to int Revert "net/socket: convert sock_map_fd() to FD_ADD()" autofs: fix per-dentry timeout warning fs: assert on I_FREEING not being set in iput() and iput_not_last() fs: PM: Fix reverse check in filesystems_freeze_callback()
3 days	Merge tag 'exfat-for-6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix a remount failure caused by differing process masks by inheriting the original mount options during the remount process - Fix a potential divide-by-zero error and system crash in exfat_allocate_bitmap that occurred when the readahead count was zero - Add validation for directory cluster bitmap bits to prevent directory and root cluster from being incorrectly zeroed out on corrupted images - Clear the post-EOF page cache when extending a file to prevent stale mmap data from becoming visible, addressing an generic/363 failure - Fix a reference count leak in exfat_find by properly releasing the dentry set in specific error paths * tag 'exfat-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix remount failure in different process environments exfat: fix divide-by-zero in exfat_allocate_bitmap exfat: validate the cluster bitmap bits of directory exfat: zero out post-EOF page cache on file extension exfat: fix refcount leak in exfat_find
3 days	Merge tag 'fuse-update-6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add mechanism for cleaning out unused, stale dentries; controlled via a module option (Luis Henriques) - Fix various bugs - Cleanups * tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Uninitialized variable in fuse_epoch_work() fuse: fix io-uring list corruption for terminated non-committed requests fuse: signal that a fuse inode should exhibit local fs behaviors fuse: Always flush the page cache before FOPEN_DIRECT_IO write fuse: Invalidate the page cache after FOPEN_DIRECT_IO write fuse: rename 'namelen' to 'namesize' fuse: use strscpy instead of strcpy fuse: refactor fuse_conn_put() to remove negative logic. fuse: new work queue to invalidate dentries from old epochs fuse: new work queue to periodically invalidate expired dentries dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused() fuse: add WARN_ON and comment for RCU revalidate fuse: Fix whitespace for fuse_uring_args_to_ring() comment fuse: missing copy_finish in fuse-over-io-uring argument copies fuse: fix readahead reclaim deadlock
3 days	Merge tag 'pull-persistency' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
3 days	Merge tag 'mm-stable-2025-12-03-21-26' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
3 days	Merge tag 'sysctl-6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move jiffies converters out of kernel/sysctl.c Move the jiffies converters into kernel/time/jiffies.c and replace the pipe-max-size proc_handler converter with a macro based version. This is all part of the effort to relocate non-sysctl logic out of kernel/sysctl.c into more relevant subsystems. No functional changes. - Generalize proc handler converter creation Remove duplicated sysctl converter logic by consolidating it in macros. These are used inside sysctl core as well as in pipe.c and jiffies.c. Converter kernel and user space pointer args are now automatically const qualified for the convenience of the caller. No functional changes. - Miscellaneous Fix kernel-doc format warnings, remove unnecessary __user qualifiers, and move the nmi_watchdog sysctl into .rodata. - Testing This series was run through sysctl selftests/kunit test suite in x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks of testing. * tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (21 commits) sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv sysctl: Create pipe-max-size converter using sysctl UINT macros sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c sysctl: Move jiffies converters to kernel/time/jiffies.c sysctl: Move UINT converter macros to sysctl header sysctl: Move INT converter macros to sysctl header sysctl: Allow custom converters from outside sysctl sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument sysctl: Create macro for user-to-kernel uint converter sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM sysctl: Create unsigned int converter using new macro sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM sysctl: Create integer converters with one macro sysctl: Create converter functions with two new macros sysctl: Discriminate between kernel and user converter params sysctl: Indicate the direction of operation with macro names sysctl: Remove superfluous __do_proc_* indirection sysctl: Remove superfluous tbl_data param from "dovec" functions sysctl: Replace void pointer with const pointer to ctl_table sysctl: fix kernel-doc format warning ...
3 days	Merge tag 'pstore-v6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: - pstore/ram: Update module parameters from platform data (Tzung-Bi Shih) * tag 'pstore-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Update module parameters from platform data
3 days	Merge tag 'configfs-for-v6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux Pull configfs updates from Andreas Hindborg: "Two commits changing constness of the configfs vtable pointers. We plan to follow up with changes at call sites down the road" * tag 'configfs-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux: configfs: Constify ct_item_ops in struct config_item_type configfs: Constify ct_group_ops in struct config_item_type
4 days	autofs: fix per-dentry timeout warning	Ian Kent
	The check that determines if the message that warns about the per-dentry timeout being greater than the super block timeout is not correct. The initial value for this field is -1 and the type of the field is unsigned long. I could change the type to long but the message is in the wrong place too, it should come after the timeout setting. So leave everything else as it is and move the message and check the timeout is actually set as an additional condition on issuing the message. Also fix the timeout comparison. Signed-off-by: Ian Kent <raven@themaw.net> Link: https://patch.msgid.link/20251111060439.19593-2-raven@themaw.net Signed-off-by: Christian Brauner <brauner@kernel.org>
5 days	Merge tag 'ntfs3_for_6.19' of ↵	Linus Torvalds
	https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: "New code: - support timestamps prior to epoch - do not overwrite uptodate pages - disable readahead for compressed files - setting of dummy blocksize to read boot_block when mounting - the run_lock initialization when loading $Extend - initialization of allocated memory before use - support for the NTFS3_IOC_SHUTDOWN ioctl - check for minimum alignment when performing direct I/O reads - check for shutdown in fsync Fixes: - mount failure for sparse runs in run_unpack() - use-after-free of sbi->options in cmp_fnames - KMSAN uninit bug after failed mi_read in mi_format_new - uninit error after buffer allocation by __getname() - KMSAN uninit-value in ni_create_attr_list - double free of sbi->options->nls and ownership of fc->fs_private - incorrect vcn adjustments in attr_collapse_range() - mode update when ACL can be reduced to mode - memory leaks in add sub record Changes: - refactor code, updated terminology, spelling - do not kmap pages in (de)compression code - after ntfs_look_free_mft(), code that fails must put mft_inode - default mount options for "acl" and "prealloc" Replaced: - use unsafe_memcpy() to avoid memcpy size warning - ntfs_bio_pages with page cache for compressed files" * tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits) fs/ntfs3: check for shutdown in fsync fs/ntfs3: change the default mount options for "acl" and "prealloc" fs/ntfs3: Prevent memory leaks in add sub record fs/ntfs3: out1 also needs to put mi fs/ntfs3: Fix spelling mistake "recommened" -> "recommended" fs/ntfs3: update mode in xattr when ACL can be reduced to mode fs/ntfs3: check minimum alignment for direct I/O fs/ntfs3: implement NTFS3_IOC_SHUTDOWN ioctl fs/ntfs3: correct attr_collapse_range when file is too fragmented ntfs3: fix double free of sbi->options->nls and clarify ownership of fc->fs_private fs/ntfs3: Initialize allocated memory before use fs/ntfs3: remove ntfs_bio_pages and use page cache for compressed I/O ntfs3: avoid memcpy size warning fs/ntfs3: fix KMSAN uninit-value in ni_create_attr_list ntfs3: init run lock for extend inode ntfs: set dummy blocksize to read boot_block when mounting fs/ntfs3: disable readahead for compressed files ntfs3: Fix uninit buffer allocated by __getname() ntfs3: fix uninit memory after failed mi_read in mi_format_new ntfs3: fix use-after-free of sbi->options in cmp_fnames ...
5 days	Merge tag 'ext4_for_linus-6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "New features and improvements for the ext4 file system: - Optimize online defragmentation by using folios instead of individual buffer heads - Improve error codes stored in the superblock when the journal aborts - Minor cleanups and clarifications in ext4_map_blocks() - Add documentation of the casefold and encrypt flags - Add support for file systems with a blocksize greater than the pagesize - Improve performance by enabling the caching the fact that an inode does not have a Posix ACL Various Bug Fixes: - Fix false positive complaints from smatch - Fix error code which is returned by ext4fs_dirhash() when Siphash is used without the encryption key - Fix races when writing to inline data files which could trigger a BUG - Fix potential NULL dereference when there is an corrupt file system with an extended attribute value stored in a inode - Fix false positive lockdep report when syzbot uses ext4 and ocfs2 together - Fix false positive reported by DEPT by adjusting lock annotation - Avoid a potential BUG_ON in jbd2 when a file system is massively corrupted - Fix a WARN_ON when superblock is corrupted with a non-NULL terminated mount options field - Add check if the userspace passes in a non-NULL terminated mount options field to EXT4_IOC_SET_TUNE_SB_PARAM - Fix a potential journal checksum failure whena file system is copied while it is mounted read-only - Fix a potential potential orphan file tracking error which only showed on 32-bit systems - Fix assertion checks in mballoc (which have to be explicitly enbled by manually enabling AGGRESSIVE_CHECKS and recompiling) - Avoid complaining about overly large orphan files created by mke2fs with with file systems with a 64k block size" * tag 'ext4_for_linus-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: mark inodes without acls in __ext4_iget() ext4: enable block size larger than page size ext4: add checks for large folio incompatibilities when BS > PS ext4: support verifying data from large folios with fs-verity ext4: make data=journal support large block size ext4: support large block size in __ext4_block_zero_page_range() ext4: support large block size in mpage_prepare_extent_to_map() ext4: support large block size in mpage_map_and_submit_buffers() ext4: support large block size in ext4_block_write_begin() ext4: support large block size in ext4_mpage_readpages() ext4: rename 'page' references to 'folio' in multi-block allocator ext4: prepare buddy cache inode for BS > PS with large folios ext4: support large block size in ext4_mb_init_cache() ext4: support large block size in ext4_mb_get_buddy_page_lock() ext4: support large block size in ext4_mb_load_buddy_gfp() ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion ext4: support large block size in ext4_readdir() ext4: support large block size in ext4_calculate_overhead() ext4: introduce s_min_folio_order for future BS > PS support ...
5 days	Merge tag 'gfs2-for-6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Major withdraw / error handling overhaul based on dlm's new DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like node failures. Make withdraws asynchronous - Fix a bug in commit e4a8b5481c59a that caused 'df' to remain out of sync. ('df' is still allowed to go slightly out of sync for short periods of time) - Prevent recusive memory reclaim in gfs2_unstuff_dinode() - Clean up SDF_JOURNAL_LIVE flag handling - Fix remote evict for read-only filesystems - Fix a misuse of bio_chain() - Various other minor cleanups * tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits) gfs2: Fix use of bio_chain gfs2: Clean up SDF_JOURNAL_LIVE flag handling gfs2: No longer thaw filesystems during a withdraw gfs2: Withdraw immediately in gfs2_trans_add_meta gfs2: New gfs2_withdraw_helper gfs2: Clean up properly during a withdraw gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks} Revert "gfs2: fix infinite loop when checking ail item count before go_inval" Revert "gfs2: Allow some glocks to be used during withdraw" Revert "gfs2: Check for log write errors before telling dlm to unlock" Revert "gfs2: fix a deadlock on withdraw-during-mount" Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6) Revert "gfs2: don't stop reads while withdraw in progress" gfs2: Rename LM_FLAG_{NOEXP -> RECOVER} gfs2: Kill gfs2_io_error_bh_wd ...
5 days	Merge tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbd	Linus Torvalds
	Pull smb client and server updates from Steve French: - server fixes: - IPC use after free locking fix - fix locking bug in delete paths - fix use after free in disconnect - fix underflow in locking check - error mapping improvement - socket listening improvement - return code mapping fixes - crypto improvements (use default libraries) - cleanup patches: - netfs - client checkpatch cleanup - server cleanup - move server/client duplicate code to common code - fix some defines to better match protocol specification - smbdirect (RDMA) fixes - client debugging improvements for leases * tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbd: (44 commits) cifs: Use netfs_alloc/free_folioq_buffer() smb: client: show smb lease key in open_dirs output smb: client: show smb lease key in open_files output ksmbd: ipc: fix use-after-free in ipc_msg_send_request smb: client: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_) checks in recv_done() and smbd_conn_upcall() smb: server: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_) checks in recv_done() and smb_direct_cm_handler() smb: smbdirect: introduce SMBDIRECT_CHECK_STATUS_{WARN,DISCONNECT}() smb: smbdirect: introduce SMBDIRECT_DEBUG_ERR_PTR() helper ksmbd: vfs: fix race on m_flags in vfs_cache ksmbd: Replace strcpy + strcat to improve convert_to_nt_pathname smb: move FILE_SYSTEM_ATTRIBUTE_INFO to common/fscc.h ksmbd: implement error handling for STATUS_INFO_LENGTH_MISMATCH in smb server ksmbd: fix use-after-free in ksmbd_tree_connect_put under concurrency ksmbd: server: avoid busy polling in accept loop smb: move create_durable_reconn to common/smb2pdu.h smb: fix some warnings reported by scripts/checkpatch.pl smb: do some cleanups smb: move FILE_SYSTEM_SIZE_INFO to common/fscc.h smb: move some duplicate struct definitions to common/fscc.h smb: move list of FileSystemAttributes to common/fscc.h ...
5 days	Merge tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds
	Pull xfs updates from Carlos Maiolino: "There are no major changes in xfs. This contains mostly some code cleanups, a few bug fixes and documentation update. Highlights are: - Quota locking cleanup - Getting rid of old xlog_in_core_2_t type" * tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits) docs: remove obsolete links in the xfs online repair documentation xfs: move some code out of xfs_iget_recycle xfs: use zi more in xfs_zone_gc_mount xfs: remove the unused bv field in struct xfs_gc_bio xfs: remove xarray mark for reclaimable zones xfs: remove the xlog_in_core_t typedef xfs: remove l_iclog_heads xfs: remove the xlog_rec_header_t typedef xfs: remove xlog_in_core_2_t xfs: remove a very outdated comment from xlog_alloc_log xfs: cleanup xlog_alloc_log a bit xfs: don't use xlog_in_core_2_t in struct xlog_in_core xfs: add a on-disk log header cycle array accessor xfs: add a XLOG_CYCLE_DATA_SIZE constant xfs: reduce ilock roundtrips in xfs_qm_vop_dqalloc xfs: move xfs_dquot_tree calls into xfs_qm_dqget_cache_{lookup,insert} xfs: move quota locking into xrep_quota_item xfs: move quota locking into xqcheck_commit_dquot xfs: move q_qlock locking into xqcheck_compare_dquot xfs: move q_qlock locking into xchk_quota_item ...
5 days	Merge tag 'erofs-for-6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: - Fix a WARNING caused by a recent FSDAX misdetection regression - Fix the filesystem stacking limit for file-backed mounts - Print more informative diagnostics on decompression errors - Switch the on-disk definition `erofs_fs.h` to the MIT license - Minor cleanups * tag 'erofs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: switch on-disk header `erofs_fs.h` to MIT license erofs: get rid of raw bi_end_io() usage erofs: enable error reporting for z_erofs_fixup_insize() erofs: enable error reporting for z_erofs_stream_switch_bufs() erofs: improve Zstd, LZMA and DEFLATE error strings erofs: improve decompression error reporting erofs: tidy up z_erofs_lz4_handle_overlap() erofs: limit the level of fs stacking for file-backed mounts erofs: correct FSDAX detection
5 days	Merge tag 'hfs-v6.19-tag1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs Pull hfs/hfsplus updates from Viacheslav Dubeyko: "Several fixes for syzbot reported issues, HFS/HFS+ fixes of xfstests failures, Kunit-based unit-tests introduction, and code cleanup: - Dan Carpenter fixed a potential use-after-free issue in hfs_correct_next_unused_CNID() method. Tetsuo Handa has made nice fix of syzbot reported issue related to incorrect inode->i_mode management if volume has been corrupted somehow. Yang Chenzhi has made really good fix of potential race condition in __hfs_bnode_create() method for HFS+ file system. - Several fixes to xfstests failures. Particularly, generic/070, generic/073, and generic/101 test-cases finish successfully for the case of HFS+ file system right now. - HFS and HFS+ drivers share multiple structures of on-disk layout declarations. Some structures are used without any change. However, we had two independent declarations of the same structures in HFS and HFS+ drivers. The on-disk layout declarations have been moved into include/linux/hfs_common.h with the goal to exclude the declarations duplication and to keep the HFS/HFS+ on-disk layout declarations in one place. Also, this patch prepares the basis for creating a hfslib that can aggregate common functionality without necessity to duplicate the same code in HFS and HFS+ drivers. - HFS/HFS+ really need unit-tests because of multiple xfstests failures. The first two patches introduce Kunit-based unit-tests for the case string operations in HFS/HFS+ file system drivers" * tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs: hfs/hfsplus: move on-disk layout declarations into hfs_common.h hfsplus: fix volume corruption issue for generic/101 hfsplus: introduce KUnit tests for HFS+ string operations hfs: introduce KUnit tests for HFS string operations hfsplus: fix volume corruption issue for generic/073 hfsplus: Verify inode mode when loading from disk hfsplus: fix volume corruption issue for generic/070 hfs/hfsplus: prevent getting negative values of offset/length hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create hfs: fix potential use after free in hfs_correct_next_unused_CNID()
5 days	Merge tag 'for-6.19-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features: - shutdown ioctl support (needs CONFIG_BTRFS_EXPERIMENTAL for now): - set filesystem state as being shut down (also named going down in other filesystems), where all active operations return EIO and this cannot be changed until unmount - pending operations are attempted to be finished but error messages may still show up depending on where exactly the shutdown happened - scrub (and device replace) vs suspend/hibernate: - a running scrub will prevent suspend, which can be annoying as suspend is an immediate request and scrub is not critical - filesystem freezing before suspend was not sufficient as the problem was in process freezing - behaviour change: on suspend scrub and device replace are cancelled, where scrub can record the last state and continue from there; the device replace has to be restarted from the beginning - zone stats exported in sysfs, from the perspective of the filesystem this includes active, reclaimable, relocation etc zones Performance: - improvements when processing space reservation tickets by optimizing locking and shrinking critical sections, cumulative improvements in lockstat numbers show +15% Notable fixes: - use vmalloc fallback when allocating bios as high order allocations can happen with wide checksums (like sha256) - scrub will always track the last position of progress so it's not starting from zero after an error Core: - under experimental config, checksum calculations are offloaded to process context, simplifies locking and allows to remove compression write worker kthread(s): - speed improvement in direct IO throughput with buffered IO fallback is +15% when not offloaded but this is more related to internal crypto subsystem improvements - this will be probably default in the future removing the sysfs tunable - (experimental) block size > page size updates: - support more operations when not using large folios (encoded read/write and send) - raid56 - more preparations for fscrypt support Other: - more conversions to auto-cleaned variables - parameter cleanups and removals - extended warning fixes - improved printing of structured values like keys - lots of other cleanups and refactoring" * tag 'for-6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits) btrfs: remove unnecessary inode key in btrfs_log_all_parents() btrfs: remove redundant zero/NULL initializations in btrfs_alloc_root() btrfs: remaining BTRFS_PATH_AUTO_FREE conversions btrfs: send: do not allocate memory for xattr data when checking it exists btrfs: send: add unlikely to all unexpected overflow checks btrfs: reduce arguments to btrfs_del_inode_ref_in_log() btrfs: remove root argument from btrfs_del_dir_entries_in_log() btrfs: use test_and_set_bit() in btrfs_delayed_delete_inode_ref() btrfs: don't search back for dir inode item in INO_LOOKUP_USER btrfs: don't rewrite ret from inode_permission btrfs: add orig_logical to btrfs_bio for encryption btrfs: disable verity on encrypted inodes btrfs: disable various operations on encrypted inodes btrfs: remove redundant level reset in btrfs_del_items() btrfs: simplify leaf traversal after path release in btrfs_next_old_leaf() btrfs: optimize balance_level() path reference handling btrfs: factor out root promotion logic into promote_child_to_root() btrfs: raid56: remove the "_step" infix btrfs: raid56: enable bs > ps support btrfs: raid56: prepare finish_parity_scrub() to support bs > ps cases ...
5 days	Merge tag 'for-6.19/block-20251201' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Fix head insertion for mq-deadline, a regression from when priority support was added - Series simplifying and improving the ublk user copy code - Various ublk related cleanups - Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the request is punted to a thread for handling - Merge and then later revert loop dio nowait support, as it ended up causing excessive stack usage for when the inline issue code needs to dip back into the full file system code - Improve auto integrity code, making it less deadlock prone - Speedup polled IO handling, but manually managing the hctx lookups - Fixes for blk-throttle for SSD devices - Small series with fixes for the S390 dasd driver - Add support for caching zones, avoiding unnecessary report zone queries - MD pull requests via Yu: - fix null-ptr-dereference regression for dm-raid0 - fix IO hang for raid5 when array is broken with IO inflight - remove legacy 1s delay to speed up system shutdown - change maintainer's email address - data can be lost if array is created with different lbs devices, fix this problem and record lbs of the array in metadata - fix rcu protection for md_thread - fix mddev kobject lifetime regression - enable atomic writes for md-linear - some cleanups - bcache updates via Coly - remove useless discard and cache device code - improve usage of per-cpu workqueues - Reorganize the IO scheduler switching code, fixing some lockdep reports as well - Improve the block layer P2P DMA support - Add support to the block tracing code for zoned devices - Segment calculation improves, and memory alignment flexibility improvements - Set of prep and cleanups patches for ublk batching support. The actual batching hasn't been added yet, but helps shrink down the workload of getting that patchset ready for 6.20 - Fix for how the ps3 block driver handles segments offsets - Improve how block plugging handles batch tag allocations - nbd fixes for use-after-free of the configuration on device clear/put - Set of improvements and fixes for zloop - Add Damien as maintainer of the block zoned device code handling - Various other fixes and cleanups * tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) block/rnbd: correct all kernel-doc complaints blk-mq: use queue_hctx in blk_mq_map_queue_type md: remove legacy 1s delay in md_notify_reboot md/raid5: fix IO hang when array is broken with IO inflight md: warn about updating super block failure md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid sbitmap: fix all kernel-doc warnings ublk: add helper of __ublk_fetch() ublk: pass const pointer to ublk_queue_is_zoned() ublk: refactor auto buffer register in ublk_dispatch_req() ublk: add `union ublk_io_buf` with improved naming ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg() kfifo: add kfifo_alloc_node() helper for NUMA awareness blk-mq: fix potential uaf for 'queue_hw_ctx' blk-mq: use array manage hctx map instead of xarray ublk: prevent invalid access with DEBUG s390/dasd: Use scnprintf() instead of sprintf() s390/dasd: Move device name formatting into separate function s390/dasd: Remove unnecessary debugfs_create() return checks s390/dasd: Fix gendisk parent after copy pair swap ...
5 days	Merge tag 'for-6.19/io_uring-20251201' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Unify how task_work cancelations are detected, placing it in the task_work running state rather than needing to check the task state - Series cleaning up and moving the cancelation code to where it belongs, in cancel.c - Cleanup of waitid and futex argument handling - Add support for mixed sized SQEs. 6.18 added support for mixed sized CQEs, improving flexibility and efficiency of workloads that need big CQEs. This adds similar support for SQEs, where the occasional need for a 128b SQE doesn't necessitate having all SQEs be 128b in size - Introduce zcrx and SQ/CQ layout queries. The former returns what zcrx features are available. And both return the ring size information to help with allocation size calculation for user provided rings like IORING_SETUP_NO_MMAP and IORING_MEM_REGION_TYPE_USER - Zcrx updates for 6.19. It includes a bunch of small patches, IORING_REGISTER_ZCRX_CTRL and RQ flushing and David's work on sharing zcrx b/w multiple io_uring instances - Series cleaning up ring initializations, notable deduplicating ring size and offset calculations. It also moves most of the checking before doing any allocations, making the code simpler - Add support for getsockname and getpeername, which is mostly a trivial hookup after a bit of refactoring on the networking side - Various fixes and cleanups * tag 'for-6.19/io_uring-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring: Introduce getsockname io_uring cmd socket: Split out a getsockname helper for io_uring socket: Unify getsockname and getpeername implementation io_uring/query: drop unused io_handle_query_entry() ctx arg io_uring/kbuf: remove obsolete buf_nr_pages and update comments io_uring/register: use correct location for io_rings_layout io_uring/zcrx: share an ifq between rings io_uring/zcrx: add io_fill_zcrx_offsets() io_uring/zcrx: export zcrx via a file io_uring/zcrx: move io_zcrx_scrub() and dependencies up io_uring/zcrx: count zcrx users io_uring/zcrx: add sync refill queue flushing io_uring/zcrx: introduce IORING_REGISTER_ZCRX_CTRL io_uring/zcrx: elide passing msg flags io_uring/zcrx: use folio_nr_pages() instead of shift operation io_uring/zcrx: convert to use netmem_desc io_uring/query: introduce rings info query io_uring/query: introduce zcrx query io_uring: move cq/sq user offset init around io_uring: pre-calculate scq layout ...
5 days	Merge tag 'net-next-6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Replace busylock at the Tx queuing layer with a lockless list. Resulting in a 300% (4x) improvement on heavy TX workloads, sending twice the number of packets per second, for half the cpu cycles. - Allow constantly busy flows to migrate to a more suitable CPU/NIC queue. Normally we perform queue re-selection when flow comes out of idle, but under extreme circumstances the flows may be constantly busy. Add sysctl to allow periodic rehashing even if it'd risk packet reordering. - Optimize the NAPI skb cache, make it larger, use it in more paths. - Attempt returning Tx skbs to the originating CPU (like we already did for Rx skbs). - Various data structure layout and prefetch optimizations from Eric. - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly quite expensive on recent AMD machines. - Extend threaded NAPI polling to allow the kthread busy poll for packets. - Make MPTCP use Rx backlog processing. This lowers the lock pressure, improving the Rx performance. - Support memcg accounting of MPTCP socket memory. - Allow admin to opt sockets out of global protocol memory accounting (using a sysctl or BPF-based policy). The global limits are a poor fit for modern container workloads, where limits are imposed using cgroups. - Improve heuristics for when to kick off AF_UNIX garbage collection. - Allow users to control TCP SACK compression, and default to 33% of RTT. - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily aggressive rcvbuf growth and overshot when the connection RTT is low. - Preserve skb metadata space across skb_push / skb_pull operations. - Support for IPIP encapsulation in the nftables flowtable offload. - Support appending IP interface information to ICMP messages (RFC 5837). - Support setting max record size in TLS (RFC 8449). - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL. - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock. - Let users configure the number of write buffers in SMC. - Add new struct sockaddr_unsized for sockaddr of unknown length, from Kees. - Some conversions away from the crypto_ahash API, from Eric Biggers. - Some preparations for slimming down struct page. - YAML Netlink protocol spec for WireGuard. - Add a tool on top of YAML Netlink specs/lib for reporting commonly computed derived statistics and summarized system state. Driver API: - Add CAN XL support to the CAN Netlink interface. - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as defined by the OPEN Alliance's "Advanced diagnostic features for 100BASE-T1 automotive Ethernet PHYs" specification. - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x). - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec and performs RSS. - Add info to devlink params whether the current setting is the default or a user override. Allow resetting back to default. - Add standard device stats for PSP crypto offload. - Leverage DSA frame broadcast to implement simple HSR frame duplication for a lot of switches without dedicated HSR offload. - Add uAPI defines for 1.6Tbps link modes. Device drivers: - Add Motorcomm YT921x gigabit Ethernet switch support. - Add MUCSE driver for N500/N210 1GbE NIC series. - Convert drivers to support dedicated ops for timestamping control, and away from the direct IOCTL handling. While at it support GET operations for PHY timestamping. - Add (and convert most drivers to) a dedicated ethtool callback for reading the Rx ring count. - Significant refactoring efforts in the STMMAC driver, which supports Synopsys turn-key MAC IP integrated into a ton of SoCs. - Ethernet high-speed NICs: - Broadcom (bnxt): - support PPS in/out on all pins - Intel (100G, ice, idpf): - ice: implement standard ethtool and timestamping stats - i40e: support setting the max number of MAC addresses per VF - iavf: support RSS of GTP tunnels for 5G and LTE deployments - nVidia/Mellanox (mlx5): - reduce downtime on interface reconfiguration - disable being an XDP redirect target by default (same as other drivers) to avoid wasting resources if feature is unused - Meta (fbnic): - add support for Linux-managed PCS on 25G, 50G, and 100G links - Wangxun: - support Rx descriptor merge, and Tx head writeback - support Rx coalescing offload - support 25G SPF and 40G QSFP modules - Ethernet virtual: - Google (gve): - allow ethtool to configure rx_buf_len - implement XDP HW RX Timestamping support for DQ descriptor format - Microsoft vNIC (mana): - support HW link state events - handle hardware recovery events when probing the device - Ethernet NICs consumer, and embedded: - usbnet: add support for Byte Queue Limits (BQL) - AMD (amd-xgbe): - add device selftests - NXP (enetc): - add i.MX94 support - Broadcom integrated MACs (bcmgenet, bcmasp): - bcmasp: add support for PHY-based Wake-on-LAN - Broadcom switches (b53): - support port isolation - support BCM5389/97/98 and BCM63XX ARL formats - Lantiq/MaxLinear switches: - support bridge FDB entries on the CPU port - use regmap for register access - allow user to enable/disable learning - support Energy Efficient Ethernet - support configuring RMII clock delays - add tagging driver for MaxLinear GSW1xx switches - Synopsys (stmmac): - support using the HW clock in free running mode - add Eswin EIC7700 support - add Rockchip RK3506 support - add Altera Agilex5 support - Cadence (macb): - cleanup and consolidate descriptor and DMA address handling - add EyeQ5 support - TI: - icssg-prueth: support AF_XDP - Airoha access points: - add missing Ethernet stats and link state callback - add AN7583 support - support out-of-order Tx completion processing - Power over Ethernet: - pd692x0: preserve PSE configuration across reboots - add support for TPS23881B devices - Ethernet PHYs: - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support - Support 50G SerDes and 100G interfaces in Linux-managed PHYs - micrel: - support for non PTP SKUs of lan8814 - enable in-band auto-negotiation on lan8814 - realtek: - cable testing support on RTL8224 - interrupt support on RTL8221B - motorcomm: support for PHY LEDs on YT853 - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag - mscc: support for PHY LED control - CAN drivers: - m_can: add support for optional reset and system wake up - remove can_change_mtu() obsoleted by core handling - mcp251xfd: support GPIO controller functionality - Bluetooth: - add initial support for PASTa - WiFi: - split ieee80211.h file, it's way too big - improvements in VHT radiotap reporting, S1G, Channel Switch Announcement handling, rate tracking in mesh networks - improve multi-radio monitor mode support, and add a cfg80211 debugfs interface for it - HT action frame handling on 6 GHz - initial chanctx work towards NAN - MU-MIMO sniffer improvements - WiFi drivers: - RealTek (rtw89): - support USB devices RTL8852AU and RTL8852CU - initial work for RTL8922DE - improved injection support - Intel: - iwlwifi: new sniffer API support - MediaTek (mt76): - WED support for >32-bit DMA - airoha NPU support - regdomain improvements - continued WiFi7/MLO work - Qualcomm/Atheros: - ath10k: factory test support - ath11k: TX power insertion support - ath12k: BSS color change support - ath12k: statistics improvements - brcmfmac: Acer A1 840 tablet quirk - rtl8xxxu: 40 MHz connection fixes/support" * tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits) net: page_pool: sanitise allocation order net: page pool: xa init with destroy on pp init net/mlx5e: Support XDP target xmit with dummy program net/mlx5e: Update XDP features in switch channels selftests/tc-testing: Test CAKE scheduler when enqueue drops packets net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop wireguard: netlink: generate netlink code wireguard: uapi: generate header with ynl-gen wireguard: uapi: move flag enums wireguard: uapi: move enum wg_cmd wireguard: netlink: add YNL specification selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py selftests: drv-net: introduce Iperf3Runner for measurement use cases selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive() Documentation: net: dsa: mention simple HSR offload helpers Documentation: net: dsa: mention availability of RedBox ...
5 days	Merge tag 'kbuild-6.19-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild updates from Nicolas Schier: - Enable -fms-extensions, allowing anonymous use of tagged struct or union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary conversion patch is added here, too (btrfs). [ Editor's note: the core of this actually came in early through a shared branch and a few other trees - Linus ] - Introduce architecture-specific CC_CAN_LINK and flags for userprogs - Add new packaging target 'modules-cpio-pkg' for building a initramfs cpio w/ kmods - Handle included .c files in gen_compile_commands - Minor kbuild changes: - Use objtree for module signing key path, fixing oot kmod signing - Improve documentation of KBUILD_BUILD_TIMESTAMP - Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice - Rename scripts/Makefile.extrawarn to Makefile.warn - Drop obsolete types.h check from headers_check.pl - Remove outdated config leak ignore entries * tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: add target to build a cpio containing modules initramfs: add gen_init_cpio to hostprogs unconditionally kbuild: allow architectures to override CC_CAN_LINK init: deduplicate cc-can-link.sh invocations kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings scripts: headers_install.sh: Remove two outdated config leak ignore entries scripts/clang-tools: Handle included .c files in gen_compile_commands kbuild: uapi: Drop types.h check from headers_check.pl kbuild: Rename Makefile.extrawarn to Makefile.warn MAINTAINERS, .mailmap: Update mail address for Nicolas Schier kbuild: uapi: reuse KBUILD_USERCFLAGS kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation kbuild: Use objtree for module signing key path btrfs: send: make use of -fms-extensions for defining struct fs_path
5 days	Merge tag 'printk-for-6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Allow creaing nbcon console drivers with an unsafe write_atomic() callback that can only be called by the final nbcon_atomic_flush_unsafe(). Otherwise, the driver would rely on the kthread. It is going to be used as the-best-effort approach for an experimental nbcon netconsole driver, see https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org Note that a safe .write_atomic() callback is supposed to work in NMI context. But some networking drivers are not safe even in IRQ context: https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35 In an ideal world, all networking drivers would be fixed first and the atomic flush would be blocked only in NMI context. But it brings the question how reliable networking drivers are when the system is in a bad state. They might block flushing more reliable serial consoles which are more suitable for serious debugging anyway. - Allow to use the last 4 bytes of the printk ring buffer. - Prevent queuing IRQ work and block printk kthreads when consoles are suspended. Otherwise, they create non-necessary churn or even block the suspend. - Release console_lock() between each record in the kthread used for legacy consoles on RT. It might significantly speed up the boot. - Release nbcon context between each record in the atomic flush. It prevents stalls of the related printk kthread after it has lost the ownership in the middle of a record - Add support for NBCON consoles into KDB - Add %ptsP modifier for printing struct timespec64 and use it where possible - Misc code clean up * tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits) printk: Use console_is_usable on console_unblank arch: um: kmsg_dump: Use console_is_usable drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT lib/vsprintf: Unify FORMAT_STATE_NUM handlers printk: Avoid irq_work for printk_deferred() on suspend printk: Avoid scheduling irq_work on suspend printk: Allow printk_trigger_flush() to flush all types tracing: Switch to use %ptSp scsi: snic: Switch to use %ptSp scsi: fnic: Switch to use %ptSp s390/dasd: Switch to use %ptSp ptp: ocp: Switch to use %ptSp pps: Switch to use %ptSp PCI: epf-test: Switch to use %ptSp net: dsa: sja1105: Switch to use %ptSp mmc: mmc_test: Switch to use %ptSp media: av7110: Switch to use %ptSp ipmi: Switch to use %ptSp igb: Switch to use %ptSp e1000e: Switch to use %ptSp ...
5 days	fs: assert on I_FREEING not being set in iput() and iput_not_last()	Mateusz Guzik
	Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251201132037.22835-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
5 days	fs: PM: Fix reverse check in filesystems_freeze_callback()	Rafael J. Wysocki
	The freeze_all_ptr check in filesystems_freeze_callback() introduced by commit a3f8f8662771 ("power: always freeze efivarfs") is reverse which quite confusingly causes all file systems to be frozen when filesystem_freeze_enabled is false. On my systems it causes the WARN_ON_ONCE() in __set_task_frozen() to trigger, most likely due to an attempt to freeze a file system that is not ready for that. Add a logical negation to the check in question to reverse it as appropriate. Fixes: a3f8f8662771 ("power: always freeze efivarfs") Cc: 6.18+ <stable@vger.kernel.org> # 6.18+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/12788397.O9o76ZdvQC@rafael.j.wysocki Signed-off-by: Christian Brauner <brauner@kernel.org>
6 days	exfat: fix remount failure in different process environments	Yuezhang Mo
	The kernel test robot reported that the exFAT remount operation failed. The reason for the failure was that the process's umask is different between mount and remount, causing fs_fmask and fs_dmask are changed. Potentially, both gid and uid may also be changed. Therefore, when initializing fs_context for remount, inherit these mount options from the options used during mount. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
6 days	exfat: fix divide-by-zero in exfat_allocate_bitmap	Namjae Jeon
	The variable max_ra_count can be 0 in exfat_allocate_bitmap(), which causes a divide-by-zero error in the subsequent modulo operation (i % max_ra_count), leading to a system crash. When max_ra_count is 0, it means that readahead is not used. This patch load the bitmap without readahead. Fixes: 9fd688678dd8 ("exfat: optimize allocation bitmap loading time") Reported-by: Jiaming Zhang <r772577952@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
6 days	exfat: validate the cluster bitmap bits of directory	Namjae Jeon
	Syzbot created this issue by testing an image that did not have the root cluster bitmap bit marked. After accessing a file through the root directory via exfat_lookup, when creating a file again with mkdir, the root cluster bit can be allocated for direcotry, which can cause the root cluster to be zeroed out and the same entry can be allocated in the same cluster. This patch improved this issue by adding exfat_test_bitmap to validate the cluster bits of the root directory and directory. And the first cluster bit of the root directory should never be unset except when storage is corrupted. This bit is set to allow operations after mount. Reported-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com Tested-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
6 days	exfat: zero out post-EOF page cache on file extension	Yuezhang Mo
	xfstests generic/363 was failing due to unzeroed post-EOF page cache that allowed mmap writes beyond EOF to become visible after file extension. For example, in following xfs_io sequence, 0x22 should not be written to the file but would become visible after the extension: xfs_io -f -t -c "pwrite -S 0x11 0 8" \ -c "mmap 0 4096" \ -c "mwrite -S 0x22 32 32" \ -c "munmap" \ -c "pwrite -S 0x33 512 32" \ $testfile This violates the expected behavior where writes beyond EOF via mmap should not persist after the file is extended. Instead, the extended region should contain zeros. Fix this by using truncate_pagecache() to truncate the page cache after the current EOF when extending the file. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
6 days	exfat: fix refcount leak in exfat_find	Shuhao Fu
	Fix refcount leaks in `exfat_find` related to `exfat_get_dentry_set`. Function `exfat_get_dentry_set` would increase the reference counter of `es->bh` on success. Therefore, `exfat_put_dentry_set` must be called after `exfat_get_dentry_set` to ensure refcount consistency. This patch relocate two checks to avoid possible leaks. Fixes: 82ebecdc74ff ("exfat: fix improper check of dentry.stream.valid_size") Fixes: 13940cef9549 ("exfat: add a check for invalid data size") Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
6 days	Merge tag 'x86_cache_for_v6.19_rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for AMD's Smart Data Cache Injection feature which allows for direct insertion of data from I/O devices into the L3 cache, thus bypassing DRAM and saving its bandwidth; the resctrl side of the feature allows the size of the L3 used for data injection to be controlled - Add Intel Clearwater Forest to the list of CPUs which support Sub-NUMA clustering - Other fixes and cleanups * tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Update bit_usage to reflect io_alloc fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID fs/resctrl: Introduce interface to display io_alloc CBMs fs/resctrl: Add user interface to enable/disable io_alloc feature fs/resctrl: Introduce interface to display "io_alloc" support x86,fs/resctrl: Implement "io_alloc" enable/disable handlers x86,fs/resctrl: Detect io_alloc feature x86/resctrl: Add SDCIAE feature in the command line options x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement fs/resctrl: Consider sparse masks when initializing new group's allocation x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
6 days	Merge tag 'core-rseq-2025-11-30' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq updates from Thomas Gleixner: "A large overhaul of the restartable sequences and CID management: The recent enablement of RSEQ in glibc resulted in regressions which are caused by the related overhead. It turned out that the decision to invoke the exit to user work was not really a decision. More or less each context switch caused that. There is a long list of small issues which sums up nicely and results in a 3-4% regression in I/O benchmarks. The other detail which caused issues due to extra work in context switch and task migration is the CID (memory context ID) management. It also requires to use a task work to consolidate the CID space, which is executed in the context of an arbitrary task and results in sporadic uncontrolled exit latencies. The rewrite addresses this by: - Removing deprecated and long unsupported functionality - Moving the related data into dedicated data structures which are optimized for fast path processing. - Caching values so actual decisions can be made - Replacing the current implementation with a optimized inlined variant. - Separating fast and slow path for architectures which use the generic entry code, so that only fault and error handling goes into the TIF_NOTIFY_RESUME handler. - Rewriting the CID management so that it becomes mostly invisible in the context switch path. That moves the work of switching modes into the fork/exit path, which is a reasonable tradeoff. That work is only required when a process creates more threads than the cpuset it is allowed to run on or when enough threads exit after that. An artificial thread pool benchmarks which triggers this did not degrade, it actually improved significantly. The main effect in migration heavy scenarios is that runqueue lock held time and therefore contention goes down significantly" * tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/mmcid: Switch over to the new mechanism sched/mmcid: Implement deferred mode change irqwork: Move data struct to a types header sched/mmcid: Provide CID ownership mode fixup functions sched/mmcid: Provide new scheduler CID mechanism sched/mmcid: Introduce per task/CPU ownership infrastructure sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex sched/mmcid: Provide precomputed maximal value sched/mmcid: Move initialization out of line signal: Move MMCID exit out of sighand lock sched/mmcid: Convert mm CID mask to a bitmap cpumask: Cache num_possible_cpus() sched/mmcid: Use cpumask_weighted_or() cpumask: Introduce cpumask_weighted_or() sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() sched/mmcid: Move scheduler code out of global header sched: Fixup whitespace damage sched/mmcid: Cacheline align MM CID storage sched/mmcid: Use proper data structures sched/mmcid: Revert the complex CID management ...
6 days	gfs2: Fix use of bio_chain	Andreas Gruenbacher
	In gfs2_chain_bio(), the call to bio_chain() has its arguments swapped. The result is leaked bios and incorrect synchronization (only the last bio will actually be waited for). This code is only used during mount and filesystem thaw, so the bug normally won't be noticeable. Reported-by: Stephen Zhang <starzhangzsd@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
6 days	Merge tag 'core-uaccess-2025-11-30' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scoped user access updates from Thomas Gleixner: "Scoped user mode access and related changes: - Implement the missing u64 user access function on ARM when CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in generic code with [unsafe_]get_user(). All other architectures and ARM variants provide the relevant accessors already. - Ensure that ASM GOTO jump label usage in the user mode access helpers always goes through a local C scope label indirection inside the helpers. This is required because compilers are not supporting that a ASM GOTO target leaves a auto cleanup scope. GCC silently fails to emit the cleanup invocation and CLANG fails the build. [ Editor's note: gcc-16 will have fixed the code generation issue in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm goto jumps [PR122835]"). But we obviously have to deal with clang and older versions of gcc, so.. - Linus ] This provides generic wrapper macros and the conversion of affected architecture code to use them. - Scoped user mode access with auto cleanup Access to user mode memory can be required in hot code paths, but if it has to be done with user controlled pointers, the access is shielded with a speculation barrier, so that the CPU cannot speculate around the address range check. Those speculation barriers impact performance quite significantly. This cost can be avoided by "masking" the provided pointer so it is guaranteed to be in the valid user memory access range and otherwise to point to a guaranteed unpopulated address space. This has to be done without branches so it creates an address dependency for the access, which the CPU cannot speculate ahead. This results in repeating and error prone programming patterns: if (can_do_masked_user_access()) from = masked_user_read_access_begin((from)); else if (!user_read_access_begin(from, sizeof(from))) return -EFAULT; unsafe_get_user(val, from, Efault); user_read_access_end(); return 0; Efault: user_read_access_end(); return -EFAULT; which can be replaced with scopes and automatic cleanup: scoped_user_read_access(from, Efault) unsafe_get_user(val, from, Efault); return 0; Efault: return -EFAULT; - Convert code which implements the above pattern over to scope_user..access(). This also corrects a couple of imbalanced masked__begin() instances which are harmless on most architectures, but prevent PowerPC from implementing the masking optimization. - Add a missing speculation barrier in copy_from_user_iter()" tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/strn,uaccess: Use masked_user_{read/write}_access_begin when required scm: Convert put_cmsg() to scoped user access iov_iter: Add missing speculation barrier to copy_from_user_iter() iov_iter: Convert copy_from_user_iter() to masked user access select: Convert to scoped user access x86/futex: Convert to scoped user access futex: Convert to get/put_user_inline() uaccess: Provide put/get_user_inline() uaccess: Provide scoped user access regions arm64: uaccess: Use unsafe wrappers for ASM GOTO s390/uaccess: Use unsafe wrappers for ASM GOTO riscv/uaccess: Use unsafe wrappers for ASM GOTO powerpc/uaccess: Use unsafe wrappers for ASM GOTO x86/uaccess: Use unsafe wrappers for ASM GOTO uaccess: Provide ASM GOTO safe wrappers for unsafe__user() ARM: uaccess: Implement missing __get_user_asm_dword()
6 days	debugfs: Fix default access mode config check	Aaron Thompson
	This typo caused debugfs to always behave as if CONFIG_DEBUG_FS_ALLOW_NONE was selected. Fixes: f278809475f6 ("debugfs: Remove broken no-mount mode") Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://patch.msgid.link/20251202070927.14198-1-dev@null.aaront.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 days	Merge tag 'locking-core-2025-12-01' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Mutexes: - Redo __mutex_init() to reduce generated code size (Sebastian Andrzej Siewior) Seqlocks: - Introduce scoped_seqlock_read() (Peter Zijlstra) - Change thread_group_cputime() to use scoped_seqlock_read() (Oleg Nesterov) - Change do_task_stat() to use scoped_seqlock_read() (Oleg Nesterov) - Change do_io_accounting() to use scoped_seqlock_read() (Oleg Nesterov) - Fix the incorrect documentation of read_seqbegin_or_lock() / need_seqretry() (Oleg Nesterov) - Allow KASAN to fail optimizing (Peter Zijlstra) Local lock updates: - Fix all kernel-doc warnings (Randy Dunlap) - Add the <linux/local_lock.h> headers to MAINTAINERS (Sebastian Andrzej Siewior) - Reduce the risk of shadowing via s/l/__l/ and s/tl/__tl/ (Vincent Mailhol) Lock debugging: - spinlock/debug: Fix data-race in do_raw_write_lock (Alexander Sverdlin) Atomic primitives infrastructure: - atomic: Skip alignment check for try_cmpxchg() old arg (Arnd Bergmann) Rust runtime integration: - sync: atomic: Enable generated Atomic<T> usage (Boqun Feng) - sync: atomic: Implement Debug for Atomic<Debug> (Boqun Feng) - debugfs: Remove Rust native atomics and replace them with Linux versions (Boqun Feng) - debugfs: Implement Reader for Mutex<T> only when T is Unpin (Boqun Feng) - lock: guard: Add T: Unpin bound to DerefMut (Daniel Almeida) - lock: Pin the inner data (Daniel Almeida) - lock: Add a Pin<&mut T> accessor (Daniel Almeida)" tag 'locking-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/local_lock: Fix all kernel-doc warnings locking/local_lock: s/l/__l/ and s/tl/__tl/ to reduce the risk of shadowing locking/local_lock: Add the <linux/local_lock.h> headers to MAINTAINERS locking/mutex: Redo __mutex_init() to reduce generated code size rust: debugfs: Replace the usage of Rust native atomics rust: sync: atomic: Implement Debug for Atomic<Debug> rust: sync: atomic: Make AtomicOps pub(crate) seqlock: Allow KASAN to fail optimizing rust: debugfs: Implement Reader for Mutex<T> only when T is Unpin seqlock: Change do_io_accounting() to use scoped_seqlock_read() seqlock: Change do_task_stat() to use scoped_seqlock_read() seqlock: Change thread_group_cputime() to use scoped_seqlock_read() seqlock: Introduce scoped_seqlock_read() documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry atomic: Skip alignment check for try_cmpxchg() old arg rust: lock: Add a Pin<&mut T> accessor rust: lock: Pin the inner data rust: lock: guard: Add T: Unpin bound to DerefMut locking/spinlock/debug: Fix data-race in do_raw_write_lock
7 days	Merge tag 'vfs-6.19-rc1.fd_prepare.fs' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fd prepare updates from Christian Brauner: "This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the common pattern of get_unused_fd_flags() + create file + fd_install() that is used extensively throughout the kernel and currently requires cumbersome cleanup paths. FD_ADD() - For simple cases where a file is installed immediately: fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device)); if (fd < 0) vfio_device_put_registration(device); return fd; FD_PREPARE() - For cases requiring access to the fd or file, or additional work before publishing: FD_PREPARE(fdf, O_CLOEXEC, sync_file->file); if (fdf.err) { fput(sync_file->file); return fdf.err; } data.fence = fd_prepare_fd(fdf); if (copy_to_user((void __user )arg, &data, sizeof(data))) return -EFAULT; return fd_publish(fdf); The primitives are centered around struct fd_prepare. FD_PREPARE() encapsulates all allocation and cleanup logic and must be followed by a call to fd_publish() which associates the fd with the file and installs it into the caller's fdtable. If fd_publish() isn't called, both are deallocated automatically. FD_ADD() is a shorthand that does fd_publish() immediately and never exposes the struct to the caller. I've implemented this in a way that it's compatible with the cleanup infrastructure while also being usable separately. IOW, it's centered around struct fd_prepare which is aliased to class_fd_prepare_t and so we can make use of all the basica guard infrastructure" tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) io_uring: convert io_create_mock_file() to FD_PREPARE() file: convert replace_fd() to FD_PREPARE() vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD() tty: convert ptm_open_peer() to FD_ADD() ntsync: convert ntsync_obj_get_fd() to FD_PREPARE() media: convert media_request_alloc() to FD_PREPARE() hv: convert mshv_ioctl_create_partition() to FD_ADD() gpio: convert linehandle_create() to FD_PREPARE() pseries: port papr_rtas_setup_file_interface() to FD_ADD() pseries: convert papr_platform_dump_create_handle() to FD_ADD() spufs: convert spufs_gang_open() to FD_PREPARE() papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE() spufs: convert spufs_context_open() to FD_PREPARE() net/socket: convert __sys_accept4_file() to FD_ADD() net/socket: convert sock_map_fd() to FD_ADD() net/kcm: convert kcm_ioctl() to FD_PREPARE() net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE() secretmem: convert memfd_secret() to FD_ADD() memfd: convert memfd_create() to FD_ADD() bpf: convert bpf_token_create() to FD_PREPARE() ...
7 days	Merge tag 'vfs-6.19-rc1.autofs' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull autofs update from Christian Brauner: "Prevent futile mount triggers in private mount namespaces. Fix a problematic loop in autofs when a mount namespace contains autofs mounts that are propagation private and there is no namespace-specific automount daemon to handle possible automounting. Previously, attempted path resolution would loop until MAXSYMLINKS was reached before failing, causing significant noise in the log. The fix adds a check in autofs ->d_automount() so that the VFS can immediately return EPERM in this case. Since the mount is propagation private, EPERM is the most appropriate error code" * tag 'vfs-6.19-rc1.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: autofs: dont trigger mount if it cant succeed
7 days	Merge tag 'vfs-6.19-rc1.ovl' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull overlayfs cred guard conversion from Christian Brauner: "This converts all of overlayfs to use credential guards, eliminating manual credential management throughout the filesystem. Credential guard conversion: - Convert all of overlayfs to use credential guards, replacing the manual ovl_override_creds()/ovl_revert_creds() pattern with scoped guards. This makes credential handling visually explicit and eliminates a class of potential bugs from mismatched override/revert calls. (1) Basic credential guard (with_ovl_creds) (2) Creator credential guard (ovl_override_creator_creds): Introduced a specialized guard for file creation operations that handles the two-phase credential override (mounter credentials, then fs{g,u}id override). The new pattern is much clearer: with_ovl_creds(dentry->d_sb) { scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) { if (IS_ERR(cred)) return PTR_ERR(cred); /* creation operations / } } (3) Copy-up credential guard (ovl_cu_creds): Introduced a specialized guard for copy-up operations, simplifying the previous struct ovl_cu_creds helper and associated functions. Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this pattern. Cleanups: - Remove ovl_revert_creds() after all callers converted to guards - Remove struct ovl_cu_creds and associated functions - Drop ovl_setup_cred_for_create() after conversion - Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(), ovl_rename() for cleaner credential guard scope - Introduce struct ovl_renamedata to simplify rename handling - Don't override credentials for ovl_check_whiteouts() (unnecessary) - Remove unneeded semicolon" tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits) ovl: remove unneeded semicolon ovl: remove struct ovl_cu_creds and associated functions ovl: port ovl_copy_up_tmpfile() to cred guard ovl: mark *_cu_creds() as unused temporarily ovl: port ovl_copy_up_workdir() to cred guard ovl: add copy up credential guard ovl: drop ovl_setup_cred_for_create() ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard ovl: mark ovl_setup_cred_for_create() as unused temporarily ovl: reflow ovl_create_or_link() ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard ovl: add ovl_override_creator_creds cred guard ovl: remove ovl_revert_creds() ovl: port ovl_fill_super() to cred guard ovl: refactor ovl_fill_super() ovl: port ovl_lower_positive() to cred guard ovl: port ovl_lookup() to cred guard ovl: refactor ovl_lookup() ovl: port ovl_copyfile() to cred guard ovl: port ovl_rename() to cred guard ...
7 days	Merge tag 'vfs-6.19-rc1.directory.locking' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory locking updates from Christian Brauner: "This contains the work to add centralized APIs for directory locking operations. This series is part of a larger effort to change directory operation locking to allow multiple concurrent operations in a directory. The ultimate goal is to lock the target dentry(s) rather than the whole parent directory. To help with changing the locking protocol, this series centralizes locking and lookup in new helper functions. The helpers establish a pattern where it is the dentry that is being locked and unlocked (currently the lock is held on dentry->d_parent->d_inode, but that can change in the future). This also changes vfs_mkdir() to unlock the parent on failure, as well as dput()ing the dentry. This allows end_creating() to only require the target dentry (which may be IS_ERR() after vfs_mkdir()), not the parent" * tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfsd: fix end_creating() conversion VFS: introduce end_creating_keep() VFS: change vfs_mkdir() to unlock on failure. ecryptfs: use new start_creating/start_removing APIs Add start_renaming_two_dentries() VFS/ovl/smb: introduce start_renaming_dentry() VFS/nfsd/ovl: introduce start_renaming() and end_renaming() VFS: add start_creating_killable() and start_removing_killable() VFS: introduce start_removing_dentry() smb/server: use end_removing_noperm for for target of smb2_create_link() VFS: introduce start_creating_noperm() and start_removing_noperm() VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() VFS: tidy up do_unlinkat() VFS: introduce start_dirop() and end_dirop() debugfs: rename end_creating() to debugfs_end_creating()
7 days	Merge tag 'vfs-6.19-rc1.directory.delegations' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory delegations update from Christian Brauner: "This contains the work for recall-only directory delegations for knfsd. Add support for simple, recallable-only directory delegations. This was decided at the fall NFS Bakeathon where the NFS client and server maintainers discussed how to merge directory delegation support. The approach starts with recallable-only delegations for several reasons: 1. RFC8881 has gaps that are being addressed in RFC8881bis. In particular, it requires directory position information for CB_NOTIFY callbacks, which is difficult to implement properly under Linux. The spec is being extended to allow that information to be omitted. 2. Client-side support for CB_NOTIFY still lags. The client side involves heuristics about when to request a delegation. 3. Early indication shows simple, recallable-only delegations can help performance. Anna Schumaker mentioned seeing a multi-minute speedup in xfstests runs with them enabled. With these changes, userspace can also request a read lease on a directory that will be recalled on conflicting accesses. This may be useful for applications like Samba. Users can disable leases altogether via the fs.leases-enable sysctl if needed. VFS changes: - Dedicated Type for Delegations Introduce struct delegated_inode to track inodes that may have delegations that need to be broken. This replaces the previous approach of passing raw inode pointers through the delegation breaking code paths, providing better type safety and clearer semantics for the delegation machinery. - Break parent directory delegations in open(..., O_CREAT) codepath - Allow mkdir to wait for delegation break on parent - Allow rmdir to wait for delegation break on parent - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(), and vfs_unlink() - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations on parent directory - Clean up argument list for vfs_create() - Expose delegation support to userland Filelock changes: - Make lease_alloc() take a flags argument - Rework the __break_lease API to use flags - Add struct delegated_inode - Push the S_ISREG check down to ->setlease handlers - Lift the ban on directory leases in generic_setlease NFSD changes: - Allow filecache to hold S_IFDIR files - Allow DELEGRETURN on directories - Wire up GET_DIR_DELEGATION handling Fixes: - Fix kernel-doc warnings in __fcntl_getlease - Add needed headers for new struct delegation definition" * tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: add needed headers for new struct delegation definition filelock: __fcntl_getlease: fix kernel-doc warnings vfs: expose delegation support to userland nfsd: wire up GET_DIR_DELEGATION handling nfsd: allow DELEGRETURN on directories nfsd: allow filecache to hold S_IFDIR files filelock: lift the ban on directory leases in generic_setlease vfs: make vfs_symlink break delegations on parent dir vfs: make vfs_mknod break delegations on parent directory vfs: make vfs_create break delegations on parent directory vfs: clean up argument list for vfs_create() vfs: break parent dir delegations in open(..., O_CREAT) codepath vfs: allow rmdir to wait for delegation break on parent vfs: allow mkdir to wait for delegation break on parent vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink} filelock: push the S_ISREG check down to ->setlease handlers filelock: add struct delegated_inode filelock: rework the __break_lease API to use flags filelock: make lease_alloc() take a flags argument
7 days	Merge tag 'vfs-6.19-rc1.minix' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull minix fixes from Christian Brauner: "Fix two syzbot corruption bugs in the minix filesystem. Syzbot fuzzes filesystems by trying to mount and manipulate deliberately corrupted images. This should not lead to BUG_ONs and WARN_ONs for easy to detect corruptions. - Add error handling to minix filesystem for inode corruption detection, enabling the filesystem to report such corruptions cleanly. - Fix a drop_nlink warning in minix_rmdir() triggered by corrupted directory link counts. - Fix a drop_nlink warning in minix_rename() triggered by corrupted inode link counts" * tag 'vfs-6.19-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Fix a drop_nlink warning in minix_rename Fix a drop_nlink warning in minix_rmdir Add error handling to minix filesystem for inode corruption detection
7 days	Merge tag 'vfs-6.19-rc1.guards' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock lock guard updates from Christian Brauner: "This starts the work of introducing guards for superblock related locks. Introduce super_write_guard for scoped superblock write protection. This provides a guard-based alternative to the manual sb_start_write() and sb_end_write() pattern, allowing the compiler to automatically handle the cleanup" * tag 'vfs-6.19-rc1.guards' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: use super write guard in xfs_file_ioctl() open: use super write guard in do_ftruncate() btrfs: use super write guard in relocating_repair_kthread() ext4: use super write guard in write_mmp_block() btrfs: use super write guard in sb_start_write() btrfs: use super write guard btrfs_run_defrag_inode() btrfs: use super write guard in btrfs_reclaim_bgs_work() fs: add super_write_guard
7 days	Merge tag 'vfs-6.19-rc1.fs_header' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fs header updates from Christian Brauner: "This contains initial work to start splitting up fs.h. Begin the long-overdue work of splitting up the monolithic fs.h header. The header has grown to over 3000 lines and includes types and functions for many different subsystems, making it difficult to navigate and causing excessive compilation dependencies. This series introduces new focused headers for superblock-related code: - Rename fs_types.h to fs_dirent.h to better reflect its actual content (directory entry types) - Add fs/super_types.h containing superblock type definitions - Add fs/super.h containing superblock function declarations This is the first step in a longer effort to modularize the VFS headers. Cleanups: - Inode Field Layout Optimization (Mateusz Guzik) Move inode fields used during fast path lookup closer together to improve cache locality during path resolution. - current_umask() Optimization (Mateusz Guzik) Inline current_umask() and move it to fs_struct.h. This improves performance by avoiding function call overhead for this frequently-used function, and places it in a more appropriate header since it operates on fs_struct" * tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: move inode fields used during fast path lookup closer together fs: inline current_umask() and move it to fs_struct.h fs: add fs/super.h header fs: add fs/super_types.h header fs: rename fs_types.h to fs_dirent.h
7 days	Merge tag 'kernel-6.19-rc1.cred' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull cred guard updates from Christian Brauner: "This contains substantial credential infrastructure improvements adding guard-based credential management that simplifies code and eliminates manual reference counting in many subsystems. Features: - Kernel Credential Guards Add with_kernel_creds() and scoped_with_kernel_creds() guards that allow using the kernel credentials without allocating and copying them. This was requested by Linus after seeing repeated prepare_kernel_creds() calls that duplicate the kernel credentials only to drop them again later. The new guards completely avoid the allocation and never expose the temporary variable to hold the kernel credentials anywhere in callers. - Generic Credential Guards Add scoped_with_creds() guards for the common override_creds() and revert_creds() pattern. This builds on earlier work that made override_creds()/revert_creds() completely reference count free. - Prepare Credential Guards Add prepare credential guards for the more complex pattern of preparing a new set of credentials and overriding the current credentials with them: - prepare_creds() - modify new creds - override_creds() - revert_creds() - put_cred() Cleanups: - Make init_cred static since it should not be directly accessed - Add kernel_cred() helper to properly access the kernel credentials - Fix scoped_class() macro that was introduced two cycles ago - coredump: split out do_coredump() from vfs_coredump() for cleaner credential handling - coredump: move revert_cred() before coredump_cleanup() - coredump: mark struct mm_struct as const - coredump: pass struct linux_binfmt as const - sev-dev: use guard for path" * tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) trace: use override credential guard trace: use prepare credential guard coredump: use override credential guard coredump: use prepare credential guard coredump: split out do_coredump() from vfs_coredump() coredump: mark struct mm_struct as const coredump: pass struct linux_binfmt as const coredump: move revert_cred() before coredump_cleanup() sev-dev: use override credential guards sev-dev: use prepare credential guard sev-dev: use guard for path cred: add prepare credential guard net/dns_resolver: use credential guards in dns_query() cgroup: use credential guards in cgroup_attach_permissions() act: use credential guards in acct_write_process() smb: use credential guards in cifs_get_spnego_key() nfs: use credential guards in nfs_idmap_get_key() nfs: use credential guards in nfs_local_call_write() nfs: use credential guards in nfs_local_call_read() erofs: use credential guards ...
7 days	Merge tag 'vfs-6.19-rc1.folio' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull folio updates from Christian Brauner: "Add a new folio_next_pos() helper function that returns the file position of the first byte after the current folio. This is a common operation in filesystems when needing to know the end of the current folio. The helper is lifted from btrfs which already had its own version, and is now used across multiple filesystems and subsystems: - btrfs - buffer - ext4 - f2fs - gfs2 - iomap - netfs - xfs - mm This fixes a long-standing bug in ocfs2 on 32-bit systems with files larger than 2GiB. Presumably this is not a common configuration, but the fix is backported anyway. The other filesystems did not have bugs, they were just mildly inefficient. This also introduce uoff_t as the unsigned version of loff_t. A recent commit inadvertently changed a comparison from being unsigned (on 64-bit systems) to being signed (which it had always been on 32-bit systems), leading to sporadic fstests failures. Generally file sizes are restricted to being a signed integer, but in places where -1 is passed to indicate "up to the end of the file", it is convenient to have an unsigned type to ensure comparisons are always unsigned regardless of architecture" * tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Add uoff_t mm: Use folio_next_pos() xfs: Use folio_next_pos() netfs: Use folio_next_pos() iomap: Use folio_next_pos() gfs2: Use folio_next_pos() f2fs: Use folio_next_pos() ext4: Use folio_next_pos() buffer: Use folio_next_pos() btrfs: Use folio_next_pos() filemap: Add folio_next_pos()