path: root/fs
Age  Commit message  Author
8 days  ksmbd: Use SHA-512 library for SMB3.1.1 preauth hash  (Eric Biggers)
Convert ksmbd_gen_preauth_integrity_hash() to use the SHA-512 library instead of a "sha512" crypto_shash. This is simpler and faster. With the library there's no need to allocate memory, no need to handle errors, and the SHA-512 code is accessed directly without inefficient indirect calls and other unnecessary API overhead. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
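The conversion pattern, sketched below as a minimal illustration: the SMB 3.1.1 preauth value chains as SHA-512(previous hash || packet), and the library API needs no allocation and has no failure paths. This assumes the incremental sha512_init()/sha512_update()/sha512_final() helpers from <crypto/sha2.h>; preauth_hash_update() is an illustrative name, not the ksmbd function.

  #include <crypto/sha2.h>

  /*
   * Chain a preauth integrity hash in place: new = SHA-512(old || pkt).
   * Unlike a "sha512" crypto_shash, nothing here can fail or allocate.
   */
  static void preauth_hash_update(u8 hash[SHA512_DIGEST_SIZE],
                                  const void *pkt, size_t len)
  {
          struct sha512_ctx ctx;

          sha512_init(&ctx);
          sha512_update(&ctx, hash, SHA512_DIGEST_SIZE);
          sha512_update(&ctx, pkt, len);
          sha512_final(&ctx, hash);
  }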
9 days  erofs: get rid of raw bi_end_io() usage  (Gao Xiang)
These BIOs are actually harmless in practice, as they are all pseudo BIOs and do not use advanced features like chaining. Using the BIO interface is a more friendly and unified approach for both bdev and file-backed I/Os (compared to the awkward bvec interfaces). Let's use bio_endio() instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
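For illustration, a hedged sketch of the unified completion path (pseudo_bio_complete() is an invented name, not erofs code):

  #include <linux/bio.h>

  /* Set the status and let the generic helper run ->bi_end_io,
   * including any chaining logic, instead of calling it by hand. */
  static void pseudo_bio_complete(struct bio *bio, blk_status_t status)
  {
          bio->bi_status = status;
          bio_endio(bio);
  }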
9 days  erofs: enable error reporting for z_erofs_fixup_insize()  (Gao Xiang)
Enable propagation of detailed errors to callers. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
9 days  fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling  (Lorenzo Stoakes)
make_uffd_wp_huge_pte() should return after handling a huge_pte_none() pte. Link: https://lkml.kernel.org/r/66178124-ebdf-4e23-b8ca-ed3eb8030c81@lucifer.local Fixes: 03bfbc3ad6e4 ("mm: remove is_hugetlb_entry_[migration, hwpoisoned]()") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Closes: https://lkml.kernel.org/r/dc483db3-be4d-45f7-8b40-a28f5d8f5738@suse.cz Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 days  mm: declare VMA flags by bit  (Lorenzo Stoakes)
Patch series "initial work on making VMA flags a bitmap", v3.

We are in the rather silly situation that we are running out of VMA flags, as they are currently limited to a system word in size. This leads to absurd situations where we limit features to 64-bit architectures only because we simply do not have the ability to add a flag for 32-bit ones. This is very constraining and leads to hacks or, in the worst case, an inability to implement features we want, for entirely arbitrary reasons. This also of course gives us something of a Y2K type situation in mm where we might eventually exhaust all of the VMA flags even on 64-bit systems.

This series lays the groundwork for getting away from this limitation by establishing VMA flags as a bitmap whose size we can increase in future beyond 64 bits if required. This is necessarily a highly iterative process given the extensive use of VMA flags throughout the kernel, so we start by performing basic steps.

Firstly, we declare VMA flags by bit number rather than by value, retaining the VM_xxx fields but defining them in terms of newly introduced VMA_xxx_BIT fields. While we are here, we use sparse annotations to ensure that, when dealing with VMA bit number parameters, we cannot be passed values which are not declared as such - providing some useful type safety.

We then introduce an opaque VMA flag type, much like the opaque mm_struct flag type introduced in commit bb6525f2f8c4 ("mm: add bitmap mm->flags field"), which we establish in union with vma->vm_flags (but still set at system word size, meaning there is no functional or data type size change). We update the vm_flags_xxx() helpers to use this new bitmap, introducing sensible helpers to do so.

This series lays the foundation for further work to expand the use of bitmap VMA flags and eventually eliminate these arbitrary restrictions.

This patch (of 4):

In order to lay the groundwork for VMA flags being a bitmap rather than a system word in size, we need to be able to consistently refer to VMA flags by bit number rather than value. Take this opportunity to do so in an enum, which is additionally useful for tooling to extract metadata from. This also makes it very clear at a glance which bits are being used for what.

We use the VMA_ prefix for the bit values as it is logical to do so since these reference VMAs, and we consistently suffix with _BIT to make it clear what the values refer to. We declare bit values even when the flags that use them would not be enabled by config options, as this is simply clearer and defines what bit numbers are used for what, at no additional cost.

We declare a sparse-bitwise type, vma_flag_t, which ensures that users can't pass around invalid VMA flags by accident and prepares for future work towards VMA flags being a bitmap, where we want to ensure bit values are type safe.

To make life easier, we declare some macro helpers: DECLARE_VMA_BIT() allows us to avoid duplication in the enum bit number declarations (while maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() assists with the declaration of flags (a sketch of this pattern follows this entry). Unfortunately we can't declare both in the enum, as we run into issues with logic in the kernel requiring that flags be preprocessor definitions, and additionally we cannot have a macro which declares another macro, so we must define each flag macro directly.

Additionally, update the VMA userland testing vma_internal.h header to include these changes.
We also have to fix the parameters to the vma_flag_*_atomic() functions since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will complain otherwise. We have to update some rather silly if-deffery found in fs/proc/task_mmu.c which would otherwise break. Finally, we update the rust binding helper, as it now cannot auto-detect the flags at all. Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Alice Ryhl <aliceryhl@google.com> [rust] Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Ben Segall <bsegall@google.com> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Gary Guo <gary@garyguo.net> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kees Cook <kees@kernel.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
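The declaration pattern referenced above, in hedged sketch form (macro names follow the commit message; the exact definitions in the mm headers may differ):

  #include <linux/bits.h>
  #include <linux/types.h>

  /* Sparse-bitwise bit-number type: passing a plain int is a type error. */
  typedef int __bitwise vma_flag_t;

  #define DECLARE_VMA_BIT(name, bit) \
          VMA_##name##_BIT = (__force vma_flag_t)(bit)

  enum {
          DECLARE_VMA_BIT(READ, 0),
          DECLARE_VMA_BIT(WRITE, 1),
          DECLARE_VMA_BIT(EXEC, 2),
  };

  /* Flags must stay preprocessor definitions, so each is spelled out. */
  #define INIT_VM_FLAG(name) BIT((__force int)VMA_##name##_BIT)
  #define VM_READ  INIT_VM_FLAG(READ)
  #define VM_WRITE INIT_VM_FLAG(WRITE)
  #define VM_EXEC  INIT_VM_FLAG(EXEC)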
10 days  ext4: mark inodes without acls in __ext4_iget()  (Jan Kara)
Mark inodes without acls with cache_no_acl() in __ext4_iget() so that path lookup can run in RCU mode from the start. This is interesting in particular for the case where the file owner does the lookup, because in that case we otherwise end up constantly hitting the slow path: we drop out of the fast path (because the ACL state is unknown) but never end up calling check_acl() to cache the ACL state. The problem was originally analyzed by Linus and the fix tested by Mateusz; I'm just putting it into mergeable form :). Link: https://lore.kernel.org/all/CAHk-=whSzc75TLLPWskV0xuaHR4tpWBr=LduqhcCFr4kCmme_w@mail.gmail.com Reported-by: Mateusz Guzik <mjguzik@gmail.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Message-ID: <20251125101340.24276-2-jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
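The shape of the change, sketched (inode_has_acl_xattr() is a hypothetical stand-in for ext4's actual on-disk check; the real hunk lives in __ext4_iget()):

  #include <linux/posix_acl.h>

  /* If the on-disk inode carries no ACL xattrs, record that fact so
   * RCU-walk lookups never bail out over unknown ACL state. */
  static void mark_acl_state(struct inode *inode)
  {
          if (!inode_has_acl_xattr(inode))
                  cache_no_acl(inode);    /* i_acl/i_default_acl = NULL */
  }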
10 days  ext4: enable block size larger than page size  (Baokun Li)
Since the block device layer (see commit 3c20917120ce ("block/bdev: enable large folio support for large logical block sizes")) and the page cache (see commit ab95d23bab220ef8 ("filemap: allocate mapping_min_order folios in the page cache")) both have the ability to enforce a minimum order when allocating folios, and ext4 has supported large folios since commit 7ac67301e82f ("ext4: enable large folio for regular file"), now add support for block_size > PAGE_SIZE in ext4. set_blocksize() -> bdev_validate_blocksize() already validates the block size, so ext4_load_super() does not need to perform additional checks. Here we only need to add the FS_LBS bit to fs_flags. In addition, block sizes larger than the page size are currently supported only when CONFIG_TRANSPARENT_HUGEPAGE is enabled. To make this explicit, a blocksize_gt_pagesize entry has been added under /sys/fs/ext4/feature/, indicating whether bs > ps is supported. This allows mke2fs to check the interface and determine whether a warning should be issued when formatting a filesystem with a block size larger than the page size. Suggested-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-25-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
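The enablement itself is a one-flag change; a sketch with the field list abbreviated (not the exact ext4_fs_type initializer):

  static struct file_system_type ext4_fs_type = {
          .owner    = THIS_MODULE,
          .name     = "ext4",
          /* ... */
          .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_LBS,
  };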
10 days  ext4: add checks for large folio incompatibilities when BS > PS  (Baokun Li)
Supporting a block size greater than the page size (BS > PS) requires support for large folios. However, several features (e.g., encrypt) do not yet support large folios. To prevent conflicts, this patch adds checks at mount time to prohibit these features from being used when BS > PS. Since these features cannot be changed on remount, there is no need to check on remount. This patch adds s_max_folio_order, initialized during mount according to filesystem features and mount options. If s_max_folio_order is 0, large folios are disabled. With this in place, ext4_set_inode_mapping_order() can be simplified by checking s_max_folio_order, avoiding redundant checks. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-24-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
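A sketch of the simplified helper described above (bodies are illustrative; s_max_folio_order == 0 disables large folios entirely):

  #include <linux/pagemap.h>

  static void ext4_set_inode_mapping_order(struct inode *inode)
  {
          struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

          if (!sbi->s_max_folio_order)
                  return;    /* an incompatible feature is enabled */

          mapping_set_folio_order_range(inode->i_mapping,
                                        sbi->s_min_folio_order,
                                        sbi->s_max_folio_order);
  }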
10 days  ext4: support verifying data from large folios with fs-verity  (Baokun Li)
Eric Biggers already added support for verifying data from large folios several years ago in commit 5d0f0e57ed90 ("fsverity: support verifying data from large folios"). With ext4 now supporting large block sizes, the fs-verity tests `kvm-xfstests -c ext4/64k -g verity -x encrypt` pass without issues. Therefore, remove the restriction and allow large folios to be enabled together with fs-verity. Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-23-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: make data=journal support large block size  (Baokun Li)
Currently, ext4_set_inode_mapping_order() does not set max folio order for files with the data journalling flag. For files that already have large folios enabled, ext4_inode_journal_mode() ignores the data journalling flag once max folio order is set. This is not because data journalling cannot work with large folios, but because credit estimates will go through the roof if there are too many blocks per folio. Since the real constraint is blocks-per-folio, to support data=journal under LBS, we now set max folio order to be equal to min folio order for files with the journalling flag. When LBS is disabled, the max folio order remains unset as before. Therefore, before ext4_change_inode_journal_flag() switches the journalling mode, we call truncate_pagecache() to drop all page cache for that inode, and filemap_write_and_wait() is called unconditionally. After that, once the journalling mode has been switched, we can safely reset the inode mapping order, and the mapping_large_folio_support() check in ext4_inode_journal_mode() can be removed. Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-22-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
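In sketch form (variable names illustrative), the constraint reduces to clamping the order range for journalled-data inodes:

  /* Keep blocks-per-folio minimal so journal credit estimates stay
   * bounded for data=journal inodes under LBS. */
  if (ext4_should_journal_data(inode))
          max_order = min_order;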
10 days  ext4: support large block size in __ext4_block_zero_page_range()  (Zhihao Cheng)
Use the EXT4_PG_TO_LBLK() macro to convert folio indexes to blocks to avoid negative left shifts after supporting blocksize greater than PAGE_SIZE. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-21-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in mpage_prepare_extent_to_map()  (Baokun Li)
Use the EXT4_PG_TO_LBLK/EXT4_LBLK_TO_PG macros to complete the conversion between folio indexes and blocks to avoid negative left/right shifts after supporting blocksize greater than PAGE_SIZE. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-20-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in mpage_map_and_submit_buffers()  (Baokun Li)
Use the EXT4_PG_TO_LBLK/EXT4_LBLK_TO_PG macros to complete the conversion between folio indexes and blocks to avoid negative left/right shifts after supporting blocksize greater than PAGE_SIZE. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-19-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in ext4_block_write_begin()  (Baokun Li)
Use the EXT4_PG_TO_LBLK() macro to convert folio indexes to blocks to avoid negative left shifts after supporting blocksize greater than PAGE_SIZE. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-18-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in ext4_mpage_readpages()  (Baokun Li)
Use the EXT4_PG_TO_LBLK() macro to convert folio indexes to blocks to avoid negative left shifts after supporting blocksize greater than PAGE_SIZE. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-17-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: rename 'page' references to 'folio' in multi-block allocator  (Zhihao Cheng)
The ext4 multi-block allocator now fully supports folio objects. Update all variable names, function names, and comments to replace legacy 'page' terminology with 'folio', improving clarity and consistency. No functional changes. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-16-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: prepare buddy cache inode for BS > PS with large folios  (Baokun Li)
We use EXT4_BAD_INO for the buddy cache inode number. This inode is not accessed via __ext4_new_inode() or __ext4_iget(), meaning ext4_set_inode_mapping_order() is not called to set its folio order range. However, future block size greater than page size support requires this inode to support large folios, and the buddy cache code already handles BS > PS. Therefore, ext4_set_inode_mapping_order() is now explicitly called for this specific inode to set its folio order range. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-15-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in ext4_mb_init_cache()  (Baokun Li)
Currently, ext4_mb_init_cache() uses blocks_per_page to calculate the folio index and offset. However, when blocksize is larger than PAGE_SIZE, blocks_per_page becomes zero, leading to a potential division-by-zero bug. Since we now have the folio, we know its exact size. This allows us to convert {blocks, groups}_per_page to {blocks, groups}_per_folio, thus supporting block sizes greater than page size. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-14-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in ext4_mb_get_buddy_page_lock()  (Baokun Li)
Currently, ext4_mb_get_buddy_page_lock() uses blocks_per_page to calculate folio index and offset. However, when blocksize is larger than PAGE_SIZE, blocks_per_page becomes zero, leading to a potential division-by-zero bug. To support BS > PS, use bytes to compute folio index and offset within folio to get rid of blocks_per_page. Also, since ext4_mb_get_buddy_page_lock() already fully supports folio, rename it to ext4_mb_get_buddy_folio_lock(). Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-13-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: support large block size in ext4_mb_load_buddy_gfp()  (Baokun Li)
Currently, ext4_mb_load_buddy_gfp() uses blocks_per_page to calculate the folio index and offset. However, when blocksize is larger than PAGE_SIZE, blocks_per_page becomes zero, leading to a potential division-by-zero bug. To support BS > PS, use bytes to compute folio index and offset within folio to get rid of blocks_per_page. Also, if buddy and bitmap land in the same folio, we get that folio’s ref instead of looking it up again before updating the buddy. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-12-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
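The pattern shared by these three ext4_mb_* conversions, sketched with illustrative variable names:

  /* blocks_per_page = PAGE_SIZE / blocksize is 0 when BS > PS, so the
   * old `block / blocks_per_page` divides by zero. Going through a
   * byte position works for any BS/PS ratio. */
  loff_t pos = (loff_t)block << sb->s_blocksize_bits;
  pgoff_t index = pos >> PAGE_SHIFT;              /* page cache index */
  size_t offset = offset_in_folio(folio, pos);    /* offset in folio  */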
10 days  ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion  (Baokun Li)
As BS > PS support is coming, all block number to page index (and vice versa) conversions must now go via bytes. Add the EXT4_LBLK_TO_PG() and EXT4_PG_TO_LBLK() macros to simplify these conversions and handle both the BS <= PS and BS > PS scenarios cleanly. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-11-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion  (Baokun Li)
No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-10-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
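A hedged sketch of what these byte-based helpers (this entry's EXT4_LBLK_TO_B() plus the EXT4_LBLK_TO_PG()/EXT4_PG_TO_LBLK() pair from the entry above) can look like; the actual ext4 definitions may differ:

  #define EXT4_LBLK_TO_B(sb, lblk) \
          ((loff_t)(lblk) << (sb)->s_blocksize_bits)
  /* Both directions go through a byte position, so neither BS > PS nor
   * BS <= PS ever produces a negative shift count. */
  #define EXT4_LBLK_TO_PG(sb, lblk) \
          ((pgoff_t)(EXT4_LBLK_TO_B((sb), (lblk)) >> PAGE_SHIFT))
  #define EXT4_PG_TO_LBLK(sb, index) \
          ((ext4_lblk_t)(((loff_t)(index) << PAGE_SHIFT) >> \
                         (sb)->s_blocksize_bits))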
10 days  ext4: support large block size in ext4_readdir()  (Baokun Li)
In ext4_readdir(), page_cache_sync_readahead() is used to readahead mapped physical blocks. With LBS support, this can lead to a negative right shift. To fix this, the page index is now calculated by first converting the physical block number (pblk) to a file position (pos) before converting it to a page index. Also, the correct number of pages to readahead is now passed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-9-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
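A sketch of that conversion (the argument plumbing is illustrative, not the actual ext4_readdir() code):

  #include <linux/pagemap.h>

  static void dir_block_readahead(struct super_block *sb,
                                  struct address_space *mapping,
                                  struct file_ra_state *ra,
                                  ext4_fsblk_t pblk, unsigned int nr_blocks)
  {
          /* pblk -> byte position -> page index: no negative shifts */
          loff_t pos = (loff_t)pblk << sb->s_blocksize_bits;
          unsigned long nr_pages =
                  ((loff_t)nr_blocks << sb->s_blocksize_bits) >> PAGE_SHIFT;

          page_cache_sync_readahead(mapping, ra, NULL, pos >> PAGE_SHIFT,
                                    max_t(unsigned long, nr_pages, 1));
  }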
10 days  ext4: support large block size in ext4_calculate_overhead()  (Baokun Li)
ext4_calculate_overhead() used a single page for its bitmap buffer, which worked fine when PAGE_SIZE >= block size. However, with block size greater than page size (BS > PS) support, the bitmap can exceed a single page. To address this, we now use kvmalloc() to allocate memory of the filesystem block size, to properly support BS > PS. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-8-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
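Sketched, the allocation change is simply the following (illustrative, not the exact hunk):

  char *buf = kvmalloc(sb->s_blocksize, GFP_NOFS);

  if (!buf)
          return -ENOMEM;
  /* ... accumulate bitmap-based overhead into buf ... */
  kvfree(buf);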
10 days  ext4: introduce s_min_folio_order for future BS > PS support  (Baokun Li)
This commit introduces the s_min_folio_order field to the ext4_sb_info structure. This field will store the minimum folio order required by the current filesystem, laying groundwork for future support of block sizes greater than PAGE_SIZE. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-7-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: enable DIOREAD_NOLOCK by default for BS > PS as well  (Baokun Li)
The dioread_nolock code paths already support large folios, so dioread_nolock is enabled by default regardless of whether the block size is less than, equal to, or greater than PAGE_SIZE. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-6-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: make ext4_punch_hole() support large block size  (Baokun Li)
When preparing for bs > ps support, clean up unnecessary PAGE_SIZE references in ext4_punch_hole(). Previously, when a hole extended beyond i_size, we aligned the hole end upwards to PAGE_SIZE to handle partial folio invalidation. Now that truncate_inode_pages_range() already handles partial folio invalidation correctly, this alignment is no longer required. However, to save pointless tail block zeroing, we still keep rounding up to the block size here. In addition, as Honza pointed out, when the hole end equals i_size, it should also be rounded up to the block size. This patch fixes that as well. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-5-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
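The rounding rule, sketched under the assumptions above (variable names illustrative):

  /* Nothing beyond i_size needs preserving, so align the hole end to
   * the block size; that is enough to skip pointless tail-block
   * zeroing, and the old PAGE_SIZE alignment is no longer needed. */
  if (end >= inode->i_size)
          end = round_up(inode->i_size, sb->s_blocksize);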
10 days  ext4: remove PAGE_SIZE checks for rec_len conversion  (Baokun Li)
Previously, ext4_rec_len_(to|from)_disk only performed complex rec_len conversions when PAGE_SIZE >= 65536 to reduce complexity. However, we are soon to support file system block sizes greater than page size, which makes these conditional checks unnecessary. Thus, these checks are now removed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-4-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: remove page offset calculation in ext4_block_truncate_page()  (Baokun Li)
For bs <= ps scenarios, calculating the offset within the block is sufficient. For bs > ps, an initial page offset calculation can lead to incorrect behavior. Thus this redundant calculation has been removed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-3-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  ext4: remove page offset calculation in ext4_block_zero_page_range()  (Zhihao Cheng)
For bs <= ps scenarios, calculating the offset within the block is sufficient. For bs > ps, an initial page offset calculation can lead to incorrect behavior. Thus this redundant calculation has been removed. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-2-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 days  afs: Fix uninit var in afs_alloc_anon_key()  (David Howells)
Fix an uninitialised variable (key) in afs_alloc_anon_key() by setting it to cell->anonymous_key. Without this change, the error check may return a false failure with a bad error number. Most of the time this is unlikely to happen because the first encounter with afs_alloc_anon_key() will usually be from (auto)mount, for which all subsequent operations must wait - apart from other (auto)mounts. Once cell->anonymous_key is allocated, all further calls to afs_request_key() will skip the call to afs_alloc_anon_key() for that cell. Fixes: d27c71257825 ("afs: Fix delayed allocation of a cell's anonymous key") Reported-by: Paulo Alcantara <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara <pc@manguebit.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: syzbot+41c68824eefb67cdf00c@syzkaller.appspotmail.com cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
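The general shape of the bug and fix, sketched (create_anonymous_key() is hypothetical; the real function differs in detail):

  struct key *key = cell->anonymous_key;  /* the fix: initialised up front */

  if (!key)
          key = create_anonymous_key(cell);  /* slow path: allocate it */
  if (IS_ERR(key))
          return PTR_ERR(key);  /* can no longer act on stack garbage */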
10 days  Merge tag 'vfs-6.18-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs  (Linus Torvalds)
Pull vfs fixes from Christian Brauner:

 - afs: Fix delayed allocation of a cell's anonymous key. The allocation of a cell's anonymous key is done in a background thread along with other cell setup such as doing a DNS upcall. The normal key lookup tries to use the key description on the anonymous authentication key as the reference for request_key() - but it may not yet be set, causing an oops.

 - ovl: fail ovl_lock_rename_workdir() if either target is unhashed. As well as checking that the parent hasn't changed after getting the lock, the code needs to check that the dentry hasn't been unhashed. Otherwise overlayfs might try to rename something that has been removed.

 - namespace: fix a reference leak in grab_requested_mnt_ns. lookup_mnt_ns() already takes a reference on mnt_ns, and so grab_requested_mnt_ns() doesn't need to take an extra reference.

* tag 'vfs-6.18-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Fix delayed allocation of a cell's anonymous key
  ovl: fail ovl_lock_rename_workdir() if either target is unhashed
  fs/namespace: fix reference leak in grab_requested_mnt_ns
11 days  erofs: enable error reporting for z_erofs_stream_switch_bufs()  (Gao Xiang)
Enable propagation of detailed errors to callers. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
11 days  erofs: improve Zstd, LZMA and DEFLATE error strings  (Gao Xiang)
Enable better, more detailed, and unique error reporting. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
11 days  erofs: improve decompression error reporting  (Gao Xiang)
Change the return type of decompress() from `int` to `const char *` to provide more informative error diagnostics:

 - A NULL return indicates successful decompression;
 - If IS_ERR(ptr) is true, the return value encodes a standard negative errno (e.g., -ENOMEM, -EOPNOTSUPP) identifying the specific error;
 - Otherwise, a non-NULL return points to a human-readable error string, and the corresponding error code should be treated as -EFSCORRUPTED.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
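A sketch of a caller consuming this tri-state convention (the caller-side names are illustrative):

  const char *reason = alg->decompress(&rq);

  if (!reason)
          return 0;                       /* success */
  if (IS_ERR(reason))
          return PTR_ERR(reason);         /* standard negative errno */
  erofs_err(sb, "decompression failed: %s", reason);
  return -EFSCORRUPTED;                   /* human-readable reason */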
11 days  erofs: tidy up z_erofs_lz4_handle_overlap()  (Gao Xiang)
 - Add some useful comments to explain inplace I/Os and decompression;
 - Rearrange the code to get rid of one unnecessary goto.

Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
11 days  file: convert replace_fd() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-44-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  exec: convert begin_new_exec() to FD_ADD()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-21-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  xfs: convert xfs_open_by_handle() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-17-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  userfaultfd: convert new_userfaultfd() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-16-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  timerfd: convert timerfd_create() to FD_ADD()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-15-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  signalfd: convert do_signalfd4() to FD_ADD()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-14-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  open: convert do_sys_openat2() to FD_ADD()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-13-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  eventpoll: convert do_epoll_create() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-12-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  autofs: convert autofs_dev_ioctl_open_mountpoint() to FD_ADD()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-11-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  nsfs: convert ns_ioctl() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-10-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  nsfs: convert open_namespace() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-9-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  fanotify: convert fanotify_init() to FD_PREPARE()  (Christian Brauner)
Christian Brauner <brauner@kernel.org> says: The fix sent in [1] was squashed into this commit. Link: https://lore.kernel.org/20251127201618.2115275-1-kuniyu@google.com [1] Reported-by: syzbot+321168dfa622eda99689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/6928b121.a70a0220.d98e3.0110.GAE@google.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-8-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  namespace: convert fsmount() to FD_PREPARE()  (Christian Brauner)
Christian Brauner <brauner@kernel.org> says: A variant of the fix sent in [1] was squashed into this commit. Link: https://lore.kernel.org/20251128035149.392402-1-kartikey406@gmail.com [1] Reported-by: Deepanshu Kartikey <kartikey406@gmail.com> Reported-by: syzbot+94048264da5715c251f9@syzkaller.appspotmail.com Tested-by: syzbot+94048264da5715c251f9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=94048264da5715c251f9 Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-7-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
11 days  namespace: convert open_tree_attr() to FD_PREPARE()  (Christian Brauner)
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-6-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>