mm/shmem: optimize read with reduced xarray lookups and folio batching by vfsci-bot[bot] · Pull Request #1437 · linux-fsdevel/vfs

vfsci-bot · 2026-05-20T10:50:53Z

Series: https://patchwork.kernel.org/project/linux-fsdevel/list/?series=1097925
Submitter: Chi Zhiling
Version: 1
Patches: 5/5
Message-ID: <20260520101538.58745-1-chizhiling@163.com>
Base: vfs.base.ci
Lore: https://lore.kernel.org/linux-fsdevel/20260520101538.58745-1-chizhiling@163.com

Automated by ml2pr

When reading small amounts of data from the page cache, only a single folio is typically returned from filemap_get_read_batch(). In this case, calling xas_advance() or xas_next() after adding the folio to the batch is unnecessary and only introduces extra branches.

The same issue exists for large reads, where one additional xarray walk is always performed before termination.

Move the boundary check to after the folio is added to the batch so the final redundant xarray advancement can be avoided. This reduces the total branch count in the read path by about 1.8% in the fio run below.

xas_next() does not update xa_index when xas->xa_node is set to XAS_RESTART, so checking the boundary before updating xa_index is sufficient to keep the folio within range. The warning should therefore never trigger.

The branch count: 654.198 M/sec -> 646.444 M/sec

Performance counter stats for 'fio --ioengine=sync --rw=read --bs=4k --size=1G --runtime=300 --time_based --group_reporting --name=seq_read_test --filename=file':

before:

    READ: bw=2697MiB/s (2828MB/s), 2697MiB/s-2697MiB/s (2828MB/s-2828MB/s), io=790GiB (848GB), run=300001-300001msec

     245602051556  task-clock        #    0.821 CPUs utilized
            78467  context-switches  #  319.488 /sec
               40  cpu-migrations    #    0.163 /sec
             3388  page-faults       #   13.795 /sec
     758312319204  instructions      #    0.74  insn per cycle
    1025881497502  cycles            #    4.177 GHz
     160672383734  branches          #  654.198 M/sec
        361904512  branch-misses     #    0.23% of all branches

after:

    READ: bw=2709MiB/s (2841MB/s), 2709MiB/s-2709MiB/s (2841MB/s-2841MB/s), io=794GiB (852GB), run=300000-300000msec

     243985503670  task-clock        #    0.812 CPUs utilized
            79004  context-switches  #  323.806 /sec
               30  cpu-migrations    #    0.123 /sec
             3355  page-faults       #   13.751 /sec
     747830935069  instructions      #    0.73  insn per cycle
    1019609333322  cycles            #    4.179 GHz
     157722976668  branches          #  646.444 M/sec
        348984893  branch-misses     #    0.22% of all branches

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
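The effect of moving the boundary check can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct cursor`, `cursor_load()`, `cursor_next()` and both `collect_*()` functions are hypothetical stand-ins, not the kernel's xarray API, and retry entries, value entries and large folios are ignored. It counts how often the cursor is advanced when the range check runs before the batch add versus after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an xarray cursor; counts every advancement. */
struct cursor {
	size_t pos;        /* index of the slot currently under the cursor */
	size_t nr_slots;   /* number of populated slots, indices 0..nr_slots-1 */
	unsigned advances; /* how many times the cursor was stepped */
};

/* Return the index under the cursor, or -1 once past the last slot. */
static long cursor_load(const struct cursor *c)
{
	return c->pos < c->nr_slots ? (long)c->pos : -1;
}

/* Step to the next slot (the costly part) and load it. */
static long cursor_next(struct cursor *c)
{
	c->advances++;
	c->pos++;
	return cursor_load(c);
}

/* Old shape: range check at the top, so one extra advance always happens. */
static size_t collect_check_first(struct cursor *c, size_t max, size_t *batch)
{
	size_t n = 0;

	for (long idx = cursor_load(c); idx >= 0; idx = cursor_next(c)) {
		if ((size_t)idx > max)
			break;
		batch[n++] = (size_t)idx;
	}
	return n;
}

/* New shape: range check after the add, so the final advance is skipped. */
static size_t collect_check_last(struct cursor *c, size_t max, size_t *batch)
{
	size_t n = 0;

	assert(c->pos <= max); /* the caller starts inside the range */
	for (long idx = cursor_load(c); idx >= 0; idx = cursor_next(c)) {
		batch[n++] = (size_t)idx;
		if ((size_t)idx >= max)
			break;
	}
	return n;
}
```

In this model a single-slot read costs one advance with the old loop shape and none with the new one; an n-slot read costs n versus n - 1.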

Apply the same optimization used in filemap_get_read_batch() by moving the boundary check from the loop condition to before xas_advance(), avoiding an unnecessary xarray lookup and reducing branches in the fast path.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>

Change SGP_NOALLOC to return 0 with NULL folio on hole, matching SGP_READ behavior. This simplifies the sgp_type handling by unifying hole semantics across these types.

Previously, SGP_NOALLOC returned -ENOENT on hole, while SGP_READ returned 0. This inconsistency required special handling in callers like khugepaged and userfaultfd.

After this change:
- khugepaged: behavior unchanged (checks both error and NULL folio)
- userfaultfd: behavior unchanged (both -ENOENT and NULL are converted to -EFAULT before returning to userspace)

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
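Why the callers see no behavior change can be sketched with a userspace model. Everything here is an assumption made for illustration: `get_folio_old()`, `get_folio_new()`, `khugepaged_like_ok()` and `uffd_like_result()` are hypothetical names, and only the hole handling described above is modelled, not the real shmem_get_folio() logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct folio { int unused; };             /* opaque placeholder type */
enum sgp_type { SGP_READ, SGP_NOALLOC };  /* only the two types discussed */

/* Old semantics: SGP_NOALLOC reports a hole as -ENOENT, SGP_READ as 0/NULL. */
static int get_folio_old(const struct folio *slot, enum sgp_type sgp,
			 const struct folio **out)
{
	assert(out != NULL);
	*out = slot;
	if (!slot && sgp == SGP_NOALLOC)
		return -ENOENT;
	return 0;
}

/* New semantics: both types report a hole as 0 with a NULL folio. */
static int get_folio_new(const struct folio *slot, enum sgp_type sgp,
			 const struct folio **out)
{
	(void)sgp; /* the hole result no longer depends on the type */
	assert(out != NULL);
	*out = slot;
	return 0;
}

/* khugepaged-like caller: proceeds only with no error and a real folio. */
static int khugepaged_like_ok(int err, const struct folio *folio)
{
	return !err && folio != NULL;
}

/* userfaultfd-like caller: -ENOENT and NULL both collapse into -EFAULT. */
static int uffd_like_result(int err, const struct folio *folio)
{
	if (err == -ENOENT || (!err && !folio))
		return -EFAULT;
	return err;
}
```

Both model callers produce the same result for a hole whether they are fed the old or the new return convention, which is the property the commit message relies on.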

This is a prep patch for shmem folio batching in the read path, where non-uptodate folios need to be handled in the main iteration loop.

A large non-uptodate folio should be treated as a hole. Currently, holes larger than PAGE_SIZE cannot be handled because ZERO_PAGE is limited to a single page. Add copy_zero_to_iter() as a wrapper to support copying larger zero ranges to the iterator.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
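The idea of serving a zero range larger than one page from a single zero page can be sketched in userspace as follows. This is not the patch's copy_zero_to_iter(): `copy_zero_chunks()`, `ZERO_CHUNK` and the flat destination buffer are assumptions standing in for the kernel's iov_iter and ZERO_PAGE, and the sketch assumes the copy never faults or comes up short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ZERO_CHUNK 4096 /* stand-in for PAGE_SIZE; an assumption of this sketch */

/* A single shared zero-filled chunk, playing the role of the zero page. */
static const unsigned char zero_chunk[ZERO_CHUNK];

/*
 * Copy len zero bytes into dst by reusing the one zero chunk repeatedly,
 * so the range may be larger than the chunk itself. Returns bytes copied.
 */
static size_t copy_zero_chunks(unsigned char *dst, size_t len)
{
	size_t copied = 0;

	assert(dst != NULL);
	while (copied < len) {
		size_t n = len - copied;

		if (n > ZERO_CHUNK)
			n = ZERO_CHUNK;
		memcpy(dst + copied, zero_chunk, n);
		copied += n;
	}
	return copied;
}
```

A real iterator copy can return fewer bytes than requested, so a kernel version would presumably need to stop on a short copy; the sketch leaves that case out.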

Optimize shmem file read by using filemap_get_folios_contig() to batch fetch contiguous folios from the page cache, reducing the overhead of repeated shmem_get_folio() calls.

When the folio batch is exhausted, attempt to refill it with filemap_get_folios_contig(). If no folios are found (hole or swapped out pages), fall back to shmem_get_folio() to handle these cases individually.

Additionally:
- Defer folio_put() until the batch is exhausted or on exit
- Add folio_test_uptodate() check before copying to ensure data validity

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
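The refill-then-fall-back control flow described above can be modelled in userspace. This is a sketch under stated assumptions: the page cache is a plain int array where a nonzero entry is a present folio and zero is a hole, and `get_contig()`, `read_range()`, `BATCH_MAX` and `struct read_stats` are hypothetical names, not kernel interfaces. Swapped-out pages, uptodate checks and reference counting are left out.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 4 /* batch capacity; an arbitrary value for this sketch */

struct read_stats {
	unsigned refills;      /* batched lookups attempted */
	unsigned slow_lookups; /* per-index fallback lookups */
	size_t present;        /* slots served from a batch */
	size_t holes;          /* slots handled by the fallback */
};

/* Gather up to BATCH_MAX contiguous present slots starting at start. */
static size_t get_contig(const int *cache, size_t nr, size_t start,
			 size_t *batch)
{
	size_t n = 0;

	while (n < BATCH_MAX && start + n < nr && cache[start + n]) {
		batch[n] = start + n;
		n++;
	}
	return n;
}

/* Walk the whole range, refilling the batch only when it is exhausted. */
static struct read_stats read_range(const int *cache, size_t nr)
{
	struct read_stats st = { 0, 0, 0, 0 };
	size_t batch[BATCH_MAX];
	size_t nr_batch = 0, used = 0, index = 0;

	while (index < nr) {
		if (used == nr_batch) {
			/* A real reader would drop the old batch's references here. */
			nr_batch = get_contig(cache, nr, index, batch);
			used = 0;
			st.refills++;
		}
		if (nr_batch == 0) {
			/* Nothing batched at this index: handle it individually. */
			st.slow_lookups++;
			st.holes++;
			index++;
			continue;
		}
		assert(batch[used] == index); /* batches are contiguous */
		st.present++;
		used++;
		index++;
	}
	return st;
}
```

In the model, one hole in an eight-slot range triggers a single fallback lookup while the seven present slots are served from three batched lookups, instead of eight individual lookups.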

Chi Zhiling added 5 commits May 20, 2026 10:50

vfsci-bot[bot] wants to merge 5 commits into vfs.base.ci from pw/1097925/vfs.base.ci
