DAOS-18541 rebuild: accumulate more OIDs per migrate RPC to reduce RP…#17712

Draft
wangshilong wants to merge 1 commit into release/2.6 from shilongw/DAOS-18541-2.6

@wangshilong
Contributor

…C count

Fix yield-count accounting in the scanner. A send-side batching policy is also introduced: the send ULT defers flushing until at least REBUILD_SEND_BATCH_MIN OIDs are queued or REBUILD_SEND_BATCH_TIMEOUT_SEC seconds have elapsed.

Without batching, a fast scanner floods the destination rank with many small RPCs, exhausting IB receive buffers and triggering timeouts. This is especially severe during reintegration, where all OIDs are concentrated on a single target rank.
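
The flush condition described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the struct `send_queue`, the helper `send_should_flush`, and both constant values are hypothetical, and the sketch assumes the timeout is measured from the moment the oldest pending OID was queued, using a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: the PR names these constants but does not state them. */
#define REBUILD_SEND_BATCH_MIN         128
#define REBUILD_SEND_BATCH_TIMEOUT_SEC 2

/* Hypothetical per-destination queue state; not the actual DAOS structure. */
struct send_queue {
	uint32_t queued_oids;      /* OIDs accumulated but not yet sent */
	uint64_t first_queued_sec; /* monotonic time the oldest pending OID arrived */
};

/*
 * Return true when the send ULT should flush the queue into a migrate RPC:
 * either enough OIDs are queued, or the oldest one has waited long enough.
 * Assumes now_sec comes from a monotonic clock (now_sec >= first_queued_sec).
 */
static bool
send_should_flush(const struct send_queue *q, uint64_t now_sec)
{
	if (q->queued_oids == 0)
		return false; /* nothing to send */
	if (q->queued_oids >= REBUILD_SEND_BATCH_MIN)
		return true;  /* batch is large enough */
	return now_sec - q->first_queued_sec >= REBUILD_SEND_BATCH_TIMEOUT_SEC;
}
```

Under these assumptions, the timeout bounds how long a small tail of OIDs can wait, so a slow or nearly finished scan still makes progress while a fast scan is coalesced into fewer, larger RPCs.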

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@github-actions

Ticket title is 'Rebuild stuck on Bear cluster'
Status is 'In Progress'
Labels: 'test_2.8'
https://daosio.atlassian.net/browse/DAOS-18541
