Skip to content

forest: tighten sentinel block conditions#9004

Merged
emwang-jump merged 1 commit intomainfrom
emwang/sentinel-blk-fix
Apr 6, 2026
Merged

forest: tighten sentinel block conditions#9004
emwang-jump merged 1 commit intomainfrom
emwang/sentinel-blk-fix

Conversation

@emwang-jump
Copy link
Copy Markdown
Contributor

@emwang-jump emwang-jump commented Mar 23, 2026

fixes https://github.com/firedancer-io/auditor-internal/issues/457 and missing check confirm case.

Also explicitly fixes equivocating blocks where two versions have different parents. i.e. ability to update parent if the FEC set is a confirmed one.

@emwang-jump emwang-jump marked this pull request as ready for review March 24, 2026 16:12
Copilot AI review requested due to automatic review settings March 24, 2026 16:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens the conditions under which fd_forest_blk_insert() will treat an existing block as a “sentinel” and update its parent_slot, aiming to avoid unintended parent rewrites for non-sentinel blocks in the forest data structure.

Changes:

  • Hoists forest map/pool pointer acquisition to the top of fd_forest_blk_insert().
  • Narrows the parent update path to only trigger when the existing element is a sentinel (slot == parent_slot).
  • Adds an internal invariant check before removing the sentinel from subtrees/orphaned.

Comment thread src/discof/forest/fd_forest.c Outdated
Comment thread src/discof/forest/fd_forest.c Outdated
Comment thread src/discof/forest/fd_forest.c Outdated
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.122571 s 0.122642 s 0.058%
backtest mainnet-406545575-perf snapshot load 4.902 s 3.337 s -31.926%
backtest mainnet-406545575-perf total elapsed 122.570988 s 122.642214 s 0.058%
firedancer mem usage with mainnet.toml 1095.43 GiB 1095.43 GiB 0.000%

@emwang-jump emwang-jump requested a review from lidatong as a code owner March 24, 2026 16:34
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.122503 s 0.122348 s -0.127%
backtest mainnet-406545575-perf snapshot load 5.165 s 3.393 s -34.308%
backtest mainnet-406545575-perf total elapsed 122.502727 s 122.347967 s -0.126%
firedancer mem usage with mainnet.toml 1095.43 GiB 1095.43 GiB 0.000%

Copilot AI review requested due to automatic review settings March 24, 2026 17:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/discof/forest/fd_forest.h
Comment thread src/discof/forest/fd_forest.c
Comment thread src/discof/forest/fd_forest.c
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.122693 s 0.12252 s -0.141%
backtest mainnet-406545575-perf snapshot load 5.089 s 3.596 s -29.338%
backtest mainnet-406545575-perf total elapsed 122.692978 s 122.519811 s -0.141%
firedancer mem usage with mainnet.toml 1095.43 GiB 1095.43 GiB 0.000%

@emwang-jump emwang-jump force-pushed the emwang/sentinel-blk-fix branch from f20657b to b8dd577 Compare March 24, 2026 20:13
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.122524 s 0.122533 s 0.007%
backtest mainnet-406545575-perf snapshot load 5.276 s 3.587 s -32.013%
backtest mainnet-406545575-perf total elapsed 122.524149 s 122.533269 s 0.007%
firedancer mem usage with mainnet.toml 1095.43 GiB 1095.43 GiB 0.000%

Comment thread src/discof/forest/fd_forest.c
Comment on lines +931 to +942
fd_forest_blk_t * ele = query( forest, slot );
if( FD_LIKELY( ele ) ) {
/* potentially may need to update the parent_slot, if this
this was a sentinel block that was created for a confirmed msg.
This update is only allowed once */
if( FD_UNLIKELY( ele->slot == ele->parent_slot && ele->parent_slot != parent_slot ) ) {
ele->parent_slot = parent_slot;
FD_TEST( fd_forest_subtrees_ele_query( subtrees, &slot, NULL, pool ) || fd_forest_orphaned_ele_query( orphaned, &slot, NULL, pool ) );
subtrees_orphaned_remove( forest, slot ); // if this is a sentinel block, then it must be orphaned
} else {
return ele;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a lot more docs... i assume there's some assumptions around these comparisons, if someone sends a shred with a different parent_off vs. tower giving a sentinel confirmation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated .h

slot. Returns the inserted forest ele. */
slot >= forest->root. blk_insert can also be called to create a
sentinel block, i.e. a placeholder block that we know exists but
don't know the parent slot of. The caller should pass in parent_slot
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use ULONG_MAX, not its own slot

Comment thread src/discof/forest/fd_forest.h Outdated
Comment on lines +729 to +733
equal to the slot. In this case, the block inserted will remain an
orphan/subtree at least until the next blk_insert is called with a
different parent_slot, after which point no more updates to the
parent_slot will be allowed. For non-sentinel blocks, blk insert is
idempotent, and can be called multiple times with the same slot.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
equal to the slot. In this case, the block inserted will remain an
orphan/subtree at least until the next blk_insert is called with a
different parent_slot, after which point no more updates to the
parent_slot will be allowed. For non-sentinel blocks, blk insert is
idempotent, and can be called multiple times with the same slot.
equal to the slot. In this case, the block inserted will always be
orphaned (there can never exist a blk with slot ULONG_MAX)
When blk_insert is called for the same slot with an actual parent_slot, the blk's parent_slot will be updated. Note that parent_slot can otherwise not be updated to the
parent_slot will be allowed. blk_insert is
idempotent, and can be called multiple times with the same slot.

Copilot AI review requested due to automatic review settings March 24, 2026 23:37
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.122492 s 0.122406 s -0.070%
backtest mainnet-406545575-perf snapshot load 5.086 s 3.457 s -32.029%
backtest mainnet-406545575-perf total elapsed 122.492454 s 122.405521 s -0.071%
firedancer mem usage with mainnet.toml 1095.43 GiB 1095.43 GiB 0.000%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/discof/forest/fd_forest.c
Comment thread src/discof/forest/fd_forest.h
Copilot AI review requested due to automatic review settings March 26, 2026 00:05
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.122218 s 0.122229 s 0.009%
backtest mainnet-406545575-perf snapshot load 3.249 s 2.777 s -14.528%
backtest mainnet-406545575-perf total elapsed 122.218495 s 122.229083 s 0.009%
firedancer mem usage with mainnet.toml 1096.43 GiB 1096.43 GiB 0.000%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comment thread src/discof/repair/fd_repair_tile.c Outdated
Comment on lines +836 to +837
/* re-trigger continuation of chained merkle verification if this FEC
set enables it */
set enables it TODOOO MOVE TO AFTER_SHRED? */
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated block comment includes a stray/informal "TODOOO" note. Please either remove this from production code or change it to a standard "TODO:" with a clear action/owner, and keep the comment wrapped consistently with surrounding style.

Copilot uses AI. Check for mistakes.
if( FD_UNLIKELY( ele->parent_slot == ULONG_MAX && parent_slot != ULONG_MAX ) ) {
ele->parent_slot = parent_slot;
FD_TEST( fd_forest_subtrees_ele_query( subtrees, &slot, NULL, pool ) || fd_forest_orphaned_ele_query( orphaned, &slot, NULL, pool ) );
subtrees_orphaned_remove( forest, slot ); // if this is a sentinel block, then it must be orphaned
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This inline comment is inaccurate: sentinel blocks created with parent_slot==ULONG_MAX are inserted into subtrees, not necessarily orphaned. Since the code asserts the element is in either subtrees or orphaned, the comment should be updated to match reality to avoid confusion during future maintenance/debugging.

Suggested change
subtrees_orphaned_remove( forest, slot ); // if this is a sentinel block, then it must be orphaned
subtrees_orphaned_remove( forest, slot ); /* if this is a sentinel block, it must be present in
either subtrees or orphaned (as asserted above) */

Copilot uses AI. Check for mistakes.
Comment thread src/discof/forest/fd_forest.c Outdated
Comment on lines +1074 to +1099
fd_forest_blk_t * new_parent = fd_forest_query( forest, parent_slot );
remove_and_unlink( forest, ele );

/* insert ele to correct map */
if( FD_UNLIKELY( fd_forest_frontier_ele_remove( frontier, &parent_slot, NULL, pool ) ) ) {
fd_forest_ancestry_ele_insert( ancestry, new_parent, pool );
if( ele->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, ele, pool );
else fd_forest_frontier_ele_insert( frontier, ele, pool );
} else if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &parent_slot, NULL, pool ) ) ) {
if( ele->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, ele, pool );
else fd_forest_frontier_ele_insert( frontier, ele, pool );
} else if( FD_UNLIKELY( fd_forest_orphaned_ele_query( orphaned, &parent_slot, NULL, pool ) ||
fd_forest_subtrees_ele_query( subtrees, &parent_slot, NULL, pool ) ) ) {
fd_forest_orphaned_ele_insert( orphaned, ele, pool );
} else {
ele->parent = fd_forest_pool_idx_null( pool );
fd_forest_subtrees_ele_insert( subtrees, ele, pool );
fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), ele, pool );
requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, ele ) );
}

if( FD_LIKELY( new_parent ) ) {
ele->parent = fd_forest_pool_idx( pool, new_parent );
ele->parent_slot = parent_slot; /* update parent slot */
link( forest, new_parent, ele );
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parent_update() calls remove_and_unlink(), which does not clear ele->sibling. If ele previously had a non-null sibling in its old parent's child list, that stale sibling pointer will carry over when ele is linked under new_parent, corrupting the new parent's child/sibling chain. Ensure ele->sibling (and any other linkage fields that must be fresh) is reset to fd_forest_pool_idx_null(pool) before relinking.

Copilot uses AI. Check for mistakes.
Comment thread src/discof/forest/fd_forest.c Outdated
Comment on lines +1116 to +1121
/* insert child */
if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &head->parent_slot, NULL, pool ) ) ) {
if( head->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, head, pool );
else fd_forest_frontier_ele_insert( frontier, head, pool );
} else {
fd_forest_orphaned_ele_insert( orphaned, head, pool );
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parent_update() removes ele from consumed and from the appropriate requests map/list via remove_and_unlink(), but it never re-inserts ele into the correct requests/consumed structures after changing its parent. Additionally, the BFS loop removes descendants from the forest maps but does not remove/migrate them between the main vs orphan request lists. This can silently drop repair scheduling for affected slots (or leave stale entries in the wrong list). After reparenting, rebuild/migrate request-list and consumed state for ele (and any moved descendants) to match their new connectivity.

Suggested change
/* insert child */
if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &head->parent_slot, NULL, pool ) ) ) {
if( head->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, head, pool );
else fd_forest_frontier_ele_insert( frontier, head, pool );
} else {
fd_forest_orphaned_ele_insert( orphaned, head, pool );
/* insert child and update requests to match new connectivity */
if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &head->parent_slot, NULL, pool ) ) ) {
if( head->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, head, pool );
else fd_forest_frontier_ele_insert( frontier, head, pool );
/* node is now connected to main ancestry; ensure it's in main requests */
requests_insert( forest,
fd_forest_requests( forest ),
fd_forest_reqlist( forest ),
fd_forest_pool_idx( pool, head ) );
} else {
fd_forest_orphaned_ele_insert( orphaned, head, pool );
/* node is now orphaned; ensure it's in orphan requests */
requests_insert( forest,
fd_forest_orphreqs( forest ),
fd_forest_orphlist( forest ),
fd_forest_pool_idx( pool, head ) );

Copilot uses AI. Check for mistakes.
@emwang-jump emwang-jump force-pushed the emwang/sentinel-blk-fix branch from fcea5c7 to 3e50a57 Compare March 31, 2026 21:03
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.142761 s 0.143211 s 0.315%
backtest mainnet-406545575-perf snapshot load 5.126 s 3.554 s -30.667%
backtest mainnet-406545575-perf total elapsed 142.761321 s 143.210901 s 0.315%
firedancer mem usage with mainnet.toml 1106.43 GiB 1106.43 GiB 0.000%

Copilot AI review requested due to automatic review settings March 31, 2026 21:14
@emwang-jump emwang-jump force-pushed the emwang/sentinel-blk-fix branch from 3e50a57 to 9ab86aa Compare March 31, 2026 21:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/discof/forest/fd_forest.h Outdated
@github-actions
Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.12202 s 0.122038 s 0.015%
backtest mainnet-406545575-perf snapshot load 3.404 s 2.799 s -17.773%
backtest mainnet-406545575-perf total elapsed 122.019659 s 122.037546 s 0.015%
firedancer mem usage with mainnet.toml 1106.43 GiB 1106.43 GiB 0.000%

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 2, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.137141 s 0.137192 s 0.037%
backtest mainnet-406545575-perf snapshot load 3.622 s 3.102 s -14.357%
backtest mainnet-406545575-perf total elapsed 137.141484 s 137.191746 s 0.037%
firedancer mem usage with mainnet.toml 1112.43 GiB 1112.43 GiB 0.000%

lidatong
lidatong previously approved these changes Apr 6, 2026
Copy link
Copy Markdown
Member

@lidatong lidatong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch on the bug and the fix itself seems correct (the test seems to exercise it properly)

big question to ask is how can we simplify forest, specifically by deleting code? one way is to look at it from the highest-level and define what are the primitives (eg. is there a need for a lot of one-off helpers like remove_and_unlink, or can it be defined using tree / forest combinators?).

for example, the BFS traversal to orphan the tree's descendants is kinda the opposite of what happens in insert when it pushes the connected outwards

also, i was thinking about it, and it might be out of place for consumed to be tracked by forest not policy and similarly requests -> inflight

probably a revisit of all this after the release cut

Comment thread src/discof/forest/fd_forest.c Outdated
Comment on lines +1065 to +1068
/* Updates the parent slot of a block, and updates the map state for the
old parent, new parent, and children of the block. If the block has
children, the children are "moved" with the block to the new fork.
incredibly annoying. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Updates the parent slot of a block, and updates the map state for the
old parent, new parent, and children of the block. If the block has
children, the children are "moved" with the block to the new fork.
incredibly annoying. */
/* Updates a forest_blk_t's parent, which requires updates to the blk itself, the blk's old parent, the new parent, and all its descendants. */

Comment on lines +1104 to +1112
if( FD_UNLIKELY( blk == ele ) ) continue;
if( FD_UNLIKELY( fd_forest_pool_ele( pool, blk->parent ) == ele ) ) {
blk->parent = fd_forest_pool_idx_null( pool );
fd_forest_subtrees_ele_insert( subtrees, blk, pool );
fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), blk, pool );
requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, blk ) );
} else {
fd_forest_orphaned_ele_insert( orphaned, blk, pool );
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to document each of these

Suggested change
if( FD_UNLIKELY( blk == ele ) ) continue;
if( FD_UNLIKELY( fd_forest_pool_ele( pool, blk->parent ) == ele ) ) {
blk->parent = fd_forest_pool_idx_null( pool );
fd_forest_subtrees_ele_insert( subtrees, blk, pool );
fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), blk, pool );
requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, blk ) );
} else {
fd_forest_orphaned_ele_insert( orphaned, blk, pool );
}
/* this is blk itself, ... */
if( FD_UNLIKELY( blk == ele ) ) continue;
/* this is the direct child, ... */
else if( FD_UNLIKELY( fd_forest_pool_ele( pool, blk->parent ) == ele ) ) {
blk->parent = fd_forest_pool_idx_null( pool );
fd_forest_subtrees_ele_insert( subtrees, blk, pool );
fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), blk, pool );
requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, blk ) );
/* this is a descendant, ... */
} else {
fd_forest_orphaned_ele_insert( orphaned, blk, pool );
}

Comment thread src/discof/forest/fd_forest.c Outdated

/* re-trigger continuation of chained merkle verification if this FEC
set enables it */
set enables it TODO MOVE TO AFTER_SHRED? */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think in general there can be some re-organizing / re-structuring that can improve readability

Copilot AI review requested due to automatic review settings April 6, 2026 15:05
@emwang-jump emwang-jump force-pushed the emwang/sentinel-blk-fix branch from 33461b4 to 358642c Compare April 6, 2026 15:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@emwang-jump emwang-jump enabled auto-merge (rebase) April 6, 2026 15:09
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 6, 2026

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.11363 s 0.1134 s -0.202%
backtest mainnet-406545575-perf snapshot load 3.073 s 1.933 s -37.097%
backtest mainnet-406545575-perf total elapsed 113.630028 s 113.400378 s -0.202%
firedancer mem usage with mainnet.toml 1112.43 GiB 1112.43 GiB 0.000%

@emwang-jump emwang-jump merged commit ab12659 into main Apr 6, 2026
21 checks passed
@emwang-jump emwang-jump deleted the emwang/sentinel-blk-fix branch April 6, 2026 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants