forest: tighten sentinel block conditions#9004
Conversation
There was a problem hiding this comment.
Pull request overview
This PR tightens the conditions under which fd_forest_blk_insert() will treat an existing block as a “sentinel” and update its parent_slot, aiming to avoid unintended parent rewrites for non-sentinel blocks in the forest data structure.
Changes:
- Hoists forest map/pool pointer acquisition to the top of
fd_forest_blk_insert(). - Narrows the parent update path to only trigger when the existing element is a sentinel (
slot == parent_slot). - Adds an internal invariant check before removing the sentinel from
subtrees/orphaned.
Performance Measurements ⏳
|
Performance Measurements ⏳
|
Performance Measurements ⏳
|
f20657b to
b8dd577
Compare
Performance Measurements ⏳
|
| fd_forest_blk_t * ele = query( forest, slot ); | ||
| if( FD_LIKELY( ele ) ) { | ||
| /* potentially may need to update the parent_slot, if this | ||
| this was a sentinel block that was created for a confirmed msg. | ||
| This update is only allowed once */ | ||
| if( FD_UNLIKELY( ele->slot == ele->parent_slot && ele->parent_slot != parent_slot ) ) { | ||
| ele->parent_slot = parent_slot; | ||
| FD_TEST( fd_forest_subtrees_ele_query( subtrees, &slot, NULL, pool ) || fd_forest_orphaned_ele_query( orphaned, &slot, NULL, pool ) ); | ||
| subtrees_orphaned_remove( forest, slot ); // if this is a sentinel block, then it must be orphaned | ||
| } else { | ||
| return ele; | ||
| } |
There was a problem hiding this comment.
this needs a lot more docs... i assume there's some assumptions around these comparisons, if someone sends a shred with a different parent_off vs. tower giving a sentinel confirmation?
| slot. Returns the inserted forest ele. */ | ||
| slot >= forest->root. blk_insert can also be called to create a | ||
| sentinel block, i.e. a placeholder block that we know exists but | ||
| don't know the parent slot of. The caller should pass in parent_slot |
There was a problem hiding this comment.
this should use ULONG_MAX, not its own slot
| equal to the slot. In this case, the block inserted will remain an | ||
| orphan/subtree at least until the next blk_insert is called with a | ||
| different parent_slot, after which point no more updates to the | ||
| parent_slot will be allowed. For non-sentinel blocks, blk insert is | ||
| idempotent, and can be called multiple times with the same slot. |
There was a problem hiding this comment.
| equal to the slot. In this case, the block inserted will remain an | |
| orphan/subtree at least until the next blk_insert is called with a | |
| different parent_slot, after which point no more updates to the | |
| parent_slot will be allowed. For non-sentinel blocks, blk insert is | |
| idempotent, and can be called multiple times with the same slot. | |
| equal to the slot. In this case, the block inserted will always be | |
| orphaned (there can never exist a blk with slot ULONG_MAX) | |
| When blk_insert is called for the same slot with an actual parent_slot, the blk's parent_slot will be updated. Note that parent_slot can otherwise not be updated to the | |
| parent_slot will be allowed. blk_insert is | |
| idempotent, and can be called multiple times with the same slot. |
Performance Measurements ⏳
|
Performance Measurements ⏳
|
| /* re-trigger continuation of chained merkle verification if this FEC | ||
| set enables it */ | ||
| set enables it TODOOO MOVE TO AFTER_SHRED? */ |
There was a problem hiding this comment.
The updated block comment includes a stray/informal "TODOOO" note. Please either remove this from production code or change it to a standard "TODO:" with a clear action/owner, and keep the comment wrapped consistently with surrounding style.
| if( FD_UNLIKELY( ele->parent_slot == ULONG_MAX && parent_slot != ULONG_MAX ) ) { | ||
| ele->parent_slot = parent_slot; | ||
| FD_TEST( fd_forest_subtrees_ele_query( subtrees, &slot, NULL, pool ) || fd_forest_orphaned_ele_query( orphaned, &slot, NULL, pool ) ); | ||
| subtrees_orphaned_remove( forest, slot ); // if this is a sentinel block, then it must be orphaned |
There was a problem hiding this comment.
This inline comment is inaccurate: sentinel blocks created with parent_slot==ULONG_MAX are inserted into subtrees, not necessarily orphaned. Since the code asserts the element is in either subtrees or orphaned, the comment should be updated to match reality to avoid confusion during future maintenance/debugging.
| subtrees_orphaned_remove( forest, slot ); // if this is a sentinel block, then it must be orphaned | |
| subtrees_orphaned_remove( forest, slot ); /* if this is a sentinel block, it must be present in | |
| either subtrees or orphaned (as asserted above) */ |
| fd_forest_blk_t * new_parent = fd_forest_query( forest, parent_slot ); | ||
| remove_and_unlink( forest, ele ); | ||
|
|
||
| /* insert ele to correct map */ | ||
| if( FD_UNLIKELY( fd_forest_frontier_ele_remove( frontier, &parent_slot, NULL, pool ) ) ) { | ||
| fd_forest_ancestry_ele_insert( ancestry, new_parent, pool ); | ||
| if( ele->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, ele, pool ); | ||
| else fd_forest_frontier_ele_insert( frontier, ele, pool ); | ||
| } else if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &parent_slot, NULL, pool ) ) ) { | ||
| if( ele->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, ele, pool ); | ||
| else fd_forest_frontier_ele_insert( frontier, ele, pool ); | ||
| } else if( FD_UNLIKELY( fd_forest_orphaned_ele_query( orphaned, &parent_slot, NULL, pool ) || | ||
| fd_forest_subtrees_ele_query( subtrees, &parent_slot, NULL, pool ) ) ) { | ||
| fd_forest_orphaned_ele_insert( orphaned, ele, pool ); | ||
| } else { | ||
| ele->parent = fd_forest_pool_idx_null( pool ); | ||
| fd_forest_subtrees_ele_insert( subtrees, ele, pool ); | ||
| fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), ele, pool ); | ||
| requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, ele ) ); | ||
| } | ||
|
|
||
| if( FD_LIKELY( new_parent ) ) { | ||
| ele->parent = fd_forest_pool_idx( pool, new_parent ); | ||
| ele->parent_slot = parent_slot; /* update parent slot */ | ||
| link( forest, new_parent, ele ); | ||
| } |
There was a problem hiding this comment.
parent_update() calls remove_and_unlink(), which does not clear ele->sibling. If ele previously had a non-null sibling in its old parent's child list, that stale sibling pointer will carry over when ele is linked under new_parent, corrupting the new parent's child/sibling chain. Ensure ele->sibling (and any other linkage fields that must be fresh) is reset to fd_forest_pool_idx_null(pool) before relinking.
| /* insert child */ | ||
| if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &head->parent_slot, NULL, pool ) ) ) { | ||
| if( head->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, head, pool ); | ||
| else fd_forest_frontier_ele_insert( frontier, head, pool ); | ||
| } else { | ||
| fd_forest_orphaned_ele_insert( orphaned, head, pool ); |
There was a problem hiding this comment.
parent_update() removes ele from consumed and from the appropriate requests map/list via remove_and_unlink(), but it never re-inserts ele into the correct requests/consumed structures after changing its parent. Additionally, the BFS loop removes descendants from the forest maps but does not remove/migrate them between the main vs orphan request lists. This can silently drop repair scheduling for affected slots (or leave stale entries in the wrong list). After reparenting, rebuild/migrate request-list and consumed state for ele (and any moved descendants) to match their new connectivity.
| /* insert child */ | |
| if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &head->parent_slot, NULL, pool ) ) ) { | |
| if( head->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, head, pool ); | |
| else fd_forest_frontier_ele_insert( frontier, head, pool ); | |
| } else { | |
| fd_forest_orphaned_ele_insert( orphaned, head, pool ); | |
| /* insert child and update requests to match new connectivity */ | |
| if( FD_LIKELY( fd_forest_ancestry_ele_query( ancestry, &head->parent_slot, NULL, pool ) ) ) { | |
| if( head->child != ULONG_MAX ) fd_forest_ancestry_ele_insert( ancestry, head, pool ); | |
| else fd_forest_frontier_ele_insert( frontier, head, pool ); | |
| /* node is now connected to main ancestry; ensure it's in main requests */ | |
| requests_insert( forest, | |
| fd_forest_requests( forest ), | |
| fd_forest_reqlist( forest ), | |
| fd_forest_pool_idx( pool, head ) ); | |
| } else { | |
| fd_forest_orphaned_ele_insert( orphaned, head, pool ); | |
| /* node is now orphaned; ensure it's in orphan requests */ | |
| requests_insert( forest, | |
| fd_forest_orphreqs( forest ), | |
| fd_forest_orphlist( forest ), | |
| fd_forest_pool_idx( pool, head ) ); |
fcea5c7 to
3e50a57
Compare
Performance Measurements ⏳
|
3e50a57 to
9ab86aa
Compare
Performance Measurements ⏳
|
Performance Measurements ⏳
|
lidatong
left a comment
There was a problem hiding this comment.
good catch on the bug and the fix itself seems correct (the test seems to exercise it properly)
big question to ask is how can we simplify forest, specifically by deleting code? one way is to look at it from the highest-level and define what are the primitives (eg. is there a need for a lot of one-off helpers like remove_and_unlink, or can it be defined using tree / forest combinators?).
for example, the BFS traversal to orphan the tree's descendants is kinda the opposite of what happens in insert when it pushes the connected outwards
also, i was thinking about it, and it might be out of place for consumed to be tracked by forest not policy and similarly requests -> inflight
probably a revisit of all this after the release cut
| /* Updates the parent slot of a block, and updates the map state for the | ||
| old parent, new parent, and children of the block. If the block has | ||
| children, the children are "moved" with the block to the new fork. | ||
| incredibly annoying. */ |
There was a problem hiding this comment.
| /* Updates the parent slot of a block, and updates the map state for the | |
| old parent, new parent, and children of the block. If the block has | |
| children, the children are "moved" with the block to the new fork. | |
| incredibly annoying. */ | |
| /* Updates a forest_blk_t's parent, which requires updates to the blk itself, the blk's old parent, the new parent, and all its descendants. */ |
| if( FD_UNLIKELY( blk == ele ) ) continue; | ||
| if( FD_UNLIKELY( fd_forest_pool_ele( pool, blk->parent ) == ele ) ) { | ||
| blk->parent = fd_forest_pool_idx_null( pool ); | ||
| fd_forest_subtrees_ele_insert( subtrees, blk, pool ); | ||
| fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), blk, pool ); | ||
| requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, blk ) ); | ||
| } else { | ||
| fd_forest_orphaned_ele_insert( orphaned, blk, pool ); | ||
| } |
There was a problem hiding this comment.
need to document each of these
| if( FD_UNLIKELY( blk == ele ) ) continue; | |
| if( FD_UNLIKELY( fd_forest_pool_ele( pool, blk->parent ) == ele ) ) { | |
| blk->parent = fd_forest_pool_idx_null( pool ); | |
| fd_forest_subtrees_ele_insert( subtrees, blk, pool ); | |
| fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), blk, pool ); | |
| requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, blk ) ); | |
| } else { | |
| fd_forest_orphaned_ele_insert( orphaned, blk, pool ); | |
| } | |
| /* this is blk itself, ... */ | |
| if( FD_UNLIKELY( blk == ele ) ) continue; | |
| /* this is the direct child, ... */ | |
| else if( FD_UNLIKELY( fd_forest_pool_ele( pool, blk->parent ) == ele ) ) { | |
| blk->parent = fd_forest_pool_idx_null( pool ); | |
| fd_forest_subtrees_ele_insert( subtrees, blk, pool ); | |
| fd_forest_subtlist_ele_push_tail( fd_forest_subtlist( forest ), blk, pool ); | |
| requests_insert( forest, fd_forest_orphreqs( forest ), fd_forest_orphlist( forest ), fd_forest_pool_idx( pool, blk ) ); | |
| /* this is a descendant, ... */ | |
| } else { | |
| fd_forest_orphaned_ele_insert( orphaned, blk, pool ); | |
| } |
|
|
||
| /* re-trigger continuation of chained merkle verification if this FEC | ||
| set enables it */ | ||
| set enables it TODO MOVE TO AFTER_SHRED? */ |
There was a problem hiding this comment.
yeah, i think in general there can be some re-organizing / re-structuring that can improve readability
…ed trigger conditions
33461b4 to
358642c
Compare
Performance Measurements ⏳
|
fixes https://github.com/firedancer-io/auditor-internal/issues/457 and missing check confirm case.
Also explicitly fixes equivocating blocks where two versions have different parents. i.e. ability to update parent if the FEC set is a confirmed one.