Merge Staging with Main for new release by oten91 · Pull Request #510 · pokt-network/path

oten91 · 2026-03-11T23:25:29Z

…ontention

This commit addresses RPS concerns by reducing lock contention in the block height tracking hot path.

Changes:
- Change perceivedBlockNumber from uint64 to atomic.Uint64
- Remove locks from basicEndpointValidation() and isBlockNumberValid()
- Use atomic Load() for reads and CompareAndSwap() for writes
- Optimize filterValidEndpointsWithDetails() to copy data under lock then release lock before iterating (O(1) lock hold instead of O(n))
- UpdateFromExtractedData() now uses atomic CAS instead of mutex

Before: Request path held RLock for entire endpoint filtering loop, blocking observation writes and causing cascading delays.

After: Lock held only briefly to copy endpoint data, atomic reads for perceivedBlockNumber are lock-free.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
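The pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `tracker` and `endpoint` types, the `observeBlock`, `snapshot`, and `filterValid` functions, and the `syncAllowance` parameter are hypothetical names. Only `perceivedBlockNumber` and the use of `atomic.Uint64` with `Load()` and `CompareAndSwap()` come from the commit message. The sketch also assumes the perceived block number should only ever increase, which the commit message does not state.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// endpoint is a hypothetical stand-in for per-endpoint observation data.
type endpoint struct {
	addr        string
	blockNumber uint64
}

// tracker is a hypothetical stand-in for the QoS state described in the commit.
type tracker struct {
	perceivedBlockNumber atomic.Uint64 // read and written without a mutex

	mu        sync.RWMutex // guards endpoints only
	endpoints map[string]endpoint
}

// observeBlock raises perceivedBlockNumber to observed if observed is higher.
// The CAS loop retries when another goroutine changed the value in between.
// Assumption: the value is monotonic (never lowered).
func (t *tracker) observeBlock(observed uint64) {
	for {
		current := t.perceivedBlockNumber.Load()
		if observed <= current {
			return
		}
		if t.perceivedBlockNumber.CompareAndSwap(current, observed) {
			return
		}
	}
}

// snapshot copies the endpoint data while holding the read lock and releases
// the lock before the caller does any per-endpoint work.
func (t *tracker) snapshot() []endpoint {
	t.mu.RLock()
	defer t.mu.RUnlock()
	out := make([]endpoint, 0, len(t.endpoints))
	for _, ep := range t.endpoints {
		out = append(out, ep)
	}
	return out
}

// filterValid returns the addresses of endpoints that are at most
// syncAllowance blocks behind the perceived block number.
// Validation runs on the snapshot, so no lock is held during the loop.
func (t *tracker) filterValid(syncAllowance uint64) []string {
	perceived := t.perceivedBlockNumber.Load()
	var valid []string
	for _, ep := range t.snapshot() {
		if ep.blockNumber+syncAllowance >= perceived {
			valid = append(valid, ep.addr)
		}
	}
	sort.Strings(valid) // map iteration order is random; sort for stable output
	return valid
}

func main() {
	tr := &tracker{endpoints: map[string]endpoint{
		"a": {addr: "a", blockNumber: 1000},
		"b": {addr: "b", blockNumber: 900},
	}}

	// Many concurrent observers; the highest observed block should win.
	var wg sync.WaitGroup
	for i := uint64(1); i <= 1000; i++ {
		wg.Add(1)
		go func(n uint64) {
			defer wg.Done()
			tr.observeBlock(n)
		}(i)
	}
	wg.Wait()

	fmt.Println("perceived:", tr.perceivedBlockNumber.Load())
	fmt.Println("valid:", tr.filterValid(10))
}
```

In this sketch the copy inside `snapshot` is still linear in the number of endpoints, but each step is a cheap struct copy; the comparatively expensive validation happens after the lock is released, so observation writers are not blocked by the filtering loop.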

@oten91

This PR combines and extends the work from #505 with additional bug fixes and improvements for hedge racing and retry reliability.

Closes #505

## Features (from #505)

### Protocol Error Propagation
- Add `SetProtocolError` to `RequestQoSContext` interface for specific error messages
- Replace generic "no endpoint responses received" with specific errors like "no valid endpoints available for service"

### Hedge Racing (New Feature)
- Spawn parallel "hedge" request after configurable delay if primary hasn't responded
- First successful response wins; the other is cancelled
- Configurable via `retry_config.hedge_delay` and `retry_config.connect_timeout`
- Track outcomes via `X-Hedge-Result` header

### Retry Enhancements
- **Time Budget**: `max_retry_latency` skips retries when failed request already took too long
- **Endpoint Rotation**: Each retry attempt uses a different endpoint
- **Heuristic Detection**: Retry on JSON-RPC errors hidden in HTTP 200 responses
- **Observability**: Track via `X-Retry-Count` and `X-Suppliers-Tried` headers

### Heuristic Response Analysis
- Detect errors in response payloads despite HTTP 200 status
- Identify: JSON-RPC errors, HTML error pages, empty responses, malformed JSON
- Record correcting reputation signals for detected failures

### Response Metadata Headers

| Header               | Description                                                                     |
|----------------------|---------------------------------------------------------------------------------|
| `X-Retry-Count`      | Number of retry attempts (0 = first attempt succeeded)                          |
| `X-Suppliers-Tried`  | Comma-separated list of attempted supplier addresses                            |
| `X-Hedge-Result`     | Hedge racing outcome: `primary_only`, `primary_won`, `hedge_won`, `both_failed` |
| `X-App-Address`      | Application address used for the relay                                          |
| `X-Supplier-Address` | Supplier address of the responding endpoint                                     |
| `X-Session-ID`       | Session ID for the relay                                                        |

### Health Check & Sync Check
- **Sync check validation**: Health checks now validate endpoint block height against QoS perceived block number using `sync_allowance` config
- Consolidated block height validation directly into health check executor (removed standalone `BlockHeightValidator`, `BlockHeightReferenceCache`)
- Simplified health check config structure
- Fix defer pattern in solana.go for mutex unlock
- Add nil map initialization safety check in solana.go

## Bug Fixes (this PR)
- **X-Suppliers-Tried header**: Pre-register both primary and hedge suppliers when racing starts
- **selectTopRankedEndpoint**: Return original endpoint address instead of reputation key (fixes 'endpoint not available' errors)
- **Retry blockchain errors**: Detect and retry node-specific errors (missing trie node, unhealthy node) even in valid JSON-RPC responses
- **Health check refactor**: Simplify block height validation and consolidate into health checks

## Contributions from @oten91
- Prioritized endpoint inclusion during reputation filtering (mitigates race conditions)
- Request-awareness for data extraction methods
- Enhanced JSON-RPC response analysis with stricter error classification
- Heuristic-based error classification with unit tests
- Improved supplier tracking and debugging
- JSON-RPC error handling to prevent retries for valid client errors

## Configuration

```yaml
services:
  - service_id: eth
    retry_config:
      enabled: true
      max_retries: 2
      hedge_delay: 500ms
      connect_timeout: 200ms
      max_retry_latency: 5s
      retry_on_5xx: true
      retry_on_timeout: true
      retry_on_connection: true
```

Includes #508 and #507

### Testing
- [x] Unit tests
- [x] E2E tests (eth service 74.33% success rate)
- [x] Local hedge testing verified with `scripts/test_hedge.sh`

---------

Co-authored-by: Otto V <ottoevargas@gmail.com>

oten91 and others added 2 commits January 23, 2026 12:36

oten91 requested a review from jorgecuesta March 11, 2026 23:25

oten91 and others added 4 commits March 12, 2026 00:34

chore: Update golangci-lint to v2.11 in CI workflow configuration

fb3ead1

fix: Resolve golangci-lint v2.11 errors

e2746bd

Remove unused functions (calculateRetryBackoff, mock getBlock/setBlock), simplify embedded field selectors flagged by staticcheck, and handle unchecked Encode error.

fix: Remove unsupported comet_bft from xrplevm example config

731968b

No suppliers stake comet_bft endpoints for xrplevm, so the example config should match production which only supports json_rpc and websocket. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: Bump filippo.io/edwards25519 from v1.1.0 to v1.1.1

34aed06

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jorgecuesta approved these changes Mar 12, 2026

oten91 merged commit 2646ab1 into main Mar 12, 2026
17 of 30 checks passed

oten91 deleted the staging branch March 12, 2026 11:55

oten91 restored the staging branch March 12, 2026 11:55

oten91 merged 6 commits into main from staging
