Conversation
…M to fix panic async-watch v0.3.1 has a race condition in its changed() implementation that panics with "[bug] failed to observe change after notificaton." under the multi-threaded tokio runtime. Replace it with tokio::sync::watch on non-WASM targets, keeping async-watch only for WASM where single-threaded execution avoids the race. Also fix a secondary bug where Err (channel closed) in HealthManagerActor::run did `continue` instead of `break`, causing the actor to spin forever starving the cancellation token arm. Fixes #698. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
adamspofford-dfinity
approved these changes
Mar 6, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #698 —
[bug] failed to observe change after notificaton.panic reported against ic-agent 0.45 and still present in 0.46.0.Root cause
async-watchv0.3.1 has a race condition in itschanged()implementation: after the internal event-listener future resolves,maybe_changed()can returnNoneif the executor context switches before the version is read, causing an unconditionalexpect(...)to panic. This race only manifests on multi-threaded tokio runtimes.Fix
async-watchwithtokio::sync::watchon non-WASM targets.tokio::sync::watchis battle-tested and race-condition-free.async-watchis kept for WASM where the single-threaded environment avoids the race.continue→breakwhenfetch_receiver_recvreturnsErr(sender dropped). The oldcontinuecausedHealthManagerActor::runto spin-loop forever, starving the cancellation token arm inselect!.Changes
ic-agent/Cargo.toml: moveasync-watchto WASM-only deps; add"sync"to non-WASM tokio featurestype_aliases.rs: platform-specificSenderWatch/ReceiverWatchtype aliasesdynamic_route_provider.rs: platform-specific watch channel constructionhealth_check.rs: platform-specificfetch_receiver_recvhelper;continue→breakon closed channel; two regression testsRegression tests
Two tests added to
health_check.rs(non-WASM only):test_health_manager_no_panic_on_rapid_updates_and_shutdown— 50-iteration stress test with multi-threaded tokio, flooding the watch channel 200× per iteration while health-check actors run at 1 ms intervals. CapturesJoinHandleso any spawned-task panic surfaces as a test failure.test_health_manager_exits_when_fetch_sender_dropped— deterministic test: dropsfetch_senderwhile keeping the cancellation token alive, expects the actor to exit within 2 s. With the oldcontinuebug this test times out (confirmed by temporarily reverting the fix and running the test).