Fix flaky SDK E2E tests#1418
Conversation
Make permission handler error coverage assert deterministic replayed tool results instead of waiting for final assistant text, and ensure background-agent tests wait for the completion notification before cleanup. Normalize equivalent replay proxy notification wording across SDK suites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR hardens the replaying CAPI proxy and E2E assertions to eliminate flakiness caused by background-agent completion notification timing and slightly varying notification wording, with a focus on stabilizing Windows .NET E2E runs.
Changes:
- Normalize background-agent completion notification wording (
read_agent“unread results” → “full results”) in the replaying proxy to avoid replay cache misses. - Make .NET and Python E2E tests deterministic by waiting for the background-agent completion notification event before teardown.
- Fix .NET permission-handler error coverage to assert against the replayed denied tool result rather than final assistant prose.
Show a summary per file
| File | Description |
|---|---|
| test/harness/replayingCapiProxy.ts | Normalizes semantically equivalent read_agent completion-notification wording for stable replay matching. |
| test/harness/replayingCapiProxy.test.ts | Adds regression coverage to ensure the new notification normalization is applied. |
| python/e2e/test_rpc_tasks_and_handlers_e2e.py | Waits for the background-agent completion notification event and unsubscribes the handler during cleanup. |
| dotnet/test/E2E/RpcTasksAndHandlersE2ETests.cs | Waits for the background-agent completion notification event to avoid teardown races. |
| dotnet/test/E2E/PermissionE2ETests.cs | Asserts permission-handler failure behavior via replayed tool result content for determinism. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 0
Normalize task-completion notification wording in stored replay snapshots as well as incoming requests so older snapshots using the unread-results wording continue to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Assert Python permission handler errors via the replayed denied tool result instead of final assistant prose, matching the deterministic .NET coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review ✅This PR fixes flaky E2E tests and hardens the shared replay proxy — no cross-SDK consistency concerns. Summary of changes:
Parity check: No cross-SDK inconsistencies introduced.
|
The Windows .NET E2E job was intermittently timing out in permission-handler coverage and reporting replay cache misses from a background-agent notification race. This makes the affected assertions deterministic and hardens the shared replay matching for semantically equivalent task-completion notification wording.
Summary
read_agenttask-completion notification wording in the shared replay proxy and add proxy coverage.Validation
net8.0andnet472.npm test -- replayingCapiProxyintest/harness.python -m pytest e2e\test_rpc_tasks_and_handlers_e2e.py -k should_start_background_agent_and_report_task_details.git diff --check.