Skip to content

[codex] Accept ACP user messages during async turns#3376

Open
neubig wants to merge 34 commits into
mainfrom
codex/acp-live-message-deltas
Open

[codex] Accept ACP user messages during async turns#3376
neubig wants to merge 34 commits into
mainfrom
codex/acp-live-message-deltas

Conversation

@neubig
Copy link
Copy Markdown
Member

@neubig neubig commented May 24, 2026

Summary

  • include the condenser LLM usage-id/metrics fix from fix(sdk): assign condenser LLM usage id #3368 so condenser settings do not collide with the agent LLM in the agent server
  • stream ACP assistant text through transient StreamingDeltaEvents, including ACP providers that invoke token callbacks with plain string chunks
  • release the conversation state lock around long ACP astep() awaits so websocket/API user messages can be persisted immediately
  • wrap ACP async event callbacks with short state-lock acquires for emitted events, and send ACP session/cancel when an in-flight turn is interrupted
  • when a new user message arrives during a running ACP turn, interrupt the current prompt and restart so the new message is processed instead of staying stuck in "sending"

Regression Tests

  • test_acp_string_token_callback_publishes_delta fails on the prior implementation because ACP plain-string token callbacks raised before publishing any StreamingDeltaEvent
  • test_acp_arun_accepts_user_message_while_step_is_in_flight fails on main because send_message() waits behind the in-flight ACP prompt
  • server-level send-message tests cover interrupting and restarting ACP runs when a user message arrives during execution
  • tests/sdk/test_settings.py -k condenser covers the copied condenser LLM usage-id behavior from fix(sdk): assign condenser LLM usage id #3368

Validation

  • uv run pytest -q tests/sdk/test_settings.py -k condenser
  • uv run pytest -q tests/agent_server/test_event_streaming.py tests/agent_server/test_event_service.py::TestEventServiceSendMessage tests/sdk/agent/test_acp_agent.py::TestACPAgentAstep tests/sdk/conversation/local/test_conversation_send_message.py
  • uv run ruff check openhands-sdk/openhands/sdk/settings/model.py tests/sdk/test_settings.py openhands-agent-server/openhands/agent_server/event_service.py tests/agent_server/test_event_streaming.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:8b59878-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-8b59878-python \
  ghcr.io/openhands/agent-server:8b59878-python

All tags pushed for this build

ghcr.io/openhands/agent-server:8b59878-golang-amd64
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-golang-amd64
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-golang-amd64
ghcr.io/openhands/agent-server:8b59878-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:8b59878-golang-arm64
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-golang-arm64
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-golang-arm64
ghcr.io/openhands/agent-server:8b59878-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:8b59878-java-amd64
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-java-amd64
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-java-amd64
ghcr.io/openhands/agent-server:8b59878-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:8b59878-java-arm64
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-java-arm64
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-java-arm64
ghcr.io/openhands/agent-server:8b59878-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:8b59878-python-amd64
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-python-amd64
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-python-amd64
ghcr.io/openhands/agent-server:8b59878-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:8b59878-python-arm64
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-python-arm64
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-python-arm64
ghcr.io/openhands/agent-server:8b59878-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:8b59878-golang
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-golang
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-golang
ghcr.io/openhands/agent-server:8b59878-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:8b59878-java
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-java
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-java
ghcr.io/openhands/agent-server:8b59878-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:8b59878-python
ghcr.io/openhands/agent-server:8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf-python
ghcr.io/openhands/agent-server:codex-acp-live-message-deltas-python
ghcr.io/openhands/agent-server:8b59878-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 8b59878-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 8b59878-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   event_service.py53911179%101–102, 132, 135–136, 140–141, 148, 152, 158, 168–172, 175–178, 237, 258–259, 330, 381, 391, 415–416, 420, 428, 431, 441, 473, 484, 491, 497, 551, 553, 557–559, 563, 572–573, 575, 579, 585, 587, 634, 664, 667, 722, 743, 848, 943–946, 950, 959, 961–962, 966, 980–985, 987, 1012, 1017–1020, 1024–1027, 1035–1038, 1048–1051, 1097–1098, 1100–1107, 1109–1110, 1119–1120, 1122–1123, 1130–1131, 1133–1134, 1154, 1160, 1166, 1175–1176
openhands-sdk/openhands/sdk/agent
   acp_agent.py84510088%358–360, 490–491, 524, 526, 530, 534, 542, 605–606, 611, 678, 833, 836–837, 854–855, 884, 889, 907, 917, 950–953, 1157–1160, 1164–1166, 1169–1173, 1175, 1329, 1343–1346, 1354, 1358, 1362–1363, 1369–1370, 1382–1383, 1386, 1420, 1424–1426, 1430–1431, 1463, 1682–1684, 1687–1688, 1861, 1865, 1873–1875, 1913–1914, 1917, 1925–1927, 1929, 1931, 1935, 1938, 1947–1949, 1951, 1987–1988, 2006–2009, 2012, 2016–2018, 2020, 2024–2025, 2124–2125
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py6375591%88, 328, 333, 477, 523, 560, 576, 641, 790–791, 868–869, 872, 977, 980–981, 1005, 1038–1039, 1042, 1048, 1109, 1112, 1116–1117, 1121–1122, 1125, 1132, 1157, 1161, 1164, 1183, 1235, 1238, 1277, 1285, 1289–1291, 1298, 1406, 1411, 1521, 1523, 1527–1528, 1539–1540, 1565, 1760, 1764, 1834, 1841–1842
TOTAL28715658377% 

@neubig neubig marked this pull request as ready for review May 24, 2026 12:25
@neubig neubig requested a review from simonrosenberg May 24, 2026 12:25
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean approach to the live-streaming problem: reusing the existing StreamingDeltaEvent / _pub_sub path for ACP deltas is exactly right — transient chunks don't belong in state.events, and the new helper _publish_streaming_delta_from_thread correctly encapsulates the cross-thread scheduling so neither the LLM token path nor the ACP path has to repeat the run_coroutine_threadsafe / suppress boilerplate. Test coverage is solid (wiring, unwiring, and actual delta delivery are all exercised). Two items worth a look before merge are called out below.

This review was generated by an AI agent (OpenHands) on behalf of the repository owner via OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
@neubig neubig changed the title [codex] Stream ACP assistant message deltas [codex] Accept ACP user messages during async turns May 24, 2026
@neubig neubig force-pushed the codex/acp-live-message-deltas branch from dc3d0a3 to ca04e8c Compare May 24, 2026 13:51
Copy link
Copy Markdown
Member Author

neubig commented May 24, 2026

Merged main into this branch and resolved the conflict in EventService.send_message() by preserving the ACP interrupt behavior while keeping the newer rerun-request handling from main. Local verification: env -u GITHUB_TOKEN uv run pytest tests/agent_server/test_event_service.py tests/agent_server/test_event_streaming.py tests/sdk/agent/test_acp_agent.py tests/sdk/conversation/local/test_conversation_send_message.py -q → 289 passed.\n\n_This comment was generated by an AI agent (OpenHands) on behalf of neubig._

@neubig neubig added the review-this This label triggers a PR review by OpenHands label May 24, 2026 — with OpenHands AI
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable, but I found two ACP concurrency races that should be addressed before merge: one can replay a just-arrived user prompt, and one can let cancelled prompt updates leak into the next turn. Risk: 🟡 medium, since this changes conversation loop/cancellation behavior.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26368622109

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 24, 2026

Addressed the two ACP async concurrency review items in 6c89026 and verified locally with ruff plus the full EventService test file (83 passed). Ready for another automated review.\n\n_This comment was generated by an AI agent (OpenHands) on behalf of the user._

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig removed the review-this This label triggers a PR review by OpenHands label May 24, 2026
@neubig neubig added the review-this This label triggers a PR review by OpenHands label May 24, 2026 — with OpenHands AI
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable direction, but I found a few ACP concurrency ordering gaps that should be addressed before merge. Risk: 🟡 medium because this changes async conversation loop/cancellation behavior.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26369467316

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 24, 2026

Addressed the latest ACP concurrency review in c9bbb68: atomic async finalization, finish-gap queued-message reconciliation, and cancellation drain-before-failure ordering. Verified locally with ruff, pyright, and targeted ACP async tests.\n\n_This comment was generated by an AI agent (OpenHands) on behalf of the user._

@neubig neubig removed the review-this This label triggers a PR review by OpenHands label May 24, 2026
@neubig neubig added the review-this This label triggers a PR review by OpenHands label May 24, 2026 — with OpenHands AI
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable direction, but I found a few remaining ACP cancellation/restart races inline. Risk: 🟡 MEDIUM because this changes ACP conversation-loop behavior; a human maintainer should decide eval coverage before approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26427891200

Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 26, 2026

Addressed the latest ACP cancellation/restart review in 91d11a8: late explicit Stop/Pause now wins during final ACP restart status checks, cancellation cleanup covers both session/cancel send and cancelled-prompt drain, completed cancelled ACP prompts commit their cursor before pause, and the requested threading assumption comment was added. The repeated prompt-scan performance suggestion was resolved as intentionally kept simple for correctness across separate lock boundaries. Local validation: pre-commit on changed files and targeted ACP/EventService suites passed.

This PR comment was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig requested a review from all-hands-bot May 26, 2026 02:27
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the PR’s ACP async-turn behavior with real SDK/EventService execution: base branch blocked live user input and did not wire ACP plain-string streaming, while the PR accepted/restarted the intervening message and emitted a streaming delta.

Does this PR achieve its stated goal?

Yes. The PR set out to accept ACP user messages during async turns, interrupt/restart an in-flight ACP prompt when a new user message arrives through the agent-server path, and stream ACP assistant text chunks. I exercised those flows with a controlled in-process ACPAgent that holds a prompt open; on base, send_message did not return while the ACP step was in flight and ACP streaming callbacks were not wired, while on the PR the same interactions completed immediately, restarted on the intervening prompt, and produced a StreamingDeltaEvent for a plain string chunk.

Phase Result
Environment Setup make build completed successfully with uv workspace dependencies installed.
CI Status 🟡 GitHub checks currently show 19 successful, 0 failing, 9 pending, 4 skipped.
Functional Verification ✅ Before/after probe confirms ACP live-message acceptance, EventService restart, and plain-string streaming behavior.
Functional Verification

Test 1: SDK LocalConversation.arun() accepts a user message while ACP astep() is in flight

Step 1 — Reproduce / establish baseline without the fix:
Ran git checkout --detach origin/main && uv run python /tmp/qa_acp_live_messages.py.
Relevant output:

{
  "local_conversation": {
    "send_completed_while_step_inflight": false,
    "send_elapsed_seconds": 0.501,
    "prompts_seen": ["initial request", "intervening request"]
  }
}

This confirms the old behavior: the user message did not persist within the 0.5s window while the ACP turn was awaiting, matching the “stuck sending” class of bug.

Step 2 — Apply the PR's changes:
Checked out PR head 91d11a86bcee4ecb2f550fecaa3ef8b74df7e0aa on codex/acp-live-message-deltas.

Step 3 — Re-run with the fix in place:
Ran git checkout codex/acp-live-message-deltas && uv run python /tmp/qa_acp_live_messages.py.
Relevant output:

{
  "local_conversation": {
    "send_completed_while_step_inflight": true,
    "send_elapsed_seconds": 0.003,
    "prompts_seen": ["initial request", "intervening request"],
    "final_status": "ConversationExecutionStatus.FINISHED"
  }
}

This shows the PR behavior works: the intervening user message was accepted immediately while the ACP step was still in flight, then the run continued to process that prompt.

Test 2: Agent-server EventService.send_message(run=True) interrupts and restarts an in-flight ACP prompt

Step 1 — Reproduce / establish baseline without the fix:
Same base command as above.
Relevant output:

{
  "event_service": {
    "send_returned_prompt_inflight": false,
    "send_elapsed_seconds": 1.001,
    "first_prompt_cancelled": true,
    "second_prompt_processed": true
  }
}

The send did not return during the in-flight prompt. The cancellation shown here happened only during probe cleanup after timeout, not as an automatic supersede/restart.

Step 2 — Apply the PR's changes:
Checked out PR head 91d11a86bcee4ecb2f550fecaa3ef8b74df7e0aa.

Step 3 — Re-run with the fix in place:
Same PR command as above.
Relevant output:

{
  "event_service": {
    "send_returned_prompt_inflight": true,
    "send_elapsed_seconds": 0.004,
    "first_prompt_cancelled": true,
    "second_prompt_processed": true,
    "prompts_seen": ["initial request", "intervening request"]
  }
}

This verifies the user-facing server behavior: submitting a new message during a running ACP turn returns quickly, cancels the old prompt, and starts processing the new user message.

Test 3: ACP plain-string token callbacks publish transient streaming deltas

Step 1 — Reproduce / establish baseline without the fix:
Same base command as above.
Relevant output:

{
  "streaming": {
    "token_callback_count": 0,
    "delta_count_after_plain_string_chunk": 0,
    "delta_contents": []
  }
}

This shows base did not wire a token callback for ACP agents without an SDK LLM stream flag, so a plain string ACP chunk could not reach subscribers.

Step 2 — Apply the PR's changes:
Checked out PR head 91d11a86bcee4ecb2f550fecaa3ef8b74df7e0aa.

Step 3 — Re-run with the fix in place:
Same PR command as above.
Relevant output:

{
  "streaming": {
    "token_callback_count": 1,
    "delta_count_after_plain_string_chunk": 1,
    "delta_contents": ["ACP live text"]
  }
}

This confirms an ACP plain-string token callback now publishes exactly one StreamingDeltaEvent containing the expected text.

Unable to Verify

I did not connect to a third-party ACP provider such as Claude Code or Gemini CLI because provider credentials/runtime setup were not available in this QA environment. Instead, I used a controlled in-process ACPAgent subclass to simulate a long-running ACP provider turn while exercising the real LocalConversation and EventService message/run paths.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: [codex] Accept ACP user messages during async turns

After multiple rounds of iterative hardening (shielded drain, FIFO cursor tracking, explicit-interrupt generation counter, completed-cancelled-prompt commit), the implementation is in good shape. Test coverage for the edge cases is thorough — the 16+ new test cases systematically cover FIFO ordering, double-cancel, drain-timeout restart, stop-hook-as-prompt, iteration-cap handoff, and explicit-interrupt-wins scenarios.

Two remaining observations below; both are minor and neither blocks merge.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable direction, but I found a few remaining ACP cancellation/restart races inline. Risk: 🟡 MEDIUM because this changes ACP conversation-loop cancellation behavior; a human maintainer should decide eval coverage before approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26428776165

Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py Outdated
Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 26, 2026

Addressed the follow-up ACP cleanup review in eb7bed3: internal ACP reruns now re-check explicit-interrupt generation inside run() before creating a replacement task, cleanup interruption consumes/finalizes completed prompt futures or forces ACP restart, timeout cleanup has the same cancellation guard, and the redundant import is removed. Kept the drain-timeout executor suggestion as a follow-up tradeoff because it is rare/bounded and would add cross-thread complexity without changing correctness. Local validation: pre-commit on changed files and targeted ACP/EventService suites passed.

This PR comment was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig requested a review from all-hands-bot May 26, 2026 02:55
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACP live-delta streaming and lock-free async turn plumbing look solid after the extensive previous review rounds. All 50 prior threads are resolved. Two items are worth a look before merge; both are low-risk.

This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The ACP async-turn behavior works end-to-end with a real local ACP subprocess: streaming deltas are delivered, a mid-turn user message is accepted immediately, the in-flight ACP session is cancelled, and the replacement prompt completes.

Does this PR achieve its stated goal?

Yes. I exercised the changed ACP path through EventService + ACPAgent using a local JSON-RPC ACP provider process. On current main, ACP streamed text did not produce a StreamingDeltaEvent, and a second user message sent during a long ACP prompt timed out behind the running turn; on this PR, the stream delta appeared, the second send_message(run=True) returned in 0.51s, the provider observed session/cancel, and the conversation finished after processing the intervening prompt.

Phase Result
Environment Setup uv run python created/synced the project env and imported openhands.sdk + openhands.agent_server successfully.
CI Status 🟡 No failures observed; core tests/pre-commit/API checks are green, while several build/publish and automation checks were still pending/in-progress when checked.
Functional Verification ✅ ACP streaming, mid-turn user message acceptance/restart, cancel notification, and condenser usage-id behavior were exercised with real code paths.
Functional Verification

Test 1: ACP streamed assistant text becomes transient StreamingDeltaEvent

Step 1 — Reproduce / establish baseline on current origin/main (2316c4800e1148a55bd76c8ffe3c5e6976c3eac2):
Ran OPENHANDS_SUPPRESS_BANNER=1 timeout 50 uv run python /tmp/qa_acp_behavior.py against a local ACP provider that sends session/update agent_message_chunk text:

Sending ACP prompt (timeout=30s, blocks=1, async)
ACP prompt returned in 2.1s (async)
TimeoutError: condition not met

This confirms the streaming bug: the provider sent an ACP text chunk and the prompt completed, but the EventService subscriber never received a StreamingDeltaEvent within 5s.

Step 2 — Apply the PR's changes:
Checked out codex/acp-live-message-deltas at eb7bed3586132957e219c2583f7499fb27f59f4b.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 timeout 50 uv run python /tmp/qa_acp_behavior.py all:

STREAMING_DELTAS: ['streamed reply to stream please']
STREAMING_PROMPTS: ['stream please']

This shows ACP plain-text chunks now reach subscribers as transient streaming deltas.

Test 2: User message during long ACP turn is accepted and processed

Step 1 — Reproduce / establish baseline on current origin/main:
Ran OPENHANDS_SUPPRESS_BANNER=1 timeout 50 uv run python /tmp/qa_acp_behavior.py live with the fake ACP provider holding the first prompt open:

LIVE_MESSAGE_OUTCOME: send_message_timed_out_after_1s
LIVE_MESSAGE_PROMPTS: ['initial long request']
LIVE_MESSAGE_CANCELS: 0
LIVE_MESSAGE_STATUS: paused

This confirms the stuck behavior: the second user message could not be persisted promptly while the ACP turn was in flight, no ACP cancel notification was sent, and only the initial prompt reached the provider.

Step 2 — Apply the PR's changes:
Checked out codex/acp-live-message-deltas at eb7bed3586132957e219c2583f7499fb27f59f4b.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 timeout 50 uv run python /tmp/qa_acp_behavior.py all:

LIVE_MESSAGE_OUTCOME: send_message_returned_in_0.51s
LIVE_MESSAGE_PROMPTS: ['initial long request', 'intervening correction']
LIVE_MESSAGE_CANCELS: 1
LIVE_MESSAGE_STATUS: finished

This shows the new user message was accepted promptly, the in-flight ACP prompt was cancelled via session/cancel, the replacement prompt was sent to the ACP provider, and the conversation completed.

Test 3: Condenser LLM usage-id remains isolated

Step 1 — Establish current-main baseline:
Ran a short SDK settings script constructing LLMAgentSettings(llm=LLM(..., usage_id='agent-qa'), condenser=CondenserSettings(enabled=True, max_size=20)) and calling create_agent_from_settings(...):

agent_llm_usage_ids [('openai/agent-model', 'agent-qa'), ('openai/agent-model', 'condenser')]
condenser_llm_usage_id condenser
same_llm_object False

Current main already has the copied #3368 behavior.

Step 2 — Re-run on the PR branch:
Ran the same script on eb7bed3586132957e219c2583f7499fb27f59f4b:

agent_llm_usage_ids [('openai/agent-model', 'agent-qa'), ('openai/agent-model', 'condenser')]
condenser_llm_usage_id condenser
same_llm_object False

This confirms the PR preserves the intended separate condenser LLM usage ID and object rather than colliding with the agent LLM.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable direction, but I found a few remaining ACP cancellation/restart races that should be addressed before merge. Risk: 🟡 MEDIUM because this changes agent/conversation loop cancellation behavior; please have a human maintainer decide eval coverage before approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26429593103

Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/event_service.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 26, 2026

Addressed the latest ACP restart-ordering review in 80dc346: generation snapshots are captured before message append and rechecked inside run(), close() clears/bounds ACP reruns, ACP arun rechecks paused/stuck before astep and repairs persisted in-flight cursor gaps on resume, cancelled ACP PromptResponse values are treated as aborted turns, stale bridge callbacks are unwired before terminal tool failures, and the cross-thread Task.done() read was removed. Local validation: pre-commit on changed files and targeted ACP/EventService suites passed.

This PR comment was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig requested a review from all-hands-bot May 26, 2026 03:29
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR solves a real and difficult problem — allowing ACP conversations to accept user messages mid-turn without blocking the sender — and the implementation reflects careful thought about the concurrency model. After multiple iterative rounds, the critical races (drain ordering, double-cancel via wait_for, deferred restart for wedged sessions, FIFO cursor tracking, suffix-install clearing on forced restart) have all been addressed. The test coverage is substantive, with meaningful regression tests for each path.

Two minor items from prior review rounds remain unresolved (inline comments below), plus one fresh semantic observation. Overall the shape of the solution is sound and the PR is in good shape for merge.

This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

Comment thread openhands-agent-server/openhands/agent_server/event_service.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The ACP live-message path now accepts a second user message during an in-flight async ACP turn, cancels/restarts the active ACP prompt, and streams ACP text chunks as transient deltas.

Does this PR achieve its stated goal?

Yes. I exercised the real EventServiceLocalConversation.arun()ACPAgent.astep() path with a stdio ACP subprocess. On current main, the second send_message(run=True) stayed blocked behind the first slow ACP prompt, no ACP cancel was sent, and no StreamingDeltaEvents were published. On the PR branch, the second message was persisted within 200ms, session/cancel was sent to the fake ACP server, the replacement prompt was processed, the conversation finished, and ACP text chunks arrived as StreamingDeltaEvents.

Phase Result
Environment Setup uv sync --dev completed; branch rebuilt through uv run after checkout.
CI Status ⏳ PR is open/blocked with 26 successes, 18 skipped checks, and 5 in-progress checks at inspection time (including this QA workflow and Docker manifest jobs).
Functional Verification ✅ Real SDK/server-facing ACP execution path verified before/after; no tests or linters were run.
Functional Verification

Test 1: ACP user message while an async turn is in-flight + ACP streaming deltas

Step 1 — Establish baseline on origin/main (2316c480) without this PR:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_live_message.py using a temporary ACP subprocess that streams a text chunk, keeps the first prompt slow, and records received prompts/cancels:

{
  "cancel_sent": false,
  "fake_acp_log": "...PROMPT 1: first slow request
STREAMED 1: first slow request
PROMPT_TIMEOUT_FINISH 1: first slow request
PROMPT 2: second fast request
STREAMED 2: second fast request...",
  "final_status": "ConversationExecutionStatus.FINISHED",
  "second_send_completed_within_6s": false,
  "second_send_elapsed_sec": 6.202,
  "stream_deltas": [],
  "user_messages_after_200ms": ["first slow request"]
}

This confirms the baseline problem: the second message was not persisted within 200ms, the ACP server did not receive session/cancel, the second send_message(run=True) was still not complete after 6s, and streamed ACP text did not reach subscribers as StreamingDeltaEvents.

Step 2 — Apply the PR's changes:
Checked out codex/acp-live-message-deltas at 80dc346e32f0fb9a2676a5d105dcf2e944314fa2; uv run rebuilt the editable workspace packages.

Step 3 — Re-run with the fix in place:
Ran the same command, same fake ACP subprocess, and same user flow:

{
  "cancel_sent": true,
  "fake_acp_log": "...PROMPT 1: first slow request
STREAMED 1: first slow request
CANCEL qa-2624ba33
PROMPT_CANCELLED 1: first slow request
...PROMPT 1: second fast request
STREAMED 1: second fast request
PROMPT_FAST_FINISH 1: second fast request...",
  "final_status": "ConversationExecutionStatus.FINISHED",
  "second_send_completed_within_6s": true,
  "second_send_elapsed_sec": 2.294,
  "stream_deltas": [
    {"content": "stream:first slow request", "reasoning_content": null},
    {"content": "stream:second fast request", "reasoning_content": null}
  ],
  "user_messages_after_200ms": ["first slow request", "second fast request"]
}

This shows the PR fixes the user-visible behavior: the new message is persisted promptly while the ACP turn is still active, the active ACP prompt is cancelled/restarted, the replacement message is processed, and ACP text chunks stream through transient delta events.

Test 2: Condenser usage-id settings path

The PR description mentions carrying the condenser LLM usage-id fix from #3368. The current merge-base/current main already contains that fix, so there was no broken baseline to reproduce here. I still exercised the settings entry point on the PR branch with OpenHandsAgentSettings(..., condenser=CondenserSettings(enabled=True, max_size=40)) and observed distinct usage IDs:

{'agent_usage_id': 'default', 'condenser_usage_id': 'condenser', 'same_llm_object': False}

This confirms the PR branch preserves the expected non-colliding condenser/agent LLM attribution behavior.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 26, 2026

Addressed the latest review nits in 24ee53f: clarified the teardown-only RuntimeError suppression, documented the ACP prompt scan tradeoff, and made cancelled prompt futures report drained=False so restart protection remains enabled. Local validation: pre-commit on changed files and targeted ACP/EventService suites passed.

This PR comment was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig requested a review from all-hands-bot May 26, 2026 03:55
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

ACP live-message and streaming behavior worked in functional before/after probes; no QA issues found.

Does this PR achieve its stated goal?

Yes. The PR set out to let ACP conversations accept user messages during async turns, stream ACP assistant text as transient deltas, and send session/cancel on interrupted ACP prompts. I reproduced the old behavior on origin/main where send_message(run=True) stayed blocked behind an in-flight ACP turn and ACP string chunks produced no streaming deltas, then verified the PR branch processes the intervening message immediately, emits a StreamingDeltaEvent, and calls ACP cancel during interruption.

Phase Result
Environment Setup uv run created/used the project environment and affected packages imported successfully.
CI Status ⏳ Latest check snapshot: 20 successful, 0 failing, 8 pending, 4 skipped.
Functional Verification ✅ Before/after SDK and agent-server probes confirm the claimed ACP behavior.
Functional Verification

Test 1: ACP plain string token chunks stream as transient deltas

Step 1 — Reproduce baseline without the fix:
Ran git switch --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_streaming.py:

{"streaming_delta_content": null, "streaming_delta_count": 0, "streaming_delta_reasoning": null, "token_callback_count": 0}

This confirms the old behavior: an ACP agent did not get a token callback wired, so a provider-style plain string chunk could not publish a StreamingDeltaEvent.

Step 2 — Apply the PR changes:
Checked out 24ee53f6a4dbd0404141d3925e408db0faf7a86e.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_streaming.py:

{"streaming_delta_content": "ACP live text", "streaming_delta_count": 1, "streaming_delta_reasoning": null, "token_callback_count": 1}

This shows the PR wires ACP token streaming and forwards a plain string chunk as one transient streaming delta.

Test 2: New user message during an in-flight ACP turn is accepted and processed

Step 1 — Reproduce baseline without the fix:
Ran git switch --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_live_message.py:

{"final_status": "ConversationExecutionStatus.PAUSED", "first_step_cancelled": true, "prompts_seen": ["initial request"], "second_prompt_processed": false, "send_completed_within_750ms": false, "send_elapsed_seconds": 0.75, "send_error": "timeout waiting for send_message(run=True)"}

This reproduces the reported user-facing problem: while the ACP async turn is in-flight, the follow-up user message cannot complete promptly and never reaches a replacement ACP prompt.

Step 2 — Apply the PR changes:
Checked out 24ee53f6a4dbd0404141d3925e408db0faf7a86e.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_live_message.py:

{"final_status": "ConversationExecutionStatus.FINISHED", "first_step_cancelled": true, "prompts_seen": ["initial request", "intervening"], "second_prompt_processed": true, "send_completed_within_750ms": true, "send_elapsed_seconds": 0.004, "send_error": null}

This confirms the changed behavior: the first ACP turn is interrupted, the new message persists immediately, and the replacement run processes intervening instead of leaving it stuck in sending.

Test 3: Interrupted ACP turn sends session/cancel

Step 1 — Reproduce baseline without the fix:
Ran git switch --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_cancel.py:

{"cancel_called": false, "cancellation_propagated": true}

This shows the old ACP cancellation path did not call the provider's cancel method.

Step 2 — Apply the PR changes:
Checked out 24ee53f6a4dbd0404141d3925e408db0faf7a86e.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_cancel.py:

{"cancel_called": true, "cancellation_propagated": false}

This confirms the PR sends ACP session/cancel when an in-flight async ACP prompt is interrupted.

Additional settings check

On the PR branch, I created an agent from LLMAgentSettings(llm=LLM(..., usage_id="agent_llm")) and observed distinct LLM usage IDs:

[("gpt-4o-mini", "agent_llm"), ("gpt-4o-mini", "condenser")]

That confirms the condenser metrics identity is not colliding with the agent LLM in this flow.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant, well-structured change. The core design decisions are sound: releasing the state lock around long ACP awaits, using _on_event_with_state_lock to serialize individual event emissions, shielding the portal future so cancel/drain can run after wait_for fires, the generation-based interrupt guard in EventService, and the FIFO message-cursor tracking via ACP_LAST_PROMPT_USER_MESSAGE_ID. Test coverage for the happy paths and several edge cases is solid.

Two items are called out below: one is a blocking-event-loop risk in the drain-timeout recovery path, and one is a minor code-duplication nit.

This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Comment thread openhands-sdk/openhands/sdk/agent/acp_agent.py
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable direction, but I found two remaining Stop/Pause races in the ACP restart path. Risk: 🟡 MEDIUM because this changes agent/conversation loop behavior; a human maintainer should decide eval coverage before merge.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26431325528

Comment thread openhands-agent-server/openhands/agent_server/event_service.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Member Author

neubig commented May 26, 2026

Addressed the latest Stop/Pause restart races in 8b59878: send_message(run=True) now suppresses automatic restarts if an explicit interrupt lands during the append and passes the expected generation to every auto-run start; ACP arun now preserves PauseEvent requests made during the unlocked ACP step await after committing the completed prompt cursor. I replied to the two non-blocking suggestions and left them as-is to avoid adding complexity outside the cancellation-ordering fix. Local validation: pre-commit on changed files and targeted EventService/conversation tests passed.

This PR comment was generated by an AI agent (OpenHands) on behalf of the user.

@neubig neubig requested a review from all-hands-bot May 26, 2026 04:31
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the PR’s primary ACP async-turn behavior end-to-end with a real EventService instance and a subprocess ACP provider: the new message is persisted promptly, the in-flight ACP turn is cancelled/restarted, ACP string chunks stream as deltas, and the replacement prompt finishes.

Does this PR achieve its stated goal?

Yes. The PR set out to make ACP conversations accept user messages during long async turns, stream ACP assistant text through transient deltas, and cancel/restart an in-flight ACP prompt so the new message is processed instead of staying stuck in sending. On main, the same user flow blocked for >4s, emitted no streaming deltas, did not send session/cancel, and never sent the second message to the ACP provider; on this PR, the second user message was visible in conversation state after 0.2s, send_message(run=True) completed in 2.274s, the fake ACP provider received cancel, the second prompt was sent, streaming deltas arrived for both prompts, and the conversation reached FINISHED.

Phase Result
Environment Setup uv run rebuilt/imported local packages successfully; no tests/linters were run.
CI Status 🟡 Snapshot: 20 successful, 8 pending, 4 skipped, 0 failing checks.
Functional Verification ✅ Before/after harness reproduced the stuck baseline and verified ACP live-message + delta behavior on the PR.
Functional Verification

Test 1: ACP user message during an in-flight async turn

I used temporary QA harnesses outside the repository (/tmp/qa_fake_acp_agent.py and /tmp/qa_acp_event_service.py). The fake ACP agent runs as a real subprocess over ACP stdio, emits AgentMessageChunk text, blocks the first prompt until session/cancel, then finishes when it receives second task. The client side uses the real EventService, LocalWorkspace, ACPAgent, subscription path, send_message(run=True), and conversation state.

Step 1 — Establish baseline without this PR:
Ran git checkout --detach origin/main && uv run python /tmp/qa_acp_event_service.py:

{
  "cancel_count": 0,
  "commit": "0c3c3e45",
  "final_status_before_close": "ConversationExecutionStatus.PAUSED",
  "first_delta_seen": false,
  "prompt2_seen": false,
  "prompt_starts": [
    {"count": 1, "text": "first task"}
  ],
  "second_message_visible_after_0_2s": false,
  "second_send_completed_within_4s": false,
  "second_send_elapsed_seconds": 4.205,
  "streaming_deltas": []
}

This confirms the baseline problem: while the first ACP prompt is in flight, the second user message is not persisted quickly, send_message(run=True) remains blocked past 4 seconds, no ACP session/cancel is sent, no replacement prompt reaches the provider, and ACP string chunks do not stream as StreamingDeltaEvents.

Step 2 — Apply the PR's changes:
Checked out codex/acp-live-message-deltas at 8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf and reran the same harness.

Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_acp_event_service.py:

{
  "cancel_count": 1,
  "commit": "8b59878f",
  "final_status_before_close": "ConversationExecutionStatus.FINISHED",
  "first_delta_seen": true,
  "prompt2_seen": true,
  "prompt_starts": [
    {"count": 1, "text": "first task"},
    {"count": 1, "text": "second task"}
  ],
  "second_message_visible_after_0_2s": true,
  "second_send_completed_within_4s": true,
  "second_send_elapsed_seconds": 2.274,
  "streaming_deltas": [
    "delta:1:first task",
    "delta:1:second task"
  ]
}

This verifies the changed behavior works through the real server/conversation path: the second message is persisted promptly while the ACP turn is still running, the in-flight prompt is interrupted via ACP cancel/restart, the replacement prompt is processed, ACP plain-string chunks are published as transient streaming deltas, and the conversation finishes.

Test 2: Settings-based condenser LLM usage ID

Ran a small SDK script through the public settings API on the PR branch:

uv run python -c "from openhands.sdk.settings.model import default_agent_settings, create_agent_from_settings; agent=create_agent_from_settings(default_agent_settings()); [print(i, llm.model, getattr(llm, 'usage_id', None)) for i, llm in enumerate(list(agent.get_all_llms()))]"

Observed:

0 gpt-5.5 default
1 gpt-5.5 condenser

This confirms settings-created agents expose a distinct condenser LLM usage ID instead of colliding with the default agent LLM usage bucket.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

Solid implementation across a genuinely tricky surface area. The three-part design — releasing the state lock around long ACP awaits, streaming StreamingDeltaEvent from plain-string token callbacks, and the FIFO message cursor with supersede markers — holds together well. The iterative hardening through this review cycle has paid off: session/cancel now actually fires before re-wiring callbacks, close() now bumps _explicit_interrupt_generation, and the cancelled-prompt stop_reason == "cancelled" check is in place.

On the latest unresolved bot comments (posted ~03:22 UTC today)

Inspecting the current head (8b59878) against the six open bot threads:

  • close() not bumping generation (line 931 of event_service.py): Fixed — close() now sets _closing = True, increments _explicit_interrupt_generation, and clears both rerun flags before the run-task drain.
  • Drained prompt finalized without stop_reason check (acp_agent.py line 1985): Fixed — _prompt_response_was_cancelled() is now called before _finalize_successful_turn in all drain paths.
  • Callbacks wired while _cancel_inflight_tool_calls emits (acp_agent.py line 1998): Fixed — _cancel_inflight_tool_calls now captures on_event, calls _clear_turn_callbacks() first, then emits synthetic failures through the captured reference.
  • Generation snapshot race (event_service.py send_message): Mitigated — snapshot is taken before the first await; every subsequent async boundary (after _mark_running_acp_prompt_superseded, after interrupt, inside run()) re-checks the generation before acting.
  • Concurrent pause() between lock-release and astep() start (local_conversation.py line 1178): Acceptable — LocalConversation.pause() emits a PauseEvent under the state lock while status is RUNNING, so pause_requested_during_acp_step catches it after astep() completes. The ACP turn runs to natural completion before honoring the pause, which is the correct semantics for a remote subprocess.
  • Cursor committed after astep() persists (local_conversation.py line 1223): Both the FINISHED status write and cursor commit happen in the same with self._state: block atomically from a concurrency perspective; crash-durability is out of scope here.

@neubig — please confirm these are intentionally addressed so those threads can be resolved.


This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

# the latest tail; the list is usually tiny, and correctness
# is more important than caching stale prompt snapshots.

acp_prompt_messages = [
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: acp_prompt_messages is rebuilt by scanning all state.events up to three times per loop iteration (lines ~1057, ~1142, and ~1243). The comment on the previous line notes "the list is usually tiny" but for long conversations (many tool-call events) this is O(n) per scan, O(n²) over many iterations.

Consider maintaining a running slice index alongside last_acp_prompt_user_message_id — or at minimum deduplicate the three scans into a single helper that returns both the slice and the latest ID — so each iteration stays O(1) rather than O(n).

"""
state = conversation.state

if self._restart_session_on_next_turn:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: _restart_session_after_drain_timeout is called unconditionally before the try: block, meaning any exception from init_state (e.g., ACP server unavailable on restart) bypasses the cancellation/timeout exception handlers and propagates directly to arun()'s except Exception clause, transitioning the conversation to ERROR.

That terminal fallback is acceptable, but worth a brief comment here so future readers know the ERROR outcome from a failed drain-timeout restart is intentional and that the handlers below will never see it.

async def send_message(self, message: Message, run: bool = False):
if not self._conversation:
raise ValueError("inactive_service")
explicit_interrupt_generation = self._explicit_interrupt_generation
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The generation snapshot is correctly captured before the first await, but _mark_running_acp_prompt_superseded is itself an async call that can yield. If a user Stop/Pause arrives after _mark_running_acp_prompt_superseded returns but before the if did_mark_acp_prompt_superseded: branch fires, the supersede marker will already be set in state.agent_state. The downstream check at line 451 (self._explicit_interrupt_generation != explicit_interrupt_generation) catches the generation bump and returns — but the marker in agent_state stays set.

arun() reads and pops ACP_SUPERSEDE_INFLIGHT_PROMPT in its CancelledError handler, so the stale marker is cleaned up. Just worth confirming that path is exercised by the tests for double-cancel scenarios (looks like test_astep_double_cancel_during_drain_restarts_next_turn covers the agent side; an event-service-level test for this exact interleaving would add extra confidence).

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the PR’s primary ACP async-turn behavior end-to-end with a real EventService instance and a subprocess ACP provider: the new message is persisted promptly, the in-flight ACP turn is cancelled/restarted, ACP string chunks stream as deltas, and the replacement prompt finishes.

Does this PR achieve its stated goal?

Yes. The PR set out to make ACP conversations accept user messages during long async turns, stream ACP assistant text through transient deltas, and cancel/restart an in-flight ACP prompt so the new message is processed instead of staying stuck in sending. On main, the same user flow blocked for >4s, emitted no streaming deltas, did not send session/cancel, and never sent the second message to the ACP provider; on this PR, the second user message was visible in conversation state after 0.2s, send_message(run=True) completed in 2.274s, the fake ACP provider received cancel, the second prompt was sent, streaming deltas arrived for both prompts, and the conversation reached FINISHED.

Phase Result
Environment Setup uv run rebuilt/imported local packages successfully; no tests/linters were run.
CI Status 🟡 Snapshot: 20 successful, 8 pending, 4 skipped, 0 failing checks.
Functional Verification ✅ Before/after harness reproduced the stuck baseline and verified ACP live-message + delta behavior on the PR.
Functional Verification

Test 1: ACP user message during an in-flight async turn

I used temporary QA harnesses outside the repository (/tmp/qa_fake_acp_agent.py and /tmp/qa_acp_event_service.py). The fake ACP agent runs as a real subprocess over ACP stdio, emits AgentMessageChunk text, blocks the first prompt until session/cancel, then finishes when it receives second task. The client side uses the real EventService, LocalWorkspace, ACPAgent, subscription path, send_message(run=True), and conversation state.

Step 1 — Establish baseline without this PR:
Ran git checkout --detach origin/main && uv run python /tmp/qa_acp_event_service.py:

{
  "cancel_count": 0,
  "commit": "0c3c3e45",
  "final_status_before_close": "ConversationExecutionStatus.PAUSED",
  "first_delta_seen": false,
  "prompt2_seen": false,
  "prompt_starts": [
    {"count": 1, "text": "first task"}
  ],
  "second_message_visible_after_0_2s": false,
  "second_send_completed_within_4s": false,
  "second_send_elapsed_seconds": 4.205,
  "streaming_deltas": []
}

This confirms the baseline problem: while the first ACP prompt is in flight, the second user message is not persisted quickly, send_message(run=True) remains blocked past 4 seconds, no ACP session/cancel is sent, no replacement prompt reaches the provider, and ACP string chunks do not stream as StreamingDeltaEvents.

Step 2 — Apply the PR's changes:
Checked out codex/acp-live-message-deltas at 8b59878f0fc0fb6799ed5003e38ee5ec67b5cddf and reran the same harness.

Step 3 — Re-run with the fix in place:
Ran uv run python /tmp/qa_acp_event_service.py:

{
  "cancel_count": 1,
  "commit": "8b59878f",
  "final_status_before_close": "ConversationExecutionStatus.FINISHED",
  "first_delta_seen": true,
  "prompt2_seen": true,
  "prompt_starts": [
    {"count": 1, "text": "first task"},
    {"count": 1, "text": "second task"}
  ],
  "second_message_visible_after_0_2s": true,
  "second_send_completed_within_4s": true,
  "second_send_elapsed_seconds": 2.274,
  "streaming_deltas": [
    "delta:1:first task",
    "delta:1:second task"
  ]
}

This verifies the changed behavior works through the real server/conversation path: the second message is persisted promptly while the ACP turn is still running, the in-flight prompt is interrupted via ACP cancel/restart, the replacement prompt is processed, ACP plain-string chunks are published as transient streaming deltas, and the conversation finishes.

Test 2: Settings-based condenser LLM usage ID

Ran a small SDK script through the public settings API on the PR branch:

uv run python -c "from openhands.sdk.settings.model import default_agent_settings, create_agent_from_settings; agent=create_agent_from_settings(default_agent_settings()); [print(i, llm.model, getattr(llm, 'usage_id', None)) for i, llm in enumerate(list(agent.get_all_llms()))]"

Observed:

0 gpt-5.5 default
1 gpt-5.5 condenser

This confirms settings-created agents expose a distinct condenser LLM usage ID instead of colliding with the default agent LLM usage bucket.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-this This label triggers a PR review by OpenHands

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants