
fix(mobile/probe): widen probe1 timeout 8s → 20s for Maestro v2 cold-start #358

Merged
jjackson merged 1 commit into main from emdash/avd-yt6cu on May 19, 2026


@jjackson
Owner

Summary

The malaria-itn-app/20260517-1829 Phase 6 driver-unhealthy failures trace to probe1's 8s shell budget being below Maestro v2.3.0's JVM cold-start floor (~10-12s steady-state on a healthy AVD with Java 17). As a result, probe1 times out at the shell level against a working driver and triggers Stage 2's destructive uninstall+reinstall, which in turn exposes the pre-existing install-race class as UNAVAILABLE: io exception on probe2.

  • mcp/mobile/client.ts — probe1 budget 8_000 → 20_000 (the actual fix)
  • mcp/mobile/backends/maestro.ts — comment refresh confirming v1.39 + v2.3 share driver APK layout
  • bin/ace-doctor — WARN (not BLOCKER) when the local Maestro version starts with `1.`, so operators on v1.x are nudged to install v2 while cloud-backend operators with a v1-bundled AMI keep working
  • test/mcp/mobile/client.test.ts — update probe-call budget expectations to 20_000
  • docs/learnings/2026-05-19-maestro-v2-probe-timeout.md — full trace + ruled-out hypotheses
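
To make the failure chain above concrete, here is a minimal simulation of a two-stage health check. It is a sketch under stated assumptions, not the code in mcp/mobile/client.ts: every name (`checkDriverHealth`, `makeColdStartProbe`, `racyProbe2`) is hypothetical, the 11_000 ms cold start is an illustrative value inside the ~10-12s range quoted above, and the simulated probe2 always hits the install race.

```typescript
// Hypothetical model of the staged probe flow; none of these names come from the repo.
type ProbeResult = "ok" | "timeout" | "unavailable";

// A probe receives its shell budget in milliseconds and reports the outcome.
type Probe = (budgetMs: number) => ProbeResult;

interface HealthReport {
  healthy: boolean;
  reinstalled: boolean;
  trace: string[];
}

function checkDriverHealth(
  probe1: Probe,
  reinstall: () => void,
  probe2: Probe,
  budgetMs: number,
): HealthReport {
  const trace: string[] = [];

  // Stage 1: a non-destructive probe against the existing driver.
  const first = probe1(budgetMs);
  trace.push(`probe1:${first}`);
  if (first === "ok") {
    return { healthy: true, reinstalled: false, trace };
  }

  // Stage 2: destructive uninstall+reinstall, then probe again.
  reinstall();
  trace.push("stage2:reinstall");
  const second = probe2(budgetMs);
  trace.push(`probe2:${second}`);
  return { healthy: second === "ok", reinstalled: true, trace };
}

// Simulated healthy driver whose CLI needs coldStartMs before it can answer.
function makeColdStartProbe(coldStartMs: number): Probe {
  return (budgetMs) => (coldStartMs <= budgetMs ? "ok" : "timeout");
}

// Assumption for illustration: right after a reinstall, probe2 hits the install race.
const racyProbe2: Probe = () => "unavailable";

const coldStart = makeColdStartProbe(11_000);
const before = checkDriverHealth(coldStart, () => {}, racyProbe2, 8_000);
const after = checkDriverHealth(coldStart, () => {}, racyProbe2, 20_000);

console.log(before.trace.join(" -> ")); // probe1:timeout -> stage2:reinstall -> probe2:unavailable
console.log(after.trace.join(" -> ")); // probe1:ok
```

With the 8_000 budget the healthy driver is misclassified and the destructive stage runs; with 20_000 the check stops after probe1, which is why a timeout calibration alone removes the failure.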

What the brief asked vs. what evidence supported

The brief framed this as "Maestro v2 broke the gRPC driver probe; drop v1 support entirely." Direct empirical testing on Maestro v2.3.0 against a running AVD ruled out every specific v2-protocol claim:

| Brief claim | Evidence |
| --- | --- |
| Driver APK layout changed in v2 | `unzip -l` shows `maestro-app.apk` + `maestro-server.apk` at root, same as v1.39 |
| `--host`/`--port` flags broken | `maestro --host=localhost --port=5557 hierarchy` returns exit 0 with full hierarchy JSON |
| gRPC contract changed | Same `io.grpc.StatusRuntimeException: UNAVAILABLE` class as the pre-existing install-race (PRs #339/341/342) |
| Recipe syntax broken | `ALLOWED_STEP_KEYS` audited against v2.0–v2.3 CHANGELOG; no removals affect our palette |

The actual fix is a one-line timeout calibration. The bin/ace-doctor Maestro-version advisory and the comment refresh capture the v2 standardization intent without forcing a flag-day.

Cloud AMI

The cloud AMI's bundled Maestro CLI (~1.36) keeps working — same wire-protocol, same recipe semantics — until rebaked. AMI rebake is tracked separately; the packer config wasn't found in jjackson/ace or jjackson/ace-web, and may live under the acedimagi Mac login or on AWS. The doctor warn is WARN-level (not BLOCKER) to keep cloud-backend operators with v1-bundled AMI unblocked.
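
The advisory behaviour described above could look roughly like the following. This is a sketch only: the real check lives in bin/ace-doctor and its implementation is not shown in this PR, so `adviseOnMaestroVersion`, `DoctorFinding`, and the message wording are all hypothetical. The only behaviour taken from the description is that a version string starting with `1.` yields WARN rather than BLOCKER.

```typescript
// Hypothetical shape of a doctor finding; not the actual ace-doctor types.
type Severity = "OK" | "WARN" | "BLOCKER";

interface DoctorFinding {
  severity: Severity;
  message: string;
}

// Classify a Maestro version string. A v1.x install is only advisory (WARN),
// so operators relying on a v1-bundled AMI are never hard-blocked.
function adviseOnMaestroVersion(version: string): DoctorFinding {
  const trimmed = version.trim();
  if (trimmed.startsWith("1.")) {
    return {
      severity: "WARN",
      message: `Maestro ${trimmed} detected; installing v2 is recommended`,
    };
  }
  return { severity: "OK", message: `Maestro ${trimmed}` };
}

console.log(adviseOnMaestroVersion("1.39.0").severity); // WARN
console.log(adviseOnMaestroVersion("2.3.0").severity); // OK
```

Matching on the `1.` prefix rather than parsing a full semantic version keeps the check small; a stricter parser would be a reasonable alternative if version strings with other shapes ever appear.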

Test plan

  • npm test — 1238 pass, 1 unrelated XML-patch test flaked at 5s timeout (no connection to mobile)
  • npx vitest run test/mcp/mobile/client.test.ts — 51 pass, 8 skipped
  • bin/ace-doctor: [Mobile] block shows Maestro 2.3.0 cleanly (no WARN since we're on v2)
  • Live reproduction confirmed: maestro --host=localhost --port=5557 hierarchy against running AVD returns exit 0 with full hierarchy JSON in ~10-12s
  • Next dispatch of app-screenshot-capture will exercise the widened probe1 against a real cold-boot
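
As a quick arithmetic check on the budgets, the sketch below compares observed probe durations against a candidate budget. It is illustrative only: `checkBudget` is a hypothetical helper, and the two sample values are simply the endpoints of the ~10-12s range reported above, not logged measurements.

```typescript
// Hypothetical helper: does a set of observed durations fit under a budget?
interface BudgetCheck {
  budgetMs: number;
  slowestMs: number;
  headroomMs: number; // negative when the slowest sample exceeds the budget
  fits: boolean;
}

function checkBudget(samplesMs: number[], budgetMs: number): BudgetCheck {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  const slowestMs = Math.max(...samplesMs);
  const headroomMs = budgetMs - slowestMs;
  return { budgetMs, slowestMs, headroomMs, fits: headroomMs > 0 };
}

// Endpoints of the ~10-12s range quoted in the test plan (illustrative values).
const observedMs = [10_000, 12_000];

const oldBudget = checkBudget(observedMs, 8_000);
const newBudget = checkBudget(observedMs, 20_000);

console.log(oldBudget.fits, oldBudget.headroomMs); // false -4000
console.log(newBudget.fits, newBudget.headroomMs); // true 8000
```

Under these assumed samples the old budget falls 4s short of the slowest run, while the new one leaves 8s of headroom.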

🤖 Generated with Claude Code

…start

The malaria-itn-app/20260517-1829 Phase 6 driver-unhealthy failures
trace to probe1's 8s shell budget being below v2.3.0's JVM cold-start
floor (~10-12s steady-state on a healthy AVD with Java 17). probe1
then shell-times-out on a working driver, triggering Stage 2's
destructive uninstall+reinstall, which exposes the pre-existing
install-race class as UNAVAILABLE on probe2.

v1.39 and v2.3 are wire-compatible for our usage — same driver APK
layout, same gRPC contract, same `--host`/`--port` flags, same step
keys. The fix is purely a timeout calibration; no v1/v2 toggle, no
ALLOWED_STEP_KEYS churn. doctor warns (not errors) on Maestro <2.0
so cloud-backend operators with v1-bundled AMI keep working until
the AMI is rebaked separately.

See docs/learnings/2026-05-19-maestro-v2-probe-timeout.md for the
full trace + ruled-out hypotheses (the brief's "v2 broke the gRPC
probe" framing was invalidated by direct empirical testing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jjackson jjackson merged commit e9e2a00 into main May 19, 2026
2 checks passed