fix(mobile/probe): widen probe1 timeout 8s → 20s for Maestro v2 cold-start #358
Merged
…start

The malaria-itn-app/20260517-1829 Phase 6 driver-unhealthy failures trace to probe1's 8s shell budget being below v2.3.0's JVM cold-start floor (~10-12s steady-state on a healthy AVD with Java 17). probe1 then shell-times-out on a working driver, triggering Stage 2's destructive uninstall+reinstall, which exposes the pre-existing install-race class as UNAVAILABLE on probe2.

v1.39 and v2.3 are wire-compatible for our usage: same driver APK layout, same gRPC contract, same `--host`/`--port` flags, same step keys. The fix is purely a timeout calibration; no v1/v2 toggle, no ALLOWED_STEP_KEYS churn. doctor warns (not errors) on Maestro <2.0 so cloud-backend operators with a v1-bundled AMI keep working until the AMI is rebaked separately.

See docs/learnings/2026-05-19-maestro-v2-probe-timeout.md for the full trace + ruled-out hypotheses (the brief's "v2 broke the gRPC probe" framing was invalidated by direct empirical testing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
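To illustrate why an 8s budget misclassifies a healthy but slow-starting driver, here is a minimal sketch of a budgeted probe. It is not the code in `mcp/mobile/client.ts`: `ProbeResult`, `withBudget`, `fakeDriver`, and `PROBE1_BUDGET_MS` are hypothetical names chosen for this example, and the simulated driver stands in for a real CLI call.

```typescript
// Hypothetical result shape; the real client may report outcomes differently.
type ProbeResult<T> =
  | { kind: "ok"; value: T; elapsedMs: number }
  | { kind: "timeout"; budgetMs: number }
  | { kind: "error"; error: unknown };

// Assumed constant name; the PR changes the probe1 budget from 8_000 to 20_000.
const PROBE1_BUDGET_MS = 20_000;

// Race the probe against a timer. A "timeout" only means the budget ran out;
// it says nothing about whether the driver itself is healthy.
// Note: this does not cancel the underlying work, it only stops waiting for it.
function withBudget<T>(work: () => Promise<T>, budgetMs: number): Promise<ProbeResult<T>> {
  const started = Date.now();
  const attempt: Promise<ProbeResult<T>> = work().then(
    (value) => ({ kind: "ok" as const, value, elapsedMs: Date.now() - started }),
    (error: unknown) => ({ kind: "error" as const, error }),
  );
  const timeout = new Promise<ProbeResult<T>>((resolve) => {
    const timer = setTimeout(() => resolve({ kind: "timeout", budgetMs }), budgetMs);
    // Clear the timer once the attempt settles so it cannot keep the process alive.
    attempt.then(() => clearTimeout(timer));
  });
  return Promise.race([attempt, timeout]);
}

// Simulated driver that answers after `ms` milliseconds (stand-in for a CLI call).
const fakeDriver = (ms: number) => () =>
  new Promise<string>((resolve) => setTimeout(() => resolve("hierarchy-json"), ms));
```

With a driver that needs roughly 11s to answer, `withBudget(fakeDriver(11_000), 8_000)` resolves to `kind: "timeout"` even though the driver is working, while `withBudget(fakeDriver(11_000), PROBE1_BUDGET_MS)` resolves to `kind: "ok"`. Keeping "timeout" distinct from "error" is a design choice of this sketch; it makes it harder to treat a slow cold start as grounds for a destructive recovery step.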
Summary
The malaria-itn-app/20260517-1829 Phase 6 driver-unhealthy failures trace to probe1's 8s shell budget being below Maestro v2.3.0's JVM cold-start floor (~10-12s steady-state on a healthy AVD with Java 17). probe1 shell-times-out on a working driver and triggers Stage 2's destructive uninstall+reinstall, which then exposes the pre-existing install-race class as `UNAVAILABLE: io exception` on probe2.

- `mcp/mobile/client.ts`: probe1 budget `8_000` → `20_000` (the actual fix)
- `mcp/mobile/backends/maestro.ts`: comment refresh confirming v1.39 + v2.3 share driver APK layout
- `bin/ace-doctor`: WARN (not BLOCKER) when local Maestro starts with `1.`, so operators on v1.x are nudged to install v2 while cloud-backend operators with a v1-bundled AMI keep working
- `test/mcp/mobile/client.test.ts`: update probe-call budget expectations to `20_000`
- `docs/learnings/2026-05-19-maestro-v2-probe-timeout.md`: full trace + ruled-out hypotheses

What the brief asked vs. what evidence supported
The brief framed this as "Maestro v2 broke the gRPC driver probe; drop v1 support entirely." Direct empirical testing on Maestro v2.3.0 against a running AVD ruled out every specific v2-protocol claim:
- `unzip -l` shows `maestro-app.apk` + `maestro-server.apk` at root, same as v1.39
- Claim that the `--host`/`--port` flags are broken: `maestro --host=localhost --port=5557 hierarchy` returns exit 0 with full hierarchy JSON
- `io.grpc.StatusRuntimeException: UNAVAILABLE` is the same class as the pre-existing install-race (PRs #339/341/342)
- `ALLOWED_STEP_KEYS` audited against v2.0–v2.3 CHANGELOG; no removals affect our palette

The actual fix is a one-line timeout calibration. The `bin/ace-doctor` Maestro-version advisory and the comment refresh capture the v2 standardization intent without forcing a flag-day.

Cloud AMI
The cloud AMI's bundled Maestro CLI (~1.36) keeps working (same wire-protocol, same recipe semantics) until rebaked. AMI rebake is tracked separately; the packer config wasn't found in `jjackson/ace` or `jjackson/ace-web`, and may live under the `acedimagi` Mac login or on AWS. The doctor warn is WARN-level (not BLOCKER) to keep cloud-backend operators with a v1-bundled AMI unblocked.

Test plan
- `npm test`: 1238 pass, 1 unrelated XML-patch test flaked at 5s timeout (no connection to mobile)
- `npx vitest run test/mcp/mobile/client.test.ts`: 51 pass, 8 skipped
- `bin/ace-doctor`: `[Mobile]` block shows Maestro 2.3.0 cleanly (no WARN since we're on v2)
- `maestro --host=localhost --port=5557 hierarchy` against running AVD returns exit 0 with full hierarchy JSON in ~10-12s
- `app-screenshot-capture` will exercise the widened probe1 against a real cold-boot

🤖 Generated with Claude Code
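
For reference, the WARN-not-BLOCKER rule described for `bin/ace-doctor` (warn when the local Maestro version starts with `1.`) could be sketched as below. This is an illustration only: the real `bin/ace-doctor` implementation and its language are not shown here, and `Advisory` and `maestroVersionAdvisory` are hypothetical names.

```typescript
// Hypothetical advisory shape; there is deliberately no "BLOCKER" level in this check.
type Advisory = { level: "OK" | "WARN"; message: string };

// Classify a Maestro version string using the prefix rule from the PR description:
// anything starting with "1." produces a warning, everything else passes.
function maestroVersionAdvisory(version: string): Advisory {
  const v = version.trim();
  if (v.startsWith("1.")) {
    return { level: "WARN", message: `Maestro ${v} detected; installing v2 is recommended` };
  }
  return { level: "OK", message: `Maestro ${v}` };
}
```

Under this sketch, `maestroVersionAdvisory("2.3.0")` yields level `OK` and `maestroVersionAdvisory("1.39.0")` yields level `WARN`. Matching on the literal prefix `1.` rather than the first character means a future `10.x` release would not be flagged as v1.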