Worker rollout follow-ups: WM 2.11, .tmp resume, service uses worker#23

Closed
ivan-digital wants to merge 3 commits into feat/model-download-worker from feat/worker-followups

@ivan-digital ivan-digital commented May 10, 2026

Stacked on #22. Three independent improvements to the worker rollout:

1. d1ae47d Bump androidx.work 2.9.1 → 2.11.2

Picks up bug fixes since 2.9.1. The SDK manifest still has the
SystemForegroundService foregroundServiceType override because
2.11's bundled manifest still doesn't declare one — verified by
inspecting the AAR.

2. 7643c4a Preserve .tmp files across worker retries

Two existing deletions in ModelManager were destroying partial
download state we want to keep:

| Site | Why it ran | Why it's wrong |
| --- | --- | --- |
| Top of ensureModels(): dir.walk().filter{ext==tmp}.forEach{delete} | "Clean up after a previous crash" | Also nukes the in-progress .tmp from the previous worker invocation that returned Result.retry(), so every WorkManager retry restarted that file from byte 0. |
| End of downloadFile() retry loop | "Cleanup before throw" | A transient network failure outliving 5 in-loop retries should still leave resumable bytes for the worker's next try, not throw them away. |

Both are now removed. Stale .tmp files from an old MODEL_VERSION are
still wiped by the version-mismatch path. The test "cleans up tmp file
after all retries fail" is inverted to assert preservation, using
DISCONNECT_DURING_RESPONSE_BODY to exercise a realistic mid-stream
failure.

Concrete impact: during PR #22's emulator run we observed a worker
retry drop ~2.5 MB of partial parakeet-encoder bytes between
restarts. That doesn't happen anymore.
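
To make the resume behaviour concrete, here is a minimal sketch of how a surviving .tmp file can drive an HTTP Range request. The helper names resumeRangeHeader and appendChunk are hypothetical and are not taken from ModelManager; the sketch only assumes that partial bytes are appended to the .tmp file and never truncated.

```kotlin
import java.io.File

// Hypothetical helper: the Range header value to send when resuming into `tmp`,
// or null when there is nothing on disk and the download must start at byte 0.
fun resumeRangeHeader(tmp: File): String? {
    val have = if (tmp.isFile) tmp.length() else 0L
    return if (have > 0L) "bytes=$have-" else null
}

// Hypothetical helper: append a received chunk without truncating
// the bytes that an earlier attempt already wrote.
fun appendChunk(tmp: File, chunk: ByteArray) {
    tmp.appendBytes(chunk)
}
```

If the .tmp file is deleted between worker invocations, resumeRangeHeader returns null and the whole file is fetched again, which is the failure mode both removed deletions were causing.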

3. 99838e6 SpeechRecognitionService uses the worker

The service used to call ModelManager.ensureModels() inline from
resolveModelDir(). On a fresh install, the first Gboard mic tap was
synchronously waiting on a 1.2 GB download tied to the binder's
lifecycle.

New flow:

  1. Fast path: ModelManager.areModelsReady(ctx, INT8) (new public API; duplicates the per-file validity check from ensureModels() without the download side-effect). If true, return ModelManager.modelDir(ctx) immediately.
  2. Slow path: enqueue ModelDownloadWorker and await its terminal state via getWorkInfoByIdFlow().filterNotNull().first { it.state.isFinished }. The worker runs as a foreground service so the download persists past the binder timeout and the user putting the phone in their pocket. The next mic tap takes the fast path.

The protected resolveModelDir() seam stays in place; existing
SpeechRecognitionServiceTest still overrides it.
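
The fast-path/slow-path split can be sketched in plain Kotlin as below. This is an illustration under assumptions, not the code in this PR: ExpectedFile, areModelsReadySketch and resolveModelDirSketch are hypothetical names, the minimum-size rule stands in for whatever per-file validation ModelManager really performs, and the download lambda stands in for enqueueing ModelDownloadWorker and awaiting it.

```kotlin
import java.io.File

// Hypothetical description of one required model file; the real validity
// rules in ModelManager are not shown in this PR description.
data class ExpectedFile(val name: String, val minBytes: Long)

// Side-effect-free check: reads file metadata only, never touches the network.
fun areModelsReadySketch(dir: File, expected: List<ExpectedFile>): Boolean =
    expected.all { e ->
        val f = File(dir, e.name)
        f.isFile && f.length() >= e.minBytes
    }

// Fast path returns the directory at once; the slow path defers to a
// caller-supplied download step and re-checks before returning.
fun resolveModelDirSketch(dir: File, expected: List<ExpectedFile>, download: () -> Unit): File {
    if (areModelsReadySketch(dir, expected)) return dir
    download()
    check(areModelsReadySketch(dir, expected)) { "models still missing after download" }
    return dir
}
```

Keeping the readiness check free of side effects is what lets the service call it on every bind without risking a second download.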

Test plan

  • ./gradlew :sdk:testDebugUnitTest — 26/26 pass
  • ./gradlew :sdk:assembleDebug — green
  • ./gradlew :app:assembleDebug — green
  • Manual: install on emulator, set our service as default voice input (settings put secure voice_recognition_service ...), tap Gboard mic on a fresh install — observe download notification appear, dismiss Gboard, wait, re-tap — should now transcribe without re-downloading.
  • Manual: kick off a download in MainActivity, force-quit the app process (adb shell am force-stop ...) mid-download, relaunch — should resume from the last .tmp byte (verify via adb shell run-as ... ls -la files/models), not byte 0.

Notes

Ivan added 3 commits May 10, 2026 12:23
Picks up bug fixes since 2.9.1. The SystemForegroundService manifest
override stays in the SDK manifest because 2.11 still doesn't declare
foregroundServiceType in its bundled manifest.
… retries

Two existing deletions were destroying the partial-download state we want
to keep:

1. ensureModels() opens by walking the models dir and deleting every
   .tmp. The intent was to clean up after process crashes, but it also
   nukes the in-progress .tmp from a previous ModelDownloadWorker
   invocation that returned Result.retry() — meaning every WorkManager
   retry restarted that file from byte 0. Range resume can't help if
   there's nothing on disk to resume from. Replace with a comment
   explaining why we keep them; stale .tmp from an old MODEL_VERSION is
   still wiped by the version-mismatch path above.

2. downloadFile() deleted the .tmp file when its 5-attempt retry loop
   was exhausted before throwing. Same problem: a transient network
   failure that outlasts those 5 in-loop retries should still leave
   resumable bytes for the worker's next try. Drop the deletion.

Net effect: a 1.2 GB download that hits a flaky network now keeps every
byte that made it to disk — observed during emulator verification of
PR #22 where a worker retry dropped ~2.5 MB of partial parakeet-encoder
bytes for no good reason.

Test: invert 'cleans up tmp file after all retries fail' to assert the
.tmp persists after exhaustion, using DISCONNECT_DURING_RESPONSE_BODY to
get a realistic mid-stream failure path.
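
The exhaustion behaviour described in point 2 can be illustrated with a small retry loop. This is a hedged sketch rather than the actual downloadFile(): downloadWithRetries and its attempt parameter are hypothetical, with attempt standing in for one HTTP request that appends received bytes to the .tmp file and throws IOException on a network failure.

```kotlin
import java.io.File
import java.io.IOException

// Hypothetical retry loop. `attempt` models one request that appends bytes
// to `tmp` and throws IOException when the connection drops.
fun downloadWithRetries(tmp: File, maxAttempts: Int = 5, attempt: (File) -> Unit) {
    var last: IOException? = null
    repeat(maxAttempts) {
        try {
            attempt(tmp)
            return      // success: a real caller would now move tmp to its final name
        } catch (e: IOException) {
            last = e    // leave tmp untouched so the next attempt can resume
        }
    }
    // Retries exhausted: throw WITHOUT deleting tmp, so a later
    // worker invocation still finds the partial bytes on disk.
    throw IOException("download failed after $maxAttempts attempts", last)
}
```

A test in the spirit of the inverted one would drive this loop with an attempt that always fails mid-stream and then assert that the .tmp file still exists with the bytes it accumulated.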
…rker

When Gboard binds to our service for the first time on a fresh install,
resolveModelDir() used to call ensureModels() inline — blocking the
recognition request on a 1.2 GB download tied to the bind's lifecycle.
If Gboard times out (which it will, well before the download finishes),
the coroutine cancels and the bytes that made it to disk are at the
mercy of the user re-tapping the mic.

Now:

1. Fast path — ModelManager.areModelsReady(ctx, INT8) checks file
   validity without touching the network. If true, return modelDir
   immediately and the service starts in milliseconds.
2. Slow path — enqueue ModelDownloadWorker (idempotent via
   ExistingWorkPolicy.KEEP) and await its terminal state via
   getWorkInfoByIdFlow().filterNotNull().first { it.state.isFinished }.
   The worker runs as a foreground service so the download keeps
   progressing even when Gboard's bind times out and the user puts
   the phone in their pocket. The next mic tap takes the fast path.

ModelManager: add public areModelsReady() + modelDir(). The validity
check duplicates the per-file validation that ensureModels() already
does, but exposed without the side effect of starting a download.

The protected resolveModelDir() seam stays in place; tests still
override it to bypass the worker entirely.
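
The terminal-state wait in the slow path can be modelled without Android as follows. This is a stand-in only: the State enum mirrors the finished/unfinished split of WorkManager's WorkInfo.State, a Sequence replaces the Flow returned by getWorkInfoByIdFlow(), and awaitTerminal is a hypothetical name.

```kotlin
// Stand-in for WorkInfo.State so the sketch runs without Android.
// SUCCEEDED, FAILED and CANCELLED are the finished states.
enum class State(val isFinished: Boolean) {
    ENQUEUED(false), RUNNING(false), BLOCKED(false),
    SUCCEEDED(true), FAILED(true), CANCELLED(true)
}

// A Sequence stands in for the Flow of work info; null models "no info yet".
// Mirrors the filterNotNull().first { it.state.isFinished } chain from the commit.
fun awaitTerminal(updates: Sequence<State?>): State =
    updates.filterNotNull().first { it.isFinished }
```

Because a worker that returns Result.retry() goes back to ENQUEUED, which is not a finished state, a wait of this shape spans worker retries and only completes once the work succeeds, fails, or is cancelled.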
@ivan-digital ivan-digital force-pushed the feat/model-download-worker branch from 0005370 to 9b44fd4 Compare May 10, 2026 10:23
@ivan-digital ivan-digital force-pushed the feat/worker-followups branch from ee1ac47 to 680fdf3 Compare May 10, 2026 10:23
@ivan-digital ivan-digital deleted the branch feat/model-download-worker May 10, 2026 16:36
@ivan-digital ivan-digital deleted the feat/worker-followups branch May 10, 2026 16:37