Move model download to a foreground WorkManager worker #22

Merged: ivan-digital merged 3 commits into main from feat/model-download-worker on May 10, 2026

ivan-digital commented May 9, 2026

Stacked on #21. Fixes the issue where backgrounding the demo app
mid-download visibly stalls the model fetch.

The issue

ModelManager.ensureModels(...) was called inside the activity's
lifecycleScope (MainActivity.kt:211, DictationActivity.kt:170).
When the user backgrounded the app:

  1. Activity hits onStop; eventually destroyed if Android needs RAM
  2. lifecycleScope cancels the coroutine
  3. OkHttp stream closes mid-byte
  4. Partial .tmp files persist, so on next launch ensureModels
    resumes from the saved offset (ModelManager.kt:140-142) — but
    only after the user re-enters the app

This works, but feels broken: a 1.2 GB first install pauses
indefinitely the moment the screen is off.

The fix (3 commits)

1. ffbeeb6 Move model download to a foreground worker

A CoroutineWorker that wraps ensureModels and runs as a foreground
service:

  • setForeground(...) with FOREGROUND_SERVICE_TYPE_DATA_SYNC (API 34+ requirement)
  • Progress notification (Speech models — <file> 7/16) with PRIORITY_LOW and setOnlyAlertOnce(true)
  • Per-file progress via setProgress(workDataOf(...)) so observers can drive the UI without parsing the notification
  • Returns the resolved modelDir in outputData
  • IOException → Result.retry(), anything else → Result.failure() with the message in outputData
  • enqueueUniqueWork("audio.soniqo.speech.modelDownload", KEEP, ...) so concurrent enqueues are deduped

The activities just observe state via getWorkInfoByIdLiveData and
proceed to initPipeline(modelDir) on SUCCEEDED. No more direct
ensureModels call from the foreground.
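
The worker contract described above can be sketched roughly as follows. This is a minimal illustration, not the code from ffbeeb6: the `fetchModels` hook, the channel id, the notification id, and every key name except `KEY_ERROR` are assumptions introduced here, and the real `ModelManager.ensureModels` signature is not shown in this PR. A concrete subclass would implement `fetchModels` by delegating to `ensureModels`.

```kotlin
import android.content.Context
import android.content.pm.ServiceInfo
import android.os.Build
import androidx.core.app.NotificationChannelCompat
import androidx.core.app.NotificationCompat
import androidx.core.app.NotificationManagerCompat
import androidx.work.CoroutineWorker
import androidx.work.ForegroundInfo
import androidx.work.WorkerParameters
import androidx.work.workDataOf
import java.io.File
import java.io.IOException
import kotlinx.coroutines.CancellationException

// Sketch only: names marked "hypothetical" do not come from the PR.
abstract class ForegroundDownloadWorker(
    context: Context,
    params: WorkerParameters,
) : CoroutineWorker(context, params) {

    // Hypothetical hook; a concrete subclass would call ModelManager.ensureModels here.
    protected abstract suspend fun fetchModels(
        onProgress: suspend (file: String, index: Int, total: Int) -> Unit,
    ): File

    override suspend fun doWork(): Result {
        // Promote to a foreground service before any bytes are fetched.
        setForeground(buildForegroundInfo("Preparing", 0, 0))
        return try {
            val modelDir = fetchModels { file, index, total ->
                // Observers read this Data instead of parsing the notification.
                setProgress(workDataOf(KEY_FILE to file, KEY_INDEX to index, KEY_TOTAL to total))
                setForeground(buildForegroundInfo(file, index, total))
            }
            Result.success(workDataOf(KEY_MODEL_DIR to modelDir.absolutePath))
        } catch (e: CancellationException) {
            throw e // let WorkManager handle cancellation normally
        } catch (e: IOException) {
            Result.retry() // transient network failure
        } catch (e: Throwable) {
            Result.failure(workDataOf(KEY_ERROR to e.message))
        }
    }

    private fun buildForegroundInfo(file: String, index: Int, total: Int): ForegroundInfo {
        val channel = NotificationChannelCompat
            .Builder(CHANNEL_ID, NotificationManagerCompat.IMPORTANCE_LOW)
            .setName("Model downloads")
            .build()
        NotificationManagerCompat.from(applicationContext).createNotificationChannel(channel)

        val notification = NotificationCompat.Builder(applicationContext, CHANNEL_ID)
            .setSmallIcon(android.R.drawable.stat_sys_download)
            .setContentTitle("Speech models")
            .setContentText("$file $index/$total")
            .setProgress(total, index, total == 0) // indeterminate until totals are known
            .setPriority(NotificationCompat.PRIORITY_LOW)
            .setOnlyAlertOnce(true)
            .setOngoing(true)
            .build()

        // The explicit service type is only passed where the platform requires it.
        return if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.UPSIDE_DOWN_CAKE) {
            ForegroundInfo(NOTIFICATION_ID, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_DATA_SYNC)
        } else {
            ForegroundInfo(NOTIFICATION_ID, notification)
        }
    }

    companion object {
        const val KEY_ERROR = "error"
        const val KEY_MODEL_DIR = "modelDir"        // hypothetical
        const val KEY_FILE = "file"                 // hypothetical
        const val KEY_INDEX = "index"               // hypothetical
        const val KEY_TOTAL = "total"               // hypothetical
        private const val CHANNEL_ID = "model_download" // hypothetical
        private const val NOTIFICATION_ID = 1001        // hypothetical
    }
}
```

Rethrowing `CancellationException` before the generic catch keeps a cancelled worker from being reported as a failure.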

2. 562ae25 Runtime fixes from emulator verification

Two issues surfaced when actually running the worker on an arm64 emulator:

  • API 34+ manifest mismatch. WorkManager 2.9.x's bundled manifest declares SystemForegroundService without a foregroundServiceType. When we call startForeground(FOREGROUND_SERVICE_TYPE_DATA_SYNC), Android rejects with IllegalArgumentException because the service element doesn't advertise a matching type. Fix: declare an override in the SDK manifest with tools:replace="android:foregroundServiceType" so consumers don't have to.
  • JobScheduler CONNECTIVITY constraint stuck. NetworkType.CONNECTED maps to a JobInfo network request requiring the VALIDATED capability — which JobScheduler can hold unsatisfied for long periods on flaky / captive networks even when the device has working internet. Drop the constraint; OkHttp surfaces network failures as IOException which the worker already retries.
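
For illustration, a request built without any network constraint and enqueued with KEEP might look like the sketch below. The input key `"precision"` and the output key `"modelDir"` are assumptions, not names confirmed by this PR. The PR observes state via `getWorkInfoByIdLiveData`; this sketch substitutes `getWorkInfosForUniqueWorkLiveData`, because with KEEP a duplicate enqueue is ignored and the id of the ignored request never maps to the work that is actually running.

```kotlin
import android.content.Context
import androidx.lifecycle.LifecycleOwner
import androidx.work.ExistingWorkPolicy
import androidx.work.ListenableWorker
import androidx.work.OneTimeWorkRequest
import androidx.work.WorkInfo
import androidx.work.WorkManager
import androidx.work.workDataOf

const val UNIQUE_NAME = "audio.soniqo.speech.modelDownload"
const val KEY_PRECISION = "precision" // hypothetical input key
const val KEY_MODEL_DIR = "modelDir"  // hypothetical output key

// No setConstraints(...) call: the request keeps WorkManager's default of no
// network requirement, so JobScheduler never waits on a VALIDATED network.
fun buildDownloadRequest(
    workerClass: Class<out ListenableWorker>,
    precision: String,
): OneTimeWorkRequest =
    OneTimeWorkRequest.Builder(workerClass)
        .setInputData(workDataOf(KEY_PRECISION to precision))
        .build()

fun enqueueAndObserve(
    context: Context,
    owner: LifecycleOwner,
    request: OneTimeWorkRequest,
    onModelDir: (String) -> Unit,
) {
    val workManager = WorkManager.getInstance(context)
    // KEEP: if a download with this name is already pending or running, do nothing.
    workManager.enqueueUniqueWork(UNIQUE_NAME, ExistingWorkPolicy.KEEP, request)

    workManager.getWorkInfosForUniqueWorkLiveData(UNIQUE_NAME).observe(owner) { infos ->
        val done = infos.orEmpty().firstOrNull { it.state == WorkInfo.State.SUCCEEDED }
        done?.outputData?.getString(KEY_MODEL_DIR)?.let(onModelDir)
    }
}
```

With the constraint gone, connectivity problems reach the worker as `IOException` from the HTTP client and are handled by its retry path.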

3. ff89a10 Robolectric tests for the worker (6 tests)

Using TestListenableWorkerBuilder + mockkObject(ModelManager) so
the worker contract is exercised on the JVM without network or disk:

  • doWork_success_returnsModelDirInOutputData
  • doWork_ioException_returnsRetry — pins the transient-failure path
  • doWork_genericThrowable_returnsFailureWithMessage — pins KEY_ERROR shape
  • doWork_invalidPrecisionInput_defaultsToInt8
  • doWork_missingPrecisionInput_defaultsToInt8
  • request_buildsRequestWithPrecisionInputDataAndNoNetworkConstraint
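
The two precision tests pin a simple rule: a missing or unrecognised input falls back to int8. A minimal stand-alone sketch of that rule, assuming a hypothetical `Precision` enum and `parsePrecision` helper (neither name comes from the PR, and `FP16` is only a placeholder for "some other valid value"):

```kotlin
// Hypothetical enum; only the INT8 default is stated in the PR.
enum class Precision { INT8, FP16 }

// Returns the matching precision, or INT8 when the input is null or unknown.
fun parsePrecision(raw: String?): Precision =
    Precision.values().firstOrNull { it.name.equals(raw, ignoreCase = true) }
        ?: Precision.INT8

fun main() {
    println(parsePrecision(null))    // INT8
    println(parsePrecision("bogus")) // INT8
    println(parsePrecision("fp16"))  // FP16
}
```

Defaulting instead of failing means a malformed enqueue still produces a usable model directory.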

Files

| Path | Change |
| --- | --- |
| sdk/.../ModelDownloadWorker.kt | New CoroutineWorker (~150 LOC) |
| sdk/.../ModelDownloadWorkerTest.kt | 6 new Robolectric tests |
| sdk/build.gradle.kts | + work-runtime-ktx:2.9.1, + core-ktx:1.13.1, + work-testing (test) |
| sdk/src/main/AndroidManifest.xml | + FOREGROUND_SERVICE / FOREGROUND_SERVICE_DATA_SYNC perms; SystemForegroundService foregroundServiceType="dataSync" override |
| app/.../MainActivity.kt | loadPipeline → enqueue worker; new initPipeline(modelDir) |
| app/.../DictationActivity.kt | Same restructure |
| app/build.gradle.kts | + work-runtime-ktx (app needs WorkManager types on its compile classpath) |
| app/src/main/AndroidManifest.xml | + POST_NOTIFICATIONS |

Test plan

  • ./gradlew :sdk:testDebugUnitTest — 26/26 pass (15 ModelManager + 5 SpeechRecognitionService + 6 ModelDownloadWorker)
  • ./gradlew :sdk:assembleDebug — green
  • ./gradlew :app:assembleDebug — green
  • End-to-end on arm64 emulator: install demo, launch Echo mode, press HOME mid-download, watch the foreground-service notification persist and the download keep advancing. Verified — pulled the full 1.1 GB entirely while the app was backgrounded.
  • Manual on real device: kick off a download, force-quit the app process (adb shell am force-stop ...), relaunch — should resume from the last .tmp byte.

Notes

ivan-digital force-pushed the test/recognition-service-coverage branch from 40d78b8 to 5a323d6 on May 10, 2026 09:06
ivan-digital changed the base branch from test/recognition-service-coverage to feat/recognition-service on May 10, 2026 09:06
ivan-digital force-pushed the feat/model-download-worker branch from ff89a10 to 0005370 on May 10, 2026 09:08
ivan-digital pushed a commit that referenced this pull request May 10, 2026
… retries

Two existing deletions were destroying the partial-download state we want
to keep:

1. ensureModels() opens by walking the models dir and deleting every
   .tmp. The intent was to clean up after process crashes, but it also
   nukes the in-progress .tmp from a previous ModelDownloadWorker
   invocation that returned Result.retry() — meaning every WorkManager
   retry restarted that file from byte 0. Range resume can't help if
   there's nothing on disk to resume from. Replace with a comment
   explaining why we keep them; stale .tmp from an old MODEL_VERSION is
   still wiped by the version-mismatch path above.

2. downloadFile() deleted the .tmp file when its 5-attempt retry loop
   was exhausted before throwing. Same problem: a transient network
   failure that outlasts those 5 in-loop retries should still leave
   resumable bytes for the worker's next try. Drop the deletion.

Net effect: a 1.2 GB download that hits a flaky network now keeps every
byte that made it to disk — observed during emulator verification of
PR #22 where a worker retry dropped ~2.5 MB of partial parakeet-encoder
bytes for no good reason.

Test: invert 'cleans up tmp file after all retries fail' to assert the
.tmp persists after exhaustion, using DISCONNECT_DURING_RESPONSE_BODY to
get a realistic mid-stream failure path.
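
Why the surviving .tmp matters can be shown with a small stand-alone sketch of offset-based resume. The helper names are hypothetical and this is not the ModelManager code; it only shows that the Range header is derived from whatever bytes are still on disk, so a deleted .tmp always means a restart from byte 0.

```kotlin
import java.io.File

// Hypothetical helper: builds the Range header value from the partial file,
// or returns null when there is nothing on disk to resume from.
fun rangeHeaderFor(tmp: File): String? {
    val offset = if (tmp.isFile) tmp.length() else 0L
    return if (offset > 0) "bytes=$offset-" else null
}

// A server that honours the range replies 206 Partial Content; any other
// success status means the body starts at byte 0 and the .tmp must be overwritten.
fun shouldAppend(statusCode: Int): Boolean = statusCode == 206

fun main() {
    val tmp = File.createTempFile("parakeet-encoder", ".tmp")
    tmp.writeBytes(ByteArray(2048))
    println(rangeHeaderFor(tmp)) // bytes=2048-
    tmp.delete()
    println(rangeHeaderFor(tmp)) // null
}
```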
Ivan added 3 commits May 10, 2026 12:22

ModelManager.ensureModels was called from lifecycleScope in the demo
activities, so the 1.2 GB initial download was bound to the foreground
Activity. Backgrounding the app cancelled the coroutine mid-stream;
partial files were retained (the existing Range: resume logic in
ModelManager handles that), but progress visibly stalled and any
fresh launch had to resume from the byte where the download left off.

Add ModelDownloadWorker — a CoroutineWorker that wraps ensureModels,
calls setForeground() with a progress notification (FOREGROUND_SERVICE_
TYPE_DATA_SYNC on API 34+), reports per-file progress via setProgress,
and returns the resolved model directory in outputData.

MainActivity / DictationActivity now enqueue the worker via
WorkManager.enqueueUniqueWork(KEEP) and observe state through
getWorkInfoByIdLiveData. Existing Activity-driven UI updates (status
text, progress bar) wire up to the worker's progress instead.

Also pulls in androidx.work:work-runtime-ktx and androidx.core:core-ktx
in :sdk, the same work-runtime in :app, and POST_NOTIFICATIONS in the
demo manifest.

Verified locally: ./gradlew :sdk:testDebugUnitTest, 20/20 pass
(no behavioral change to ModelManager itself).

Two issues surfaced during on-emulator verification of the worker:

1. WorkManager 2.9.x's bundled manifest declares SystemForegroundService
   without a foregroundServiceType. On API 34+, startForeground() with
   FOREGROUND_SERVICE_TYPE_DATA_SYNC then fails with
   IllegalArgumentException 'foregroundServiceType 0x00000001 is not a
   subset of foregroundServiceType attribute 0x00000000 in service
   element of manifest file'. Override the service entry in the SDK
   manifest with android:foregroundServiceType="dataSync" + tools:replace
   so the worker is compatible everywhere without forcing consumers to
   bump WorkManager.

2. NetworkType.CONNECTED maps to a JobInfo network request that requires
   the VALIDATED capability — which JobScheduler may hold unsatisfied
   for long periods on flaky networks (captive portal probe failure,
   transient DNS), even when the device has working internet. Drop the
   constraint entirely; OkHttp surfaces network failures as IOException
   which the worker already translates into Result.retry().

Six tests using TestListenableWorkerBuilder with mockkObject(ModelManager)
so the worker contract can be exercised on the JVM without touching the
network or the file system:

- doWork_success_returnsModelDirInOutputData
- doWork_ioException_returnsRetry — pins the transient-failure path
- doWork_genericThrowable_returnsFailureWithMessage — pins KEY_ERROR shape
- doWork_invalidPrecisionInput_defaultsToInt8 — guards against bad input
- doWork_missingPrecisionInput_defaultsToInt8
- request_buildsRequestWithPrecisionInputDataAndNoNetworkConstraint —
  pins the no-CONSTRAINT_CONNECTIVITY decision

Adds androidx.work:work-testing:2.9.1 to :sdk testImplementation.

Local: ./gradlew :sdk:testDebugUnitTest — 26/26 pass (6 new + 5 service
+ 15 ModelManager).
ivan-digital force-pushed the feat/model-download-worker branch from 0005370 to 9b44fd4 on May 10, 2026 10:23
Base automatically changed from feat/recognition-service to main May 10, 2026 16:11
ivan-digital merged commit 0bc4d38 into main May 10, 2026
ivan-digital deleted the feat/model-download-worker branch May 10, 2026 16:36