
Add Android RecognitionService for system-wide voice input#19

Merged
ivan-digital merged 9 commits into main from feat/recognition-service on May 10, 2026

Conversation

ivan-digital (Contributor) commented Apr 11, 2026

Summary

Adds a new SpeechRecognitionService (in audio.soniqo.speech.service) that wraps SpeechPipeline so any app using the SpeechRecognizer API (Gboard, Duolingo, the system voice-input picker) can invoke fully on-device STT.

This PR also absorbs what was originally split into PR #21 (interface refactor + Robolectric tests); it replaces #21, which should be closed once this lands.

Service contract

  • Owns its own AudioRecord (VOICE_RECOGNITION, 16 kHz, PCM_FLOAT) — callers do not push audio.
  • Event mapping: SpeechStarted → beginningOfSpeech, PartialTranscription → partialResults, TranscriptionCompleted → results (session ends), SpeechEnded → endOfSpeech, Error → error(ERROR_SERVER).
  • Honors EXTRA_PARTIAL_RESULTS by wiring emitPartialTranscriptions on the pipeline.
  • EXTRA_LANGUAGE is logged but not enforced — Parakeet TDT v3 auto-detects.
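The event mapping above can be sketched in pure Kotlin. This is an illustrative model only: `SpeechEventKind` and `callbackFor` are hypothetical stand-ins for the SDK's real event types and the service's dispatch code.

```kotlin
// Hypothetical stand-ins for the SDK's pipeline events, for illustration only.
sealed class SpeechEventKind {
    object SpeechStarted : SpeechEventKind()
    data class PartialTranscription(val text: String) : SpeechEventKind()
    data class TranscriptionCompleted(val text: String) : SpeechEventKind()
    object SpeechEnded : SpeechEventKind()
    data class Error(val message: String) : SpeechEventKind()
}

/** Names the RecognitionService callback each pipeline event should drive. */
fun callbackFor(event: SpeechEventKind): String = when (event) {
    is SpeechEventKind.SpeechStarted -> "beginningOfSpeech"
    is SpeechEventKind.PartialTranscription -> "partialResults"
    is SpeechEventKind.TranscriptionCompleted -> "results"
    is SpeechEventKind.SpeechEnded -> "endOfSpeech"
    is SpeechEventKind.Error -> "error(ERROR_SERVER)"
}
```

Note that only TranscriptionCompleted ends the session; partials keep it open.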

Service polish (added during review)

  • onCheckRecognitionSupport (API 33+). Returns a RecognitionSupport with our SUPPORTED_LANGUAGES (27 BCP-47 tags from Parakeet TDT v3) marked installed-on-device when ModelManager.areModelsReady() is true, pending otherwise. Lets callers surface a "downloading models" UX instead of silently falling back to an online recognizer.
  • Audio focus management. Acquires AUDIOFOCUS_GAIN_TRANSIENT with USAGE_VOICE_COMMUNICATION when a session starts, abandons on tear down. On AUDIOFOCUS_LOSS / LOSS_TRANSIENT the listener tears down the session — yields the mic to incoming calls and nav prompts. Best-effort: a denied focus request logs and proceeds.
  • Settings activity (in the demo app). Without it, the gear icon next to the recognizer in the system Voice-input picker is greyed out. Currently informational — shows model-readiness state and the supported-language list. Wired via android:settingsActivity in recognition_service.xml and a RECOGNIZER_INTENT filter in the demo manifest.
  • ModelManager.areModelsReady() public API. Synchronous, side-effect-free check used by onCheckRecognitionSupport and the settings activity.
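The installed-vs-pending split in onCheckRecognitionSupport can be reduced to a small pure function. A minimal sketch, assuming the real override only flips on ModelManager.areModelsReady(); `languageSupport` and `LanguageSupport` are hypothetical names, not the service's actual API:

```kotlin
// Hypothetical model of the onCheckRecognitionSupport decision.
data class LanguageSupport(val installed: List<String>, val pending: List<String>)

/** When models are on disk, all supported tags are installed; otherwise all are pending. */
fun languageSupport(supported: List<String>, modelsReady: Boolean): LanguageSupport =
    if (modelsReady) LanguageSupport(installed = supported, pending = emptyList())
    else LanguageSupport(installed = emptyList(), pending = supported)
```

The all-or-nothing shape matches a single multilingual model: either the Parakeet files are present (every language works) or none do.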

Bug fixes pinned by tests

  • Stop-hang. onStopListening cut the mic without flushing → VAD never saw silence → TranscriptionCompleted never fired. Now pushes ~1 s of zero frames after cancelling the mic job.
  • Start-race. The busy check happened before launching the suspending setup, so two concurrent starts both passed the gate. Now claims an AtomicBoolean synchronously.
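Both fixes boil down to small, testable pieces of logic. A sketch under stated assumptions: `SessionGate` models the AtomicBoolean claim, and `silenceChunks` computes how many zero-filled chunks cover ~1 s at 16 kHz (the 512-sample chunk size is an assumption, chosen to land near the ~30 chunks mentioned below):

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Models the start-race fix: the first caller claims the session synchronously,
// so a concurrent second start is rejected before any suspending setup runs.
class SessionGate {
    private val busy = AtomicBoolean(false)
    /** Returns true only for the call that wins the session. */
    fun tryClaim(): Boolean = busy.compareAndSet(false, true)
    fun release() = busy.set(false)
}

/** Models the stop-hang fix: chunks of zeroes needed to cover ~1 s of audio. */
fun silenceChunks(sampleRate: Int = 16_000, chunkSize: Int = 512): Int =
    (sampleRate + chunkSize - 1) / chunkSize  // ceiling division
```

With these assumptions, flushing ~1 s of silence takes 32 chunks of 512 samples, which the VAD reads as end-of-utterance and finalizes the transcription.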

Refactor + Robolectric coverage

  • SpeechPipeline becomes an interface with SpeechPipelineImpl; companion invoke keeps SpeechPipeline(config) working at every existing call site (demo + androidTest).
  • Service opened for subclassing with three protected seams: createPipeline, resolveModelDir, newAudioRecord.
  • 8 Robolectric tests exercising the contract end-to-end on the JVM in <1 s each:
| Test | What it pins |
| --- | --- |
| startListening_setsUpPipelineAndSignalsReady | Happy path — readyForSpeech fires after pipeline init |
| startListening_concurrentCallReturnsBusy | Regression — start-race fix |
| stopListening_flushesPipelineWithSilence | Regression — stop-hang fix (~30 zero-frame chunks) |
| startListening_withoutPermission_reportsInsufficient | Permission-denied path |
| transcriptionCompleted_emitsResultsAndTearsDownSession | Final event delivers results(...) and closes the pipeline |
| startListening_requestsAudioFocus | Audio-focus request goes out at session start |
| audioFocusLoss_tearsDownSession | AUDIOFOCUS_LOSS callback closes the pipeline |
| onCheckRecognitionSupport_modelsNotReady_marksLanguagesPending | API 33+ language-support path returns the right shape |

Tests use a TestableService subclass that overrides the seams, a FakeSpeechPipeline implementing the new interface, and a MockK-mocked AudioRecord.
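The "protected seams" pattern behind TestableService can be sketched generically. This is a simplified illustration, not the service's actual code: `RecognizerService`/`describe` are hypothetical, while the real seams are createPipeline, resolveModelDir, and newAudioRecord.

```kotlin
// Generic sketch of the protected-seam pattern used for JVM testability.
open class RecognizerService {
    // In the real service, seams like this construct the pipeline, resolve the
    // model directory, and open the AudioRecord — all things a JVM test must avoid.
    protected open fun createPipeline(): String = "real-pipeline"
    fun describe(): String = "using " + createPipeline()
}

// Test subclass overrides the seam (and may widen visibility for test access),
// so the production code path runs against a fake collaborator.
class TestableService : RecognizerService() {
    public override fun createPipeline(): String = "fake-pipeline"
}
```

The payoff is that the untouched production logic (session setup, teardown, callbacks) runs on the JVM against fakes, with only the Android-bound construction swapped out.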

Scope

Out of scope (deferred):

  • A true language hint to STT — would need an API change in SpeechConfig / parakeet_stt.cpp. Tracked separately.
  • The legacy BroadcastReceiver for RecognizerIntent.ACTION_GET_LANGUAGE_DETAILS (pre-API-33 language discovery). minSdk is 26 but the modern onCheckRecognitionSupport path covers the dominant case; can add the receiver later if real-world demand surfaces.

Closes #4.

Test plan

  • ./gradlew :sdk:assembleDebug :app:assembleDebug — green
  • ./gradlew :sdk:testDebugUnitTest — 23/23 pass (8 service + 15 ModelManager)
  • ./gradlew :sdk:connectedDebugAndroidTest — 34/34 pass on arm64 emulator (verifies the SpeechPipeline interface refactor doesn't break the existing pipeline tests)
  • Installed demo on arm64 emulator; service registered (dumpsys package audio.soniqo.speech.demo):
    • android.speech.RecognitionService filter on SpeechRecognitionService with RECORD_AUDIO permission
    • android.speech.action.RECOGNIZER_INTENT filter on SpeechRecognitionSettingsActivity
  • Set as system default: settings put secure voice_recognition_service audio.soniqo.speech.demo/audio.soniqo.speech.service.SpeechRecognitionService — readback confirmed
  • Settings activity launches via am start -a android.speech.action.RECOGNIZER_INTENT — renders title, model-readiness state, and the 27-language list (screenshot in PR thread)
  • Manual: open Gboard in any text field, tap mic, speak — verify transcription comes back from our service
  • Manual: Settings → System → Languages & input → Voice input picker — confirm our service appears with the gear icon enabled


ivan-digital and others added 2 commits April 11, 2026 21:18
Exposes on-device STT via the standard android.speech.RecognitionService
API so keyboards and apps (Gboard, Duolingo, etc.) can use the pipeline
system-wide. The demo APK registers the service; users can pick it as
the default voice input under Settings → System → Languages & input.

Closes #4
onStopListening previously cancelled the mic feed without pushing any
audio to the pipeline, so VAD never saw silence and the final
TranscriptionCompleted never fired. After cutting the mic, push ~1 s of
zero frames so the pipeline finalizes and the caller gets results.

onStartListening only checked session != null before launching the
suspending setup, so two concurrent starts could both pass the gate and
race to assign session, leaking an AudioRecord and pipeline. Claim an
AtomicBoolean synchronously and reject duplicates with ERROR_RECOGNIZER_BUSY.
Ivan added 6 commits May 10, 2026 10:56
Refactor SpeechPipeline to an interface with an internal SpeechPipelineImpl
backed by NativeBridge. The factory `SpeechPipeline(config)` is preserved
via a companion `invoke` so all existing call sites in the demo app and
androidTest suite are unchanged.

Open SpeechRecognitionService for test subclassing and extract three
protected seams — createPipeline, resolveModelDir, newAudioRecord — so
JVM unit tests can run without loading the .so or opening the mic.

Add Robolectric + MockK and five tests covering the two bugs we fixed in
the previous commit (busy-race, stop-hang) plus permission denial,
ready-for-speech signaling, and TranscriptionCompleted teardown.
RecognitionService.onStartListening and onStopListening are protected in
the Android SDK, so tests outside the inheritance chain cannot call them.
Add startListening() / stopListening() public wrappers on TestableService
that delegate to the protected callbacks.

Verified locally: ./gradlew :sdk:testDebugUnitTest — 20/20 pass
(5 new SpeechRecognitionServiceTest, 15 existing ModelManagerDownloadTest).
Synchronous, side-effect-free check that every required model file for
the given precision is on disk and passes isValidModel(). Used by paths
that must answer 'are we ready?' without blocking, in particular
SpeechRecognitionService.onCheckRecognitionSupport(), which has to tell
the framework whether on-device recognition is currently available.
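A minimal sketch of what a synchronous, side-effect-free readiness check can look like, assuming the real ModelManager walks a known file list; `requiredFiles` and the size check stand in for the actual internals (the real check also runs isValidModel()):

```kotlin
import java.io.File

// Hypothetical readiness check: every required model file must exist on disk
// and be non-empty. No downloads, no locks, no blocking I/O beyond stat calls.
fun areModelsReady(modelDir: File, requiredFiles: List<String>): Boolean =
    requiredFiles.all { name ->
        val f = File(modelDir, name)
        f.isFile && f.length() > 0  // real implementation also validates contents
    }
```

Keeping the check read-only is what makes it safe to call from onCheckRecognitionSupport on the framework's schedule.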
…vice

Three additions that round out the RecognitionService contract:

1. **Audio focus management.** Acquire AUDIOFOCUS_GAIN_TRANSIENT with
   USAGE_VOICE_COMMUNICATION when a session starts, abandon when it
   tears down. On AUDIOFOCUS_LOSS / LOSS_TRANSIENT the listener tears
   down the session — yielding the mic to incoming calls and nav
   prompts is the right behavior, and we don't currently support
   pause/resume mid-utterance anyway. Best-effort: a denied focus
   request logs and proceeds.

2. **onCheckRecognitionSupport (API 33+).** Override the framework hook
   that tells callers (Gboard etc.) which BCP-47 languages we can
   recognize and whether they're installed-on-device or pending
   download. Built off ModelManager.areModelsReady() — installed when
   models are present, pending otherwise. Lets the caller surface a
   'downloading models' UX rather than silently falling back to an
   online recognizer.

3. **SUPPORTED_LANGUAGES constant.** A representative subset of the
   languages Parakeet TDT v3 claims (ar, cs, da, de, el, en, es, fi,
   fr, he, hi, hu, id, it, ja, ko, nb, nl, pl, pt, ru, sv, th, tr, uk,
   vi, zh — 27). Public on the companion object so apps can mirror
   it in their own settings UI.

Tests: three new Robolectric tests covering the audio-focus request,
audio-focus loss → teardown, and onCheckRecognitionSupport's pending
state. Also adds androidx.annotation:annotation:1.8.2 for @RequiresApi.

Local: ./gradlew :sdk:testDebugUnitTest — 23/23 pass (8 service +
15 ModelManager).
Adds the settings entry that the system Voice-input picker (Settings →
System → Languages & input → Voice input) opens via the gear icon next
to our recognizer. Without it, the gear is greyed out and users can't
tell the recognizer is alive or configurable.

Currently informational only — shows model-readiness state and the
SDK's SUPPORTED_LANGUAGES list. Nothing user-tunable yet.

Wired into recognition_service.xml via android:settingsActivity and
declared in the demo manifest with the RECOGNIZER_INTENT intent
filter that the picker queries for.
Android 15 / One UI 8 forces edge-to-edge layouts by default. Without
inset handling the bottom mic button slides under the gesture-nav bar
on Galaxy devices (and the status bar overlaps content at the top),
making the button untappable.

Wire ViewCompat.setOnApplyWindowInsetsListener on each Activity's root
LinearLayout to pad by the system-bar insets:

- MainActivity.buildUI() — Echo mode mic at the bottom
- DictationActivity.buildUI() — Dictation mic at the bottom
- SpeechRecognitionSettingsActivity.onCreate() — Settings entry that
  the system Voice-input picker opens (preserves existing 64/96 padding
  and adds inset padding on top)

No SDK change. Pure demo-app fix.
Adds a third demo entry (Recognizer test) that calls
SpeechRecognizer.createSpeechRecognizer(ctx) without a ComponentName,
exercising the system-default voice recognition service path
end-to-end through the binder boundary. Useful for smoke-testing the
recognition service without going through Gboard or Samsung Keyboard
(both of which bypass the system default).

README gains a new "System voice input (RecognitionService)" section
with a 4-step setup: manifest registration (including the RECORD_AUDIO
uses-permission and the @xml/recognition_service resource that readers
would otherwise miss), selecting the system default via Settings or
adb, and verification via the new test screen. Mirrored into all 9
translations.
@ivan-digital ivan-digital merged commit 256b4ba into main May 10, 2026
@ivan-digital ivan-digital deleted the feat/recognition-service branch May 10, 2026 16:11
ivan-digital pushed a commit that referenced this pull request May 13, 2026
…h-core

speech-core PRs #19 and #20 lifted all the model wrappers, audio utilities,
and Linux examples out of this repo. This PR finishes the migration by
deleting the now-duplicated source and slimming the native side to a single
~250-line JNI bridge.

Net change: 51 files, +717 / -7412.

Bumped:
- speech-core submodule pointer: 679869d → ba75579 (PR #19 + #20 merged)

Deleted (now in speech-core):
- sdk/src/main/cpp/audio/  — fft, mel, stft (live at speech_core::audio)
- sdk/src/main/cpp/util/   — json.h
- sdk/src/main/cpp/models/ — silero_vad, parakeet_stt, kokoro_tts +
  phonemizer + multilingual, deepfilter, onnx_engine, inference_engine,
  onnx_backend, soc_detect
- linux/                   — moved verbatim to speech-core/examples/linux/
                             (libspeech.so, demo, CLIs, integration test)

Rewrote:
- sdk/src/main/cpp/jni_bridge.cpp (388 → 269 lines) — the model wrappers
  in speech_core::* directly implement VADInterface / STTInterface /
  TTSInterface / EnhancerInterface, so the 100+ lines of C-vtable adapter
  boilerplate (vad_process_chunk, stt_transcribe, tts_synthesize, etc.)
  that wrapped each model class into sc_*_vtable_t structs are gone. The
  bridge now constructs speech_core::SileroVad / ParakeetStt / KokoroTts
  and hands references to speech_core::VoicePipeline.
- sdk/src/main/cpp/CMakeLists.txt — replaced the manual list of speech-core
  source files with add_subdirectory(${SPEECH_CORE_DIR}) using
  SPEECH_CORE_WITH_ONNX=ON. Link speech_android against speech_core_models.

Compatibility:
- Kotlin contract unchanged. NativeBridge.onEvent still receives the same
  int event-type values (0..11). The new speech_core::EventType enum has
  ResponseDone and ResponseAudioDelta swapped relative to the old C ABI
  (sc_event_t.type) — added to_kotlin_event() to map explicitly so the
  Kotlin side keeps working without any change.
- Public Kotlin API (SpeechPipeline, SpeechConfig, SpeechEvent) untouched.

Docs:
- README.md rewritten as Android-only (Linux/Yocto/QNN sections moved
  to a one-line cross-link pointing at speech-core/examples/linux).
- All 9 README translations updated to mirror the new structure
  (zh, ja, ko, es, de, fr, hi, pt, ru) with existing high-quality
  translations preserved where the underlying English text is unchanged.
- AGENTS.md rewritten — Android-only scope, points contributors at
  speech-core for any C++ / model / Linux changes.
- .gitignore drops the linux/tests/models/ and /ort-linux/ entries that
  are no longer relevant.
- setup.sh trimmed to just the Android ORT download + submodule init
  (it was previously rewriting the .gitignore on every invocation).

Verified locally:
- ./gradlew :sdk:externalNativeBuildDebug — BUILD SUCCESSFUL, 5.6 MB
  libspeech_android.so produced for arm64-v8a, links libonnxruntime.so
  and libc++_shared.so cleanly.
- ./gradlew :sdk:assembleDebug :sdk:test — BUILD SUCCESSFUL, 77 tasks.

Next: connectedAndroidTest needs to run on an emulator (downloads
1.2 GB of models on first run); will run that in CI rather than locally.
ivan-digital added a commit that referenced this pull request May 13, 2026
Slim speech-android to Android-only after speech-core PRs #19/#20