Add Android RecognitionService for system-wide voice input#19
Merged
Exposes on-device STT via the standard android.speech.RecognitionService API so keyboards and apps (Gboard, Duolingo, etc.) can use the pipeline system-wide. The demo APK registers the service; users can pick it as the default voice input under Settings → System → Languages & input. Closes #4
`onStopListening` previously cancelled the mic feed without pushing any audio to the pipeline, so VAD never saw silence and the final `TranscriptionCompleted` never fired. After cutting the mic, push ~1 s of zero frames so the pipeline finalizes and the caller gets results.

`onStartListening` only checked `session != null` before launching the suspending setup, so two concurrent starts could both pass the gate and race to assign `session`, leaking an `AudioRecord` and a pipeline. Claim an `AtomicBoolean` synchronously and reject duplicates with `ERROR_RECOGNIZER_BUSY`.
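The two fixes can be sketched outside the Android framework. Everything below (`FakePipeline`, `RecognitionSession`, the method names) is illustrative scaffolding, not the PR's actual code; only the two techniques — a compare-and-set gate claimed before any suspension, and a ~1 s silence flush on stop — come from the PR.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the real pipeline: just counts frames it receives.
class FakePipeline {
    var framesPushed = 0
        private set
    fun pushAudio(frames: FloatArray) { framesPushed += frames.size }
}

class RecognitionSession(private val pipeline: FakePipeline) {
    // Claimed synchronously, before any suspending setup, so two
    // concurrent starts cannot both pass the gate.
    private val active = AtomicBoolean(false)

    /** Returns false (caller reports ERROR_RECOGNIZER_BUSY) on a duplicate start. */
    fun tryStart(): Boolean = active.compareAndSet(false, true)

    /**
     * After cutting the mic, push ~1 s of zero frames so VAD sees
     * silence and the pipeline emits its final TranscriptionCompleted.
     */
    fun stop(sampleRate: Int = 16_000) {
        if (!active.compareAndSet(true, false)) return  // already stopped
        pipeline.pushAudio(FloatArray(sampleRate))      // 1 s of silence at 16 kHz
    }
}
```

`compareAndSet` is the key design choice in both paths: a plain `null` check leaves a window between the check and the assignment, while the atomic claim makes the second caller fail deterministically.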
added 6 commits — May 10, 2026 10:56
Refactor SpeechPipeline to an interface with an internal SpeechPipelineImpl backed by NativeBridge. The factory `SpeechPipeline(config)` is preserved via a companion `invoke` so all existing call sites in the demo app and androidTest suite are unchanged. Open SpeechRecognitionService for test subclassing and extract three protected seams — createPipeline, resolveModelDir, newAudioRecord — so JVM unit tests can run without loading the .so or opening the mic. Add Robolectric + MockK and five tests covering the two bugs we fixed in the previous commit (busy-race, stop-hang) plus permission denial, ready-for-speech signaling, and TranscriptionCompleted teardown.
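The factory-preserving trick described above relies on a companion `operator fun invoke`, which lets `SpeechPipeline(config)` keep compiling after `SpeechPipeline` becomes an interface. This sketch uses placeholder members (`SpeechConfig`, `transcribe`), not the SDK's real surface:

```kotlin
// Placeholder config; the real SpeechConfig has more fields.
data class SpeechConfig(val modelDir: String)

interface SpeechPipeline {
    fun transcribe(samples: FloatArray): String

    companion object {
        // Existing call sites written as SpeechPipeline(config) still
        // compile: constructor syntax resolves to this invoke operator.
        operator fun invoke(config: SpeechConfig): SpeechPipeline =
            SpeechPipelineImpl(config)
    }
}

internal class SpeechPipelineImpl(private val config: SpeechConfig) : SpeechPipeline {
    override fun transcribe(samples: FloatArray): String =
        "stub transcription from ${config.modelDir}"  // the real impl calls NativeBridge
}
```

Because construction goes through the interface's companion, tests can also substitute a fake implementation anywhere a `SpeechPipeline` is expected.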
RecognitionService.onStartListening and onStopListening are protected in the Android SDK, so tests outside the inheritance chain cannot call them. Add startListening() / stopListening() public wrappers on TestableService that delegate to the protected callbacks. Verified locally: ./gradlew :sdk:testDebugUnitTest — 20/20 pass (5 new SpeechRecognitionServiceTest, 15 existing ModelManagerDownloadTest).
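The wrapper pattern is simple enough to show in isolation. `FakeRecognitionService` below stands in for `android.speech.RecognitionService` (whose lifecycle callbacks really are `protected`); the counter is invented for illustration:

```kotlin
// Stand-in for android.speech.RecognitionService: callbacks are
// protected, so code outside the inheritance chain cannot call them.
abstract class FakeRecognitionService {
    var started = 0
        private set
    protected open fun onStartListening() { started++ }
    protected open fun onStopListening() { started-- }
}

class TestableService : FakeRecognitionService() {
    // Public wrappers so JVM tests can drive the protected lifecycle
    // callbacks without being subclasses themselves.
    fun startListening() = onStartListening()
    fun stopListening() = onStopListening()
}
```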
Synchronous, side-effect-free check that every required model file for the given precision is on disk and passes isValidModel(). Used by paths that must answer 'are we ready?' without blocking, in particular SpeechRecognitionService.onCheckRecognitionSupport(), which has to tell the framework whether on-device recognition is currently available.
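A minimal sketch of such a check, assuming a per-precision file list and treating "non-empty file" as a stand-in for the real `isValidModel()` (file names and the validity heuristic are assumptions, not the SDK's actual logic):

```kotlin
import java.io.File

// Synchronous, side-effect-free: only stats files, never downloads
// or opens models, so it is safe to call from onCheckRecognitionSupport.
fun areModelsReady(modelDir: File, requiredFiles: List<String>): Boolean =
    requiredFiles.all { name ->
        val f = File(modelDir, name)
        f.isFile && f.length() > 0  // stand-in for the real isValidModel()
    }
```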
…vice

Three additions that round out the RecognitionService contract:

1. **Audio focus management.** Acquire AUDIOFOCUS_GAIN_TRANSIENT with USAGE_VOICE_COMMUNICATION when a session starts, abandon it when the session tears down. On AUDIOFOCUS_LOSS / LOSS_TRANSIENT the listener tears down the session — yielding the mic to incoming calls and nav prompts is the right behavior, and we don't currently support pause/resume mid-utterance anyway. Best-effort: a denied focus request logs and proceeds.
2. **onCheckRecognitionSupport (API 33+).** Override the framework hook that tells callers (Gboard etc.) which BCP-47 languages we can recognize and whether they're installed on-device or pending download. Built on ModelManager.areModelsReady() — installed when models are present, pending otherwise. Lets the caller surface a "downloading models" UX rather than silently falling back to an online recognizer.
3. **SUPPORTED_LANGUAGES constant.** A representative subset of the languages Parakeet TDT v3 claims (ar, cs, da, de, el, en, es, fi, fr, he, hi, hu, id, it, ja, ko, nb, nl, pl, pt, ru, sv, th, tr, uk, vi, zh — 27). Public on the companion object so apps can mirror it in their own settings UI.

Tests: three new Robolectric tests covering audio-focus request, audio-focus loss → teardown, and onCheckRecognitionSupport's pending state. Adds androidx.annotation:annotation:1.8.2 for @RequiresApi.

Local: ./gradlew :sdk:testDebugUnitTest — 23/23 pass (8 service + 15 ModelManager).
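The `onCheckRecognitionSupport` decision reduces to pure logic: every supported tag is reported either installed-on-device or pending-download, depending on a single models-ready check. The sketch below models that split; the 27-tag list is the PR's, but `SupportInfo` is an illustrative stand-in for `android.speech.RecognitionSupport`:

```kotlin
// The 27 BCP-47 tags from the PR's SUPPORTED_LANGUAGES constant.
val SUPPORTED_LANGUAGES = listOf(
    "ar", "cs", "da", "de", "el", "en", "es", "fi", "fr", "he", "hi",
    "hu", "id", "it", "ja", "ko", "nb", "nl", "pl", "pt", "ru", "sv",
    "th", "tr", "uk", "vi", "zh",
)

// Stand-in for android.speech.RecognitionSupport.
data class SupportInfo(
    val installedOnDeviceLanguages: List<String>,
    val pendingOnDeviceLanguages: List<String>,
)

// Installed when models are present, pending otherwise — so a caller
// can show a "downloading models" UX instead of falling back online.
fun recognitionSupport(modelsReady: Boolean): SupportInfo =
    if (modelsReady) SupportInfo(SUPPORTED_LANGUAGES, emptyList())
    else SupportInfo(emptyList(), SUPPORTED_LANGUAGES)
```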
The settings activity that the system Voice-input picker (Settings → System → Languages & input → Voice input) opens via the gear icon next to our recognizer. Without it, the gear is greyed out and users can't tell the recognizer is alive or configurable. Currently informational only — it shows the model-readiness state and the SDK's SUPPORTED_LANGUAGES list; nothing is user-tunable yet. Wired into recognition_service.xml via android:settingsActivity and declared in the demo manifest with the RECOGNIZER_INTENT intent filter the picker queries for.
Android 15 / One UI 8 forces edge-to-edge layouts by default. Without inset handling, the bottom mic button slides under the gesture-nav bar on Galaxy devices (and the top of the layout overlaps the status bar), making the button untappable. Wire ViewCompat.setOnApplyWindowInsetsListener on each Activity's root LinearLayout to pad by the system-bar insets:

- MainActivity.buildUI() — Echo-mode mic at the bottom
- DictationActivity.buildUI() — Dictation mic at the bottom
- SpeechRecognitionSettingsActivity.onCreate() — settings entry that the system Voice-input picker opens (preserves the existing 64/96 padding and adds inset padding on top)

No SDK change. Pure demo-app fix.
Adds a third demo entry (Recognizer test) that calls SpeechRecognizer.createSpeechRecognizer(ctx) without a ComponentName, exercising the system-default voice recognition service path end-to-end through the binder boundary. Useful for smoke-testing the recognition service without going through Gboard or Samsung Keyboard (both of which bypass the system default). README gains a new "System voice input (RecognitionService)" section with a 4-step setup: manifest registration (including the RECORD_AUDIO uses-permission and the @xml/recognition_service resource that readers would otherwise be missing), system-default selection via Settings or adb, and verification via the new test screen. Mirrored into all 9 translations.
ivan-digital pushed a commit that referenced this pull request on May 13, 2026
…h-core

speech-core PRs #19 and #20 lifted all the model wrappers, audio utilities, and Linux examples out of this repo. This PR finishes the migration by deleting the now-duplicated source and slimming the native side to a single ~250-line JNI bridge. Net change: 51 files, +717 / −7412.

Bumped:
- speech-core submodule pointer: 679869d → ba75579 (PR #19 + #20 merged)

Deleted (now in speech-core):
- sdk/src/main/cpp/audio/ — fft, mel, stft (live at speech_core::audio)
- sdk/src/main/cpp/util/ — json.h
- sdk/src/main/cpp/models/ — silero_vad, parakeet_stt, kokoro_tts + phonemizer + multilingual, deepfilter, onnx_engine, inference_engine, onnx_backend, soc_detect
- linux/ — moved verbatim to speech-core/examples/linux/ (libspeech.so, demo, CLIs, integration test)

Rewrote:
- sdk/src/main/cpp/jni_bridge.cpp (388 → 269 lines) — the model wrappers in speech_core::* directly implement VADInterface / STTInterface / TTSInterface / EnhancerInterface, so the 100+ lines of C-vtable adapter boilerplate (vad_process_chunk, stt_transcribe, tts_synthesize, etc.) that wrapped each model class into sc_*_vtable_t structs are gone. The bridge now constructs speech_core::SileroVad / ParakeetStt / KokoroTts and hands references to speech_core::VoicePipeline.
- sdk/src/main/cpp/CMakeLists.txt — replaced the manual list of speech-core source files with add_subdirectory(${SPEECH_CORE_DIR}) using SPEECH_CORE_WITH_ONNX=ON. Link speech_android against speech_core_models.

Compatibility:
- Kotlin contract unchanged. NativeBridge.onEvent still receives the same int event-type values (0..11). The new speech_core::EventType enum has ResponseDone and ResponseAudioDelta swapped relative to the old C ABI (sc_event_t.type) — added to_kotlin_event() to map explicitly so the Kotlin side keeps working without any change.
- Public Kotlin API (SpeechPipeline, SpeechConfig, SpeechEvent) untouched.
Docs:
- README.md rewritten as Android-only (Linux/Yocto/QNN sections moved to a one-line cross-link pointing at speech-core/examples/linux).
- All 9 README translations updated to mirror the new structure (zh, ja, ko, es, de, fr, hi, pt, ru), with existing high-quality translations preserved where the underlying English text is unchanged.
- AGENTS.md rewritten — Android-only scope; points contributors at speech-core for any C++ / model / Linux changes.
- .gitignore drops the linux/tests/models/ and /ort-linux/ entries that are no longer relevant.
- setup.sh trimmed to just the Android ORT download + submodule init (it was previously rewriting the .gitignore on every invocation).

Verified locally:
- ./gradlew :sdk:externalNativeBuildDebug — BUILD SUCCESSFUL; 5.6 MB libspeech_android.so produced for arm64-v8a, links libonnxruntime.so and libc++_shared.so cleanly.
- ./gradlew :sdk:assembleDebug :sdk:test — BUILD SUCCESSFUL, 77 tasks.

Next: connectedAndroidTest needs to run on an emulator (it downloads 1.2 GB of models on first run); will run that in CI rather than locally.
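The explicit event mapping mentioned under Compatibility lives in C++ (to_kotlin_event()), but the idea translates to a short Kotlin sketch: never cast a foreign enum ordinal straight through when the two ABIs disagree. The ordinals and enum shape below are hypothetical — only the ResponseDone/ResponseAudioDelta swap is taken from the commit message:

```kotlin
// Hypothetical Kotlin-side event IDs; only the last two matter here.
enum class KotlinEvent(val id: Int) {
    ResponseDone(10),
    ResponseAudioDelta(11),
}

// speech_core's new EventType has ResponseDone and ResponseAudioDelta
// swapped relative to the old C ABI, so an identity pass-through would
// misroute exactly those two events. Map them explicitly.
fun toKotlinEventId(speechCoreOrdinal: Int): Int = when (speechCoreOrdinal) {
    10 -> KotlinEvent.ResponseAudioDelta.id  // assumed: core 10 = ResponseAudioDelta
    11 -> KotlinEvent.ResponseDone.id        // assumed: core 11 = ResponseDone
    else -> speechCoreOrdinal                // all other IDs line up
}
```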
Summary
A new `SpeechRecognitionService` (in `audio.soniqo.speech.service`) wrapping `SpeechPipeline`, so any app using the `SpeechRecognizer` API (Gboard, Duolingo, the system voice-input picker) can invoke fully on-device STT. This PR also absorbs what was originally split into PR #21 (interface refactor + Robolectric tests). Replacement for #21 — close it once this lands.
Service contract
- Audio capture: `AudioRecord(VOICE_RECOGNITION, 16 kHz, PCM_FLOAT)` — callers do not push audio.
- Event mapping: `SpeechStarted` → `beginningOfSpeech`, `PartialTranscription` → `partialResults`, `TranscriptionCompleted` → `results` (session ends), `SpeechEnded` → `endOfSpeech`, `Error` → `error(ERROR_SERVER)`.
- `EXTRA_PARTIAL_RESULTS` supported by wiring `emitPartialTranscriptions` on the pipeline.
- `EXTRA_LANGUAGE` is logged but not enforced — Parakeet TDT v3 auto-detects.

Service polish (added during review)
- `onCheckRecognitionSupport` (API 33+). Returns a `RecognitionSupport` with our `SUPPORTED_LANGUAGES` (27 BCP-47 tags from Parakeet TDT v3) marked installed-on-device when `ModelManager.areModelsReady()` is true, pending otherwise. Lets callers surface a "downloading models" UX instead of silently falling back to an online recognizer.
- Audio focus. Acquires `AUDIOFOCUS_GAIN_TRANSIENT` with `USAGE_VOICE_COMMUNICATION` when a session starts, abandons it on teardown. On `AUDIOFOCUS_LOSS` / `LOSS_TRANSIENT` the listener tears down the session — yields the mic to incoming calls and nav prompts. Best-effort: a denied focus request logs and proceeds.
- Settings activity, wired via `android:settingsActivity` in `recognition_service.xml` and a `RECOGNIZER_INTENT` filter in the demo manifest.
- `ModelManager.areModelsReady()` public API. Synchronous, side-effect-free check used by `onCheckRecognitionSupport` and the settings activity.

Bug fixes pinned by tests
- `onStopListening` cut the mic without flushing → VAD never saw silence → `TranscriptionCompleted` never fired. Now pushes ~1 s of zero frames after cancelling the mic job.
- Two concurrent `onStartListening` calls could both pass the session gate; the session is now claimed via an `AtomicBoolean` synchronously.

Refactor + Robolectric coverage
- `SpeechPipeline` becomes an interface with `SpeechPipelineImpl`; a companion `invoke` keeps `SpeechPipeline(config)` working at every existing call site (demo + `androidTest`).
- Protected seams for tests: `createPipeline`, `resolveModelDir`, `newAudioRecord`.
- Tests:
  - `startListening_setsUpPipelineAndSignalsReady` — `readyForSpeech` fires after pipeline init
  - `startListening_concurrentCallReturnsBusy`
  - `stopListening_flushesPipelineWithSilence`
  - `startListening_withoutPermission_reportsInsufficient`
  - `transcriptionCompleted_emitsResultsAndTearsDownSession` — emits `results(...)` and closes the pipeline
  - `startListening_requestsAudioFocus`
  - `audioFocusLoss_tearsDownSession` — `AUDIOFOCUS_LOSS` callback closes the pipeline
  - `onCheckRecognitionSupport_modelsNotReady_marksLanguagesPending`

Tests use a `TestableService` subclass that overrides the seams, a `FakeSpeechPipeline` implementing the new interface, and a MockK-mocked `AudioRecord`.

Scope
Out of scope (deferred):
- `SpeechConfig` / `parakeet_stt.cpp` — tracked separately.
- A `BroadcastReceiver` for `RecognizerIntent.ACTION_GET_LANGUAGE_DETAILS` (pre-API-33 language discovery). minSdk is 26, but the modern `onCheckRecognitionSupport` path covers the dominant case; the receiver can be added later if real-world demand surfaces.

Closes #4.
Test plan
- `./gradlew :sdk:assembleDebug :app:assembleDebug` — green
- `./gradlew :sdk:testDebugUnitTest` — 23/23 pass (8 service + 15 ModelManager)
- `./gradlew :sdk:connectedDebugAndroidTest` — 34/34 pass on arm64 emulator (verifies the `SpeechPipeline` interface refactor doesn't break the existing pipeline tests)
- Manifest checks (`dumpsys package audio.soniqo.speech.demo`):
  - `android.speech.RecognitionService` filter on `SpeechRecognitionService` with `RECORD_AUDIO` permission
  - `android.speech.action.RECOGNIZER_INTENT` filter on `SpeechRecognitionSettingsActivity`
- `settings put secure voice_recognition_service audio.soniqo.speech.demo/audio.soniqo.speech.service.SpeechRecognitionService` — readback confirmed
- `am start -a android.speech.action.RECOGNIZER_INTENT` — renders title, model-readiness state, and the 27-language list (screenshot in PR thread)

Notes