
Commit 0f80a4c

otosaku-ai and claude committed
Initial release: NeMo Feature Extractor for Android
Kotlin library for extracting mel spectrogram features compatible with NVIDIA NeMo models.

Direct port from NeMoFeatureExtractor-iOS with identical processing logic:
- FFT via Cooley-Tukey algorithm
- Pre-computed NeMo filterbank (mel_filterbank.bin)
- VAD, ASR, and Speaker model configurations
- Pre-emphasis, Hann windowing, log transform
- Per-feature normalization with Bessel's correction
- Frame padding (padTo parameter)

All tests pass with strict tolerance (max diff < 1e-4).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
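The commit message names the Cooley-Tukey algorithm as the FFT used in the processing pipeline. As a rough illustration only, an iterative radix-2 variant might look like the sketch below. This is not the code from this commit: the function name `fftInPlace`, the use of `DoubleArray`, and the in-place layout are all assumptions made for the example.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.sin

/**
 * Illustrative in-place iterative radix-2 Cooley-Tukey FFT (hypothetical helper,
 * not taken from the library). `re` and `im` hold the real and imaginary parts;
 * their shared length must be a power of two.
 */
fun fftInPlace(re: DoubleArray, im: DoubleArray) {
    val n = re.size
    require(n == im.size && n > 0 && (n and (n - 1)) == 0) { "length must be a power of two" }

    // Bit-reversal permutation so the butterflies can run in place.
    var j = 0
    for (i in 1 until n) {
        var bit = n shr 1
        while ((j and bit) != 0) {
            j = j xor bit
            bit = bit shr 1
        }
        j = j xor bit
        if (i < j) {
            val tr = re[i]; re[i] = re[j]; re[j] = tr
            val ti = im[i]; im[i] = im[j]; im[j] = ti
        }
    }

    // Butterfly stages: merge transforms of size len / 2 into transforms of size len.
    var len = 2
    while (len <= n) {
        val angle = -2.0 * PI / len
        val half = len / 2
        for (start in 0 until n step len) {
            for (k in 0 until half) {
                val wr = cos(angle * k)
                val wi = sin(angle * k)
                val a = start + k
                val b = a + half
                val xr = re[b] * wr - im[b] * wi
                val xi = re[b] * wi + im[b] * wr
                re[b] = re[a] - xr
                im[b] = im[a] - xi
                re[a] += xr
                im[a] += xi
            }
        }
        len = len shl 1
    }
}
```

A radix-2 transform needs a power-of-two length, which fits the `nFFT = 512` value shown in the README's custom configuration example.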
0 parents  commit 0f80a4c

20 files changed

Lines changed: 1714 additions & 0 deletions

.gitignore

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Gradle
.gradle/
build/
!gradle/wrapper/gradle-wrapper.jar

# Android Studio
.idea/
*.iml
local.properties

# Kotlin
*.class
*.jar
*.war
*.nar
*.ear
*.zip
*.tar.gz
*.rar

# macOS
.DS_Store

# Logs
*.log

README.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# NeMoFeatureExtractor-Android

Kotlin library for extracting mel spectrograms compatible with NVIDIA NeMo models on Android.

## Features

- NeMo-compatible mel spectrogram extraction
- Support for VAD (MarbleNet), ASR (Conformer, Parakeet), and Speaker (TitaNet) models
- Pre-computed NeMo filterbank for maximum accuracy
- Pure Kotlin implementation with no external dependencies
- Configurable normalization modes

## Requirements

- Android API 24+
- Kotlin 1.9+

## Installation

### Gradle

Add the JitPack repository to your project's `settings.gradle.kts`:

```kotlin
dependencyResolutionManagement {
    repositories {
        maven { url = uri("https://jitpack.io") }
    }
}
```

Add the dependency to your module's `build.gradle.kts`:

```kotlin
dependencies {
    implementation("com.github.Otosaku:NeMoFeatureExtractor-Android:1.0.0")
}
```

## Usage

### Basic Usage

```kotlin
import com.otosaku.nemofeatureextractor.NeMoFeatureExtractor
import com.otosaku.nemofeatureextractor.MelSpectrogramConfig

// For VAD (MarbleNet)
val vadExtractor = NeMoFeatureExtractor(context, MelSpectrogramConfig.nemoVAD)
val vadFeatures = vadExtractor.process(audioSamples)

// For ASR (Conformer, Parakeet)
val asrExtractor = NeMoFeatureExtractor(context, MelSpectrogramConfig.nemoASR)
val asrFeatures = asrExtractor.process(audioSamples)

// For Speaker (TitaNet)
val speakerExtractor = NeMoFeatureExtractor(context, MelSpectrogramConfig.nemoSpeaker)
val speakerFeatures = speakerExtractor.process(audioSamples)
```

### Without Context (generates filterbank)

```kotlin
val extractor = NeMoFeatureExtractor(MelSpectrogramConfig.nemoVAD)
val features = extractor.process(audioSamples)
```

### Custom Configuration

```kotlin
val config = MelSpectrogramConfig(
    sampleRate = 16000,
    nMels = 80,
    nFFT = 512,
    windowSize = 400,
    hopLength = 160,
    normalization = NormalizationMode.PER_FEATURE,
    preemph = 0.97f
)

val extractor = NeMoFeatureExtractor(context, config)
```

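To make the `preemph` and `windowSize` parameters more concrete, here is a minimal sketch of what a pre-emphasis filter and a Hann window typically compute. The helper names `preEmphasis` and `hannWindow` are hypothetical and not part of the library's API, and the choice of a symmetric (rather than periodic) window is an assumption.

```kotlin
import kotlin.math.PI
import kotlin.math.cos

/** Pre-emphasis filter: y[0] = x[0], y[n] = x[n] - coeff * x[n - 1]. */
fun preEmphasis(samples: FloatArray, coeff: Float = 0.97f): FloatArray =
    FloatArray(samples.size) { n ->
        if (n == 0) samples[0] else samples[n] - coeff * samples[n - 1]
    }

/** Symmetric Hann window (assumption: denominator is size - 1, not size). */
fun hannWindow(size: Int): FloatArray {
    require(size > 1) { "window needs at least two points" }
    return FloatArray(size) { n ->
        (0.5 - 0.5 * cos(2.0 * PI * n / (size - 1))).toFloat()
    }
}
```

With `windowSize = 400` and `sampleRate = 16000`, each window covers 25 ms of audio, and `hopLength = 160` advances it by 10 ms per frame.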
## Audio Requirements

- Sample rate: 16,000 Hz
- Channels: Mono
- Format: Float32 array

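Audio captured on Android is often 16-bit PCM and sometimes stereo, so it may need converting before it meets the requirements above. The sketch below shows one way to do that. The helper names are hypothetical, and scaling to the range [-1.0, 1.0) is an assumption, since the requirements only specify a Float32 array.

```kotlin
/** Converts 16-bit PCM samples to floats in [-1.0, 1.0) (assumed input scale). */
fun pcm16ToFloat(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768f }

/** Downmixes interleaved stereo (L, R, L, R, ...) to mono by averaging each pair. */
fun stereoToMono(interleaved: FloatArray): FloatArray {
    require(interleaved.size % 2 == 0) { "interleaved stereo needs an even sample count" }
    return FloatArray(interleaved.size / 2) { i ->
        (interleaved[2 * i] + interleaved[2 * i + 1]) / 2f
    }
}
```

Resampling to 16,000 Hz is not covered here and would have to happen before feature extraction if the source uses a different rate.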
## Configuration Presets

| Preset | Normalization | Pad To | Use Case |
|--------|---------------|--------|----------|
| `nemoVAD` | None | 2 | Voice Activity Detection (MarbleNet) |
| `nemoASR` | Per-feature | 0 | Speech Recognition (Conformer, Parakeet) |
| `nemoSpeaker` | Per-feature | 16 | Speaker Verification (TitaNet) |

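The two columns that differ between presets can be illustrated with a short sketch. It assumes that "Per-feature" means each mel row is normalized to zero mean and unit sample standard deviation (dividing by N - 1, as in Bessel's correction), and that "Pad To" zero-pads the frame axis up to a multiple of the given value, with 0 meaning no padding. Both helpers and the `eps` default are hypothetical, not the library's code.

```kotlin
import kotlin.math.sqrt

/** Per-row normalization using the sample standard deviation (divides by N - 1). */
fun normalizePerFeature(features: Array<FloatArray>, eps: Float = 1e-5f): Array<FloatArray> =
    Array(features.size) { m ->
        val row = features[m]
        val n = row.size
        if (n < 2) {
            row.copyOf() // the sample standard deviation is undefined for fewer than 2 frames
        } else {
            val mean = row.sum() / n
            var sumSq = 0.0
            for (v in row) sumSq += (v - mean) * (v - mean)
            val std = sqrt(sumSq / (n - 1)).toFloat()
            FloatArray(n) { t -> (row[t] - mean) / (std + eps) }
        }
    }

/** Zero-pads the frame axis so the frame count is a multiple of padTo (0 = no padding). */
fun padFrames(features: Array<FloatArray>, padTo: Int): Array<FloatArray> {
    if (padTo <= 0 || features.isEmpty()) return features
    val nFrames = features[0].size
    val remainder = nFrames % padTo
    if (remainder == 0) return features
    return Array(features.size) { m -> features[m].copyOf(nFrames + padTo - remainder) }
}
```

Under these assumptions, a 101-frame input would stay at 101 frames with `nemoASR`, grow to 102 with `nemoVAD`, and grow to 112 with `nemoSpeaker`.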
## Output Format

The `process()` method returns `Array<FloatArray>` with shape `[nMels, nFrames]`:
- `nMels`: Number of mel frequency bins (default: 80)
- `nFrames`: Number of time frames (depends on audio length)

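Model runtimes often expect a flat buffer or a time-major layout rather than `Array<FloatArray>`. The following sketch shows how the `[nMels, nFrames]` output could be rearranged. Both helper names are hypothetical, and the code assumes every row has the same length.

```kotlin
/** Transposes [nMels, nFrames] into [nFrames, nMels]. */
fun transpose(features: Array<FloatArray>): Array<FloatArray> {
    if (features.isEmpty()) return features
    val nMels = features.size
    val nFrames = features[0].size
    return Array(nFrames) { t -> FloatArray(nMels) { m -> features[m][t] } }
}

/** Flattens [nMels, nFrames] into one row-major FloatArray of length nMels * nFrames. */
fun flattenRowMajor(features: Array<FloatArray>): FloatArray {
    val nFrames = features.firstOrNull()?.size ?: 0
    val out = FloatArray(features.size * nFrames)
    for ((m, row) in features.withIndex()) {
        row.copyInto(out, destinationOffset = m * nFrames)
    }
    return out
}
```

In the flattened array, the value for mel bin `m` at frame `t` sits at index `m * nFrames + t`.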
## License

MIT License

## Related Projects

- [NeMoFeatureExtractor-iOS](https://github.com/Otosaku/NeMoFeatureExtractor-iOS) - iOS/macOS version
- [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) - Original implementation

build.gradle.kts

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
plugins {
    id("com.android.library") version "8.2.0" apply false
    id("org.jetbrains.kotlin.android") version "1.9.22" apply false
}

gradle.properties

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8
android.useAndroidX=true
kotlin.code.style=official
android.nonTransitiveRClass=true

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-8.4-bin.zip
networkTimeout=10000
validateDistributionUrl=true
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
