|
| 1 | +# Matrix Voice Transcript |
| 2 | + |
| 3 | +Matrix bot that transcribes voice messages and audio files using [NVIDIA NeMo Parakeet TDT](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) running locally on CPU. Supports E2EE rooms. No audio leaves the server. |
| 4 | + |
| 5 | +## Requirements |
| 6 | + |
| 7 | +- Matrix bot account with an access token. |
| 8 | +- `MATRIX_PASSWORD` recommended for E2EE rooms (enables stale-device pruning; without it decryption may fail on first run). |
| 9 | +- ~2.5 GB disk space for the model checkpoint (cached in `./models`, downloaded on first start). |
| 10 | + |
| 11 | +## Quick start |
| 12 | + |
| 13 | +1. Copy `.env.example` to `.env` and fill in the variables. |
| 14 | +2. `docker compose up -d` |
| 15 | +3. Invite the bot to a Matrix room. |
| 16 | + |
| 17 | +## Environment variables |
| 18 | + |
| 19 | +| Variable | Description | |
| 20 | +|---|---| |
| 21 | +| `MATRIX_HS_URL` | Homeserver URL (with `https://`) | |
| 22 | +| `MATRIX_USER_ID` | Full bot MXID, e.g. `@voicebot:example.org` | |
| 23 | +| `MATRIX_ACCESS_TOKEN` | Bot access token | |
| 24 | +| `MATRIX_PASSWORD` | Optional, but recommended for E2EE rooms. Enables stale-device pruning on startup; without it, decryption may fail on first run. | |
| 25 | +| `LOCALE` | Message language: `en` (default) or `ru` | |
| 26 | +| `ASR_MODEL_NAME` | NeMo model (default: `nvidia/parakeet-tdt-0.6b-v2`) | |
| 27 | +| `MAX_AUDIO_BYTES` | Max file size in bytes (default: `26214400` = 25 MB) | |
| 28 | +| `STORE_PATH` | Olm key store path inside the container (default: `/data/store`) | |
| 29 | + |
| 30 | +**Supported formats:** ogg/opus, webm, mp4/m4a, aac, flac, mp3, wav. |
| 31 | + |
| 32 | +## Message language |
| 33 | + |
| 34 | +```env |
| 35 | +LOCALE=en # English (default) |
| 36 | +LOCALE=ru # Russian / Русский |
| 37 | +``` |
| 38 | + |
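| 39 | +After editing `.env`, run `docker compose up -d` to recreate the container with the new value; `docker compose restart` alone does not reload environment variables. |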
| 40 | + |
| 41 | +## Local development |
| 42 | + |
| 43 | +Requires Python 3.11+ and `ffmpeg` on PATH. |
| 44 | + |
| 45 | +```bash |
| 46 | +python -m venv .venv && source .venv/bin/activate # Linux/macOS; on Windows run .venv\Scripts\activate instead |
| 47 | +pip install -r requirements.txt |
| 48 | +python -m src.main |
| 49 | +``` |
| 50 | + |
| 51 | +## Security |
| 52 | + |
| 53 | +- Never commit `.env`. |
| 54 | +- Transcribed text is never written to logs. |
| 55 | +- Temp audio files are deleted immediately after transcription. |
| 56 | + |
| 57 | +## License |
| 58 | + |
| 59 | +MIT |
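
The environment variables table in the README can be made concrete with a small loader. The following is a minimal sketch, not the bot's actual code: `BotConfig` and `load_config` are hypothetical names, and treating `MATRIX_HS_URL`, `MATRIX_USER_ID`, and `MATRIX_ACCESS_TOKEN` as mandatory is an assumption based on the Requirements section. The defaults are the ones the table documents.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these three are mandatory; the README only marks MATRIX_PASSWORD as optional.
REQUIRED = ("MATRIX_HS_URL", "MATRIX_USER_ID", "MATRIX_ACCESS_TOKEN")
SUPPORTED_LOCALES = ("en", "ru")


@dataclass(frozen=True)
class BotConfig:  # hypothetical container, for illustration only
    hs_url: str
    user_id: str
    access_token: str
    password: Optional[str]
    locale: str
    asr_model_name: str
    max_audio_bytes: int
    store_path: str


def load_config(env: Optional[Mapping[str, str]] = None) -> BotConfig:
    """Build a BotConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))

    locale = env.get("LOCALE", "en")
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"LOCALE must be one of {SUPPORTED_LOCALES}, got {locale!r}")

    return BotConfig(
        hs_url=env["MATRIX_HS_URL"],
        user_id=env["MATRIX_USER_ID"],
        access_token=env["MATRIX_ACCESS_TOKEN"],
        password=env.get("MATRIX_PASSWORD") or None,  # empty string counts as unset
        locale=locale,
        asr_model_name=env.get("ASR_MODEL_NAME", "nvidia/parakeet-tdt-0.6b-v2"),
        max_audio_bytes=int(env.get("MAX_AUDIO_BYTES", str(25 * 1024 * 1024))),
        store_path=env.get("STORE_PATH", "/data/store"),
    )
```

Failing early on missing or invalid values surfaces a misconfigured `.env` at startup rather than on the first voice message.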