🇲🇦 Speech-to-Text and Text-to-Speech for Moroccan Arabic (Darija)
- Speech-to-Text (STT): Transcribe Darija audio using SpeechBrain's wav2vec2 model
- Text-to-Speech (TTS): Generate Darija speech using Coqui XTTS-v2 with voice cloning
- Python 3.10+
- CUDA-capable GPU recommended (CPU supported but slower)
- ~8GB+ disk space for models
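The Python and disk requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library. The helper name `check_prerequisites` and the reading of "~8GB" as 8 GiB are assumptions made for illustration, not part of the project.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)            # from the "Python 3.10+" requirement
MIN_FREE_BYTES = 8 * 1024**3    # "~8GB+" read as 8 GiB (an approximation)


def check_prerequisites(path=".", version=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means OK.

    `version` and `free_bytes` can be passed explicitly (useful for testing);
    by default they are read from the running interpreter and from `path`.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.10+ required")
    if free_bytes < MIN_FREE_BYTES:
        problems.append(f"only {free_bytes / 1024**3:.1f} GiB free, about 8 GiB needed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

GPU availability is deliberately not checked here, since that would require a deep learning framework to be installed already.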
The project uses a dedicated Python 3.11 environment managed by uv.
# Install uv (if not already installed)
pip install uv
# Create environment and install dependencies
uv venv --python 3.11
uv pip install -r requirements.txt
.\.venv\Scripts\python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
.\.venv\Scripts\python -m streamlit run streamlit_app.py
- Streamlit UI: http://localhost:8501
- API Docs: http://localhost:8000/docs
Upload audio file → Get Darija transcription
curl -X POST -F "audio=@audio.wav" http://localhost:8000/api/stt
Send text → Get synthesized Darija audio
curl -X POST -F "text=السلام عليكم" http://localhost:8000/api/tts --output speech.wav
| Component | Model | Source |
|---|---|---|
| STT | wav2vec2-dvoice-darija | SpeechBrain |
| TTS | darija_xtt_2.0 | medmac01 |
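The two curl calls shown earlier can also be made from Python. The sketch below uses only the standard library and mirrors what `curl -F` does: it sends the fields as `multipart/form-data`. The endpoint paths and the field names `audio` and `text` come from the curl examples. The helper names (`build_multipart`, `post_form`, `transcribe`, `synthesize`) are hypothetical, and because the response format of `/api/stt` is not described here, the raw response bytes are returned unparsed.

```python
import urllib.request
import uuid

API_BASE = "http://localhost:8000"  # same host and port as the curl examples


def build_multipart(fields, files=None):
    """Encode form fields and files as multipart/form-data.

    fields: {name: str}; files: {name: (filename, bytes, content_type)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
            + value.encode("utf-8") + b"\r\n"
        )
    for name, (filename, data, ctype) in (files or {}).items():
        parts.append(
            (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
             f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n').encode()
            + data + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_form(path, fields=None, files=None):
    """POST a multipart form to the API and return the raw response bytes."""
    body, content_type = build_multipart(fields or {}, files)
    request = urllib.request.Request(
        API_BASE + path, data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def transcribe(wav_path):
    """Send a WAV file to /api/stt and return the unparsed response body."""
    with open(wav_path, "rb") as f:
        return post_form("/api/stt", files={"audio": ("audio.wav", f.read(), "audio/wav")})


def synthesize(text, out_path="speech.wav"):
    """Send text to /api/tts and save the returned audio bytes to out_path."""
    audio = post_form("/api/tts", fields={"text": text})
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the API server running, `synthesize("السلام عليكم")` should write `speech.wav`, matching the `--output speech.wav` curl call.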
For TTS voice cloning, add a 3-10 second WAV file at:
reference_audio/darija_speaker.wav
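Before starting the server, it can help to confirm that the reference clip falls in the 3-10 second range. The sketch below uses the standard-library `wave` module, so it assumes an uncompressed PCM WAV file. The function names are illustrative and are not part of the project.

```python
import wave

REFERENCE_PATH = "reference_audio/darija_speaker.wav"
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # the 3-10 second range stated above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path=REFERENCE_PATH):
    """True if the clip length falls within the recommended range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Calling `is_valid_reference()` with no arguments checks the default path. A missing file raises `FileNotFoundError`, and a non-PCM or malformed file raises `wave.Error`.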
Open source for educational purposes.