This repository is a collection of resources and models for multilingual speech topics.
- corpus: list of available corpora
- model: API or ready-to-use model
- recipe: espnet recipes
- tools: other relevant tools, such as g2p (grapheme-to-phoneme) converters
Each directory is organized by language, and each language is identified by its ISO 639-3 ID.
- If you find any relevant speech resources (e.g., corpus, model, recipe), you can edit the corresponding file under `data/lang/<your lang>`
- If there is no existing file, you can create one following the style of the English directory
- Once your pull request is merged, it will be automatically integrated into our website
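Before opening a pull request, it can help to check that the language directory you are editing uses a well-formed ID. The following is a minimal sketch, not part of this repository's tooling: it assumes only the `data/lang/<your lang>` layout described above, and the helper names `language_dir` and `missing_language_dirs` are hypothetical.

```python
import re
from pathlib import Path

# ISO 639-3 identifiers are exactly three lowercase ASCII letters.
ISO639_3_PATTERN = re.compile(r"[a-z]{3}")


def language_dir(repo_root, iso_id):
    """Return the data/lang/<iso_id> path, rejecting malformed ids."""
    if not ISO639_3_PATTERN.fullmatch(iso_id):
        raise ValueError(f"not a three-letter ISO 639-3 id: {iso_id!r}")
    return Path(repo_root) / "data" / "lang" / iso_id


def missing_language_dirs(repo_root, iso_ids):
    """List the ids (hypothetical helper) whose directory does not exist yet."""
    return [i for i in iso_ids if not language_dir(repo_root, i).is_dir()]
```

For example, `missing_language_dirs(".", ["eng", "swa"])` run from the repository root returns the IDs that still need a new directory. The pattern only checks the shape of the ID, not whether it is actually assigned in ISO 639-3.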
Our web interface is built with the mkdocs framework and its mkdocs-material theme.
You first need to install this software:

    pip install mkdocs-material

Then build the docs and serve them:

    python mkbuild/build_docs.py
    mkdocs serve

The most common language IDs are as follows:

| ISO id | Language |
|---|---|
| aar | Afar |
| amh | Amharic |
| ara | Literary Arabic |
| aze | Azerbaijani |
| ben | Bengali |
| cat | Catalan |
| ceb | Cebuano |
| cmn | Mandarin |
| ckb | Sorani |
| deu | German |
| eng | English |
| fas | Farsi |
| fra | French |
| hau | Hausa |
| hin | Hindi |
| hun | Hungarian |
| ilo | Ilocano |
| ind | Indonesian |
| ita | Italian |
| jav | Javanese |
| kaz | Kazakh |
| kin | Kinyarwanda |
| kir | Kyrgyz |
| kmr | Kurmanji |
| lao | Lao |
| mal | Malayalam |
| mar | Marathi |
| mlt | Maltese |
| mya | Burmese |
| msa | Malay |
| nld | Dutch |
| nya | Chichewa |
| orm | Oromo |
| pan | Punjabi |
| pol | Polish |
| por | Portuguese |
| ron | Romanian |
| rus | Russian |
| sna | Shona |
| som | Somali |
| spa | Spanish |
| swa | Swahili |
| swe | Swedish |
| tam | Tamil |
| tel | Telugu |
| tgk | Tajik |
| tgl | Tagalog |
| tha | Thai |
| tir | Tigrinya |
| tpi | Tok Pisin |
| tuk | Turkmen |
| tur | Turkish |
| ukr | Ukrainian |
| uig | Uyghur |
| uzb | Uzbek |
| vie | Vietnamese |
| xho | Xhosa |
| yor | Yoruba |
| zul | Zulu |
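
If you want to use the table above from a script, one option is to parse it into a dictionary. This is an illustrative sketch only: `parse_language_table` is a hypothetical helper, and it assumes the two-column Markdown layout shown above with `ISO id` as the first header cell.

```python
def parse_language_table(markdown):
    """Map each ISO id to its language name from a two-column Markdown table."""
    languages = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip non-table lines, the header row, and the |---|---| separator.
        if len(cells) != 2 or cells[0] == "ISO id" or set(cells[0]) <= {"-", ":"}:
            continue
        languages[cells[0]] = cells[1]
    return languages


sample = "| ISO id | Language |\n|---|---|\n| aar | Afar |\n| amh | Amharic |"
print(parse_language_table(sample))  # {'aar': 'Afar', 'amh': 'Amharic'}
```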