Detect 40 Ghanaian languages from text using bigram pattern matching.
No internet, no model, no setup — just Python 3.
| Anufo | Anyin | Avatime | Bimoba |
| Bisa | Buli | Chumburung | Dagbani |
| Dangme | Delo | Ewe | Farefare |
| Gikyode | Gonja | Kasem | Konkomba |
| Konni | Kusaal | Lelemi | Mampruli |
| Nawuri | Nkonya | Ntcham | Nzema |
| Paasaal | Sekpele | Selee | Siwu |
| Southern Birifor | Southern Dagaare | Tampulma | Tumulung Sisaala |
| Tuwuli | Twi | Vagla |
pip install .That's it. No extra dependencies.
tiny-detect check dagbani "O di yɛra a saa"✅ Text MATCHES DAGBANI
Sentence-pass rate : 95.0%
Sentences analysed : 1
tiny-detect check twi "O di yɛra a saa"❌ Text does NOT match TWI
Sentence-pass rate : 8.0%
Sentences analysed : 1
tiny-detect detect "O di yɛra a saa"🔍 Detected language : DAGBANI
Sentences analysed : 1
dagbani 95.0% ███████████████████
ewe 8.0% █
twi 6.0% █
...
tiny-detect listcat mytext.txt | tiny-detect detect -
tiny-detect detect mytext.txttiny-detect --json detect "some text"
tiny-detect --json check dagbani "some text"from src.detector import LanguageDetector
detector = LanguageDetector()
# Auto-detect
result = detector.detect("O di yɛra a saa")
print(result["language"]) # "dagbani"
# Check one language
result = detector.check_language("O di yɛra a saa", "dagbani")
print(result["match"]) # True
print(result["score"]) # 0.95Each language has a bigram table that defines which two-letter combinations are valid at the start, middle, and end of words. A text is matched to a language when enough of its words and sentences fit those patterns.
The detection thresholds (all adjustable):
- A word matches if ≥ 80% of its bigrams are valid
- A sentence passes if ≥ 80% of its words match
- A text is identified as a language if ≥ 70% of its sentences pass
# Example: loosen the thresholds for noisy or mixed text
tiny-detect --text-threshold 0.60 --sentence-threshold 0.70 detect "some text"MIT