Tiny Language Detector 🔍

Detect 40 Ghanaian languages from text using bigram pattern matching.
No internet, no model, no setup — just Python 3.

Supported languages


Anufo	Anyin	Avatime	Bimoba
Bisa	Buli	Chumburung	Dagbani
Dangme	Delo	Ewe	Farefare
Gikyode	Gonja	Kasem	Konkomba
Konni	Kusaal	Lelemi	Mampruli
Nawuri	Nkonya	Ntcham	Nzema
Paasaal	Sekpele	Selee	Siwu
Southern Birifor	Southern Dagaare	Tampulma	Tumulung Sisaala
Tuwuli	Twi	Vagla

Install

pip install .

That's it. No extra dependencies.

Usage

Check if a text matches a specific language

tiny-detect check dagbani "O di yɛra a saa"

✅  Text MATCHES DAGBANI
   Sentence-pass rate : 95.0%
   Sentences analysed : 1

tiny-detect check twi "O di yɛra a saa"

❌  Text does NOT match TWI
   Sentence-pass rate : 6.0%
   Sentences analysed : 1

Auto-detect the language

tiny-detect detect "O di yɛra a saa"

🔍 Detected language : DAGBANI
   Sentences analysed : 1

   dagbani              95.0%  ███████████████████
   ewe                   8.0%  █
   twi                   6.0%  █
   ...

See all supported languages

tiny-detect list

Read from a file

cat mytext.txt | tiny-detect detect -
tiny-detect detect mytext.txt

Get JSON output (useful for scripting)

tiny-detect --json detect "some text"
tiny-detect --json check dagbani "some text"

Python API

from src.detector import LanguageDetector

detector = LanguageDetector()

# Auto-detect
result = detector.detect("O di yɛra a saa")
print(result["language"])  # "dagbani"

# Check one language
result = detector.check_language("O di yɛra a saa", "dagbani")
print(result["match"])     # True
print(result["score"])     # 0.95

How it works

Each language has a bigram table that defines which two-letter combinations are valid at the start, middle, and end of words. A text is matched to a language when enough of its words and sentences fit those patterns.
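
To make the idea concrete, here is a minimal sketch of position-aware bigram checking. The table layout, the contents of `TOY_TABLE`, and the helper names `positional_bigrams` and `valid_fraction` are assumptions made for illustration; the project's real tables and internal functions are not shown in this README and may differ.

```python
# Hypothetical toy table, NOT the project's real data: each position maps to
# the set of two-letter combinations allowed there.
TOY_TABLE = {
    "start": {"sa", "di", "yɛ"},
    "middle": {"ɛr"},
    "end": {"aa", "ra"},
}


def positional_bigrams(word):
    """Yield (position, bigram) pairs for a single word.

    Assumption for this sketch: the first bigram counts as "start", the last
    as "end", everything in between as "middle". A two-letter word has one
    bigram, which is treated as "start".
    """
    word = word.lower()
    last = len(word) - 2  # index of the final bigram
    for i in range(len(word) - 1):
        if i == 0:
            position = "start"
        elif i == last:
            position = "end"
        else:
            position = "middle"
        yield position, word[i:i + 2]


def valid_fraction(word, table):
    """Return the share of a word's bigrams that the table allows."""
    pairs = list(positional_bigrams(word))
    if not pairs:
        # One-letter words have no bigrams; returning 0.0 is an arbitrary
        # choice for this sketch.
        return 0.0
    valid = sum(1 for position, bigram in pairs if bigram in table[position])
    return valid / len(pairs)
```

With this toy table, `valid_fraction("yɛra", TOY_TABLE)` is 1.0, while `valid_fraction("sax", TOY_TABLE)` is 0.5 because "ax" is not a listed word-final bigram.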

The detection thresholds (all adjustable):

A word matches if ≥ 80% of its bigrams are valid
A sentence passes if ≥ 80% of its words match
A text is identified as a language if ≥ 70% of its sentences pass

# Example: loosen the thresholds for noisy or mixed text
tiny-detect --text-threshold 0.60 --sentence-threshold 0.70 detect "some text"
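
The three-level cascade, and the effect of loosening a threshold, can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's actual code: the function names, the sentence splitting on `.`, `!` and `?`, the treatment of one-letter words, and the toy set of valid bigrams are all hypothetical, and word position is ignored for brevity.

```python
import re


def word_matches(word, is_valid_bigram, word_threshold=0.80):
    """A word matches if enough of its bigrams are valid."""
    bigrams = [word[i:i + 2] for i in range(len(word) - 1)]
    if not bigrams:
        return True  # assumption: one-letter words are not penalised
    valid = sum(1 for b in bigrams if is_valid_bigram(b))
    return valid / len(bigrams) >= word_threshold


def sentence_passes(sentence, is_valid_bigram,
                    word_threshold=0.80, sentence_threshold=0.80):
    """A sentence passes if enough of its words match."""
    words = sentence.lower().split()
    if not words:
        return False
    matched = sum(
        1 for w in words if word_matches(w, is_valid_bigram, word_threshold)
    )
    return matched / len(words) >= sentence_threshold


def sentence_pass_rate(text, is_valid_bigram,
                       word_threshold=0.80, sentence_threshold=0.80):
    """Return the fraction of sentences in the text that pass."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    passed = sum(
        1 for s in sentences
        if sentence_passes(s, is_valid_bigram, word_threshold, sentence_threshold)
    )
    return passed / len(sentences)


def text_matches(text, is_valid_bigram, text_threshold=0.70, **thresholds):
    """A text is identified as the language if enough sentences pass."""
    return sentence_pass_rate(text, is_valid_bigram, **thresholds) >= text_threshold


# Toy data: only "ab" and "ba" count as valid bigrams.
TOY_VALID = {"ab", "ba"}
TOY_TEXT = "ab ba. ab ba. xx yy."


def toy_is_valid(bigram):
    return bigram in TOY_VALID
```

In the toy text, two of the three sentences pass, so the sentence-pass rate is about 0.67. `text_matches(TOY_TEXT, toy_is_valid)` is therefore false at the default 0.70, and `text_matches(TOY_TEXT, toy_is_valid, text_threshold=0.60)` is true, which is the kind of change the looser command-line flags are meant to produce.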

License

MIT
