OmeshThokchom/N7speech

N7Speech

SOTA Manipuri ASR

N7Speech is a state-of-the-art (SOTA) automatic speech recognition (ASR) model and Python library for Manipuri (Meiteilon).
It provides real-time and file-based speech-to-text from a microphone or an audio file (wav/mp3), and can output either Meitei Mayek text or a Latin phoneme representation.


🚀 Why N7Speech?

  • State-of-the-Art (SOTA) performance for Manipuri (Meiteilon) ASR
  • Fast, accurate, and robust for both real-time and file-based transcription
  • Supports both Meitei Mayek and Latin phoneme outputs
  • Easy to use, cross-platform, and GPU-accelerated

Author

Dayananda Thokchom

Features

  • Real-time speech recognition from microphone with VAD (voice activity detection)
  • Transcription from audio files (wav/mp3)
  • Meitei Mayek to phoneme (Latin) conversion
  • Simple, high-level API
  • ONNX model backend for fast inference

Installation

Linux/macOS

pip install n7speech

Or for local development:

git clone https://github.com/OmeshThokchom/N7speech.git
cd N7speech
pip install .

Windows

  1. Install Python 3.7+ from python.org.
  2. Open Command Prompt as Administrator.
  3. Install the package:
pip install n7speech
  4. If you encounter issues with sounddevice, install the appropriate wheel from PyPI or use:
pip install pipwin
pipwin install sounddevice

GPU Acceleration (All Platforms)

N7Speech runs on either onnxruntime (CPU) or onnxruntime-gpu (GPU).
onnxruntime is specified as the default dependency. If you have an NVIDIA GPU, uninstall it with
pip uninstall onnxruntime
and install the GPU build with
pip install onnxruntime-gpu
for much faster inference.
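
After switching packages, it can help to confirm which ONNX Runtime build is actually active. The sketch below is not part of N7Speech: pick_provider and PREFERRED are hypothetical names, and whether the library selects providers this way is not stated here. It only uses onnxruntime's get_available_providers() to report what the installed build offers.

```python
# Preference order: CUDA first, then CPU (provider names as used by ONNX Runtime).
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_provider(available):
    """Return the first preferred provider present in `available`, or None."""
    for name in PREFERRED:
        if name in available:
            return name
    return None


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    available = []  # neither onnxruntime package is installed

print("Available providers:", available)
print("Would use:", pick_provider(available))
```

If CUDAExecutionProvider is missing from the list after installing onnxruntime-gpu, the CPU package may still be installed alongside it, or the CUDA drivers may not match.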

Usage

Real-time microphone transcription

from n7speech import RealTimeSpeech

RealTimeSpeech(lang="mni-latin").start(lambda t: print(f"\nResult: {t}"))
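
The callback passed to start() can do more than print. As a minimal sketch, assuming the callback receives each recognized utterance as a string (the example above suggests this but does not confirm it), the hypothetical helper make_logger below collects timestamped results in a list:

```python
from datetime import datetime


def make_logger(lines):
    """Build a callback that appends timestamped, non-empty results to `lines`."""
    def on_result(text):
        text = text.strip()
        if text:  # skip empty results
            lines.append(f"[{datetime.now():%H:%M:%S}] {text}")
    return on_result


transcript = []
on_result = make_logger(transcript)

# Usage with the API shown above (requires n7speech and a microphone):
# from n7speech import RealTimeSpeech
# RealTimeSpeech(lang="mni-latin").start(on_result)
```

Keeping the callback separate from the recognizer makes it easy to exercise without audio hardware.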

Transcribe from audio file

from n7speech import speech_from_file

result = speech_from_file("your_audio.wav", lang="mni-latin")
print(result)

  • Use lang="mni" for Meitei Mayek output, or lang="mni-latin" for Latin phoneme output.
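
To transcribe several recordings at once, the file API can be wrapped in a small loop. This is a sketch under assumptions: transcribe_folder is a hypothetical helper, and it assumes speech_from_file returns the transcript for one file, as the example above suggests. The transcription function is passed in as a parameter so the loop itself does not depend on the model being loaded.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3"}  # the input formats listed in this README


def transcribe_folder(folder, transcribe, lang="mni-latin"):
    """Call `transcribe(path, lang=lang)` for each wav/mp3 file in `folder`.

    Returns a dict mapping file name to whatever `transcribe` returned.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            results[path.name] = transcribe(str(path), lang=lang)
    return results


# Usage with the API shown above (requires n7speech and its model files):
# from n7speech import speech_from_file
# print(transcribe_folder("recordings", speech_from_file, lang="mni"))
```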

Platform Support

N7Speech is cross-platform and works on Linux, macOS, and Windows.
All dependencies (onnxruntime, torch, numpy, librosa, sounddevice) are available for these operating systems.

  • For macOS and Windows users, make sure your Python environment and audio drivers are set up correctly for sounddevice and torch.
  • For GPU acceleration, ensure you install the correct version of onnxruntime-gpu and have compatible CUDA drivers (on supported hardware).

Requirements

  • Python 3.7+
  • onnxruntime, or onnxruntime-gpu for GPU acceleration (highly recommended for fast transcription; e.g., a 20 s wav file in ~110 ms)
  • numpy
  • librosa
  • torch
  • sounddevice
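
A quick way to check an environment against this list is to ask Python which modules it can find. The sketch below uses only the standard library; missing_modules is a hypothetical helper, and the module names are taken from the list above (onnxruntime and onnxruntime-gpu both install the onnxruntime module).

```python
import importlib.util
import sys

REQUIRED_MODULES = ("onnxruntime", "numpy", "librosa", "torch", "sounddevice")


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if sys.version_info < (3, 7):
    print("Python 3.7+ is required, found", sys.version.split()[0])

missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```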

Model and Vocab

Place your ONNX model as model.onnx and vocabulary as vocab.txt in the working directory.
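
Because both files are looked up in the working directory, a missing or misnamed file is an easy mistake to make. A minimal pre-flight check, using only the file names given above (missing_files is a hypothetical helper, not part of N7Speech):

```python
from pathlib import Path

REQUIRED_FILES = ("model.onnx", "vocab.txt")


def missing_files(directory="."):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


absent = missing_files()
if absent:
    print("Missing in working directory:", ", ".join(absent))
else:
    print("model.onnx and vocab.txt found.")
```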

License

MIT License

About

Manipuri ASR – A state-of-the-art, low-latency speech-to-text library with advanced voice activity detection and real-time transcription, purpose-built for the Manipuri language.
