Advanced AI Voice — digital signal processing for audio: filtering, feature extraction, and quality restoration.
Topics: audio-ml · conversational-ai · deep-learning · generative-ai · natural-language-processing · neural-networks · neural-tts · speech-synthesis · text-to-speech · voice-ai
Advanced AI Voice applies digital signal processing techniques to audio signals, covering filter design, noise reduction, and spectral analysis. It provides both a programmatic API and an interactive interface for exploring audio processing parameters and observing their effects in real time.
The processing pipeline is built on SciPy's signal processing module and LibROSA, combining classical IIR/FIR filter theory with modern audio feature extraction. The project serves both as a practical audio processing tool and as an educational resource for understanding DSP concepts.
All processing steps are visualised with before/after waveform and spectrogram comparisons, and quantitative metrics (SNR, PESQ where applicable) provide objective quality measurement.
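As a rough illustration of the SNR metric mentioned above, the sketch below computes SNR in decibels as 10·log10(signal power / noise power). It assumes a clean reference signal is available, which is typically only true for synthetic tests. The function name `snr_db` is hypothetical and not taken from the repository.

```python
import numpy as np

def snr_db(clean: np.ndarray, estimate: np.ndarray) -> float:
    """SNR in dB, treating (estimate - clean) as the noise component."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(estimate, dtype=np.float64) - clean
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return float("inf")  # identical signals: no noise at all
    return 10.0 * np.log10(signal_power / noise_power)

# Synthetic check: a 440 Hz tone with added white noise
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(sr)
print(f"SNR: {snr_db(clean, noisy):.1f} dB")
```

Calling the same function on the signal before and after processing gives a before/after comparison.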
Audio quality issues — background noise, hum, clipping, reverb — affect recordings across speech, music, and scientific data. This project addresses these challenges with well-understood DSP techniques, providing transparent, configurable processing with measurable results.
Audio Input (WAV/MP3/FLAC)
│
Signal Analysis (STFT, spectrogram)
│
Filter/Enhancement Pipeline
│
Quality Measurement (SNR, spectral flatness)
│
Output Audio + Visualisation
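The analysis and quality-measurement stages can be sketched with SciPy alone. The example below takes an STFT and computes spectral flatness (geometric mean over arithmetic mean of the power spectrum, averaged across frames). The function name, frame size, and averaging choice are assumptions for illustration, not the project's actual code.

```python
import numpy as np
from scipy import signal

def spectral_flatness(x, sr, nperseg=512):
    """Mean spectral flatness across STFT frames (0 = tonal, 1 = flat)."""
    _, _, Z = signal.stft(x, fs=sr, nperseg=nperseg)
    power = np.abs(Z) ** 2 + 1e-12  # small floor avoids log(0)
    geo_mean = np.exp(np.mean(np.log(power), axis=0))
    arith_mean = np.mean(power, axis=0)
    return float(np.mean(geo_mean / arith_mean))

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(sr)
print(spectral_flatness(tone, sr), spectral_flatness(noise, sr))
```

A pure tone scores close to 0 and white noise scores much higher, so a drop in flatness after denoising suggests that broadband noise was reduced.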
IIR and FIR filter design with configurable type, order, and cutoff frequency.
STFT, mel spectrogram, and MFCC feature extraction with visualisation.
Multiple denoising methods: filtering, spectral subtraction, Wiener filtering.
SNR before/after comparison, spectral flatness measurement.
Before/after waveform and spectrogram comparison plots.
Process multiple audio files with the same configuration.
In-app playback of original and processed audio for subjective evaluation.
Save processed audio in WAV, MP3, or FLAC format.
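To make the spectral subtraction method concrete, here is a minimal sketch using `scipy.signal.stft` and `istft`. It assumes the first quarter second of the recording contains noise only, and it uses a simple spectral floor to limit musical noise. The function name and all parameter defaults are hypothetical, so the repository's implementation may differ.

```python
import numpy as np
from scipy import signal

def spectral_subtract(x, sr, noise_seconds=0.25, nperseg=512, floor=0.05):
    """Basic magnitude spectral subtraction.

    Assumes the first `noise_seconds` of `x` contain noise only.
    """
    _, _, Z = signal.stft(x, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    hop = nperseg // 2  # SciPy's default overlap is nperseg // 2
    n_noise = max(1, int(noise_seconds * sr / hop))
    # Average magnitude of the noise-only frames, per frequency bin
    noise_mag = mag[:, :n_noise].mean(axis=1, keepdims=True)
    # Subtract the estimate but never go below a fraction of the input
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    _, y = signal.istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return y[: len(x)]

# Synthetic check: silence for 0.25 s, then a tone, all with white noise
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
clean[: sr // 4] = 0.0
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(sr)
denoised = spectral_subtract(noisy, sr)
```

The noisy phase is reused unchanged, which is the usual simplification in basic spectral subtraction.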
| Library / Tool | Role | Key Functions and Uses |
|---|---|---|
| SciPy | Filter design and DSP | signal.iirdesign, sosfilt, STFT |
| LibROSA | Audio features | Mel spectrogram, MFCC, audio loading |
| NumPy | Array ops | FFT, framing, overlap-add |
| Matplotlib | Visualisation | Waveform, spectrogram, Bode plots |
| Soundfile | Audio I/O | Read/write WAV/FLAC |
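As an example of the SciPy filter design role, the sketch below builds a Butterworth high-pass filter as second-order sections and applies it with `sosfilt`. It uses `signal.butter` for brevity in place of `signal.iirdesign`. The function name `highpass` and the default order of 4 are assumptions.

```python
import numpy as np
from scipy import signal

def highpass(x, sr, cutoff_hz=100.0, order=4):
    """Butterworth high-pass filter applied as second-order sections."""
    sos = signal.butter(order, cutoff_hz, btype="highpass", fs=sr, output="sos")
    return signal.sosfilt(sos, x)

sr = 16000
t = np.arange(sr) / sr
hum = np.sin(2 * np.pi * 50 * t)            # mains hum, below the cutoff
wanted = np.sin(2 * np.pi * 1000 * t)       # stands in for wanted content
filtered = highpass(hum + wanted, sr)
```

Second-order sections are generally more numerically stable than a single transfer-function polynomial, especially when the cutoff is low relative to the sample rate.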
Key packages detected in this repo:
streamlit · gtts · pydub
- Python 3.9+ (or Node.js 18+ for TypeScript/JS projects)
- `pip` or `npm` package manager
- Relevant API keys (see Configuration section)
git clone https://github.com/Devanik21/Advanced-AI-voice.git
cd Advanced-AI-voice
pip install scipy librosa numpy matplotlib soundfile
python process.py --input audio.wav
python process.py --input audio.wav --method hpf --cutoff 100

| Variable | Default | Description |
|---|---|---|
| `--method` | `hpf` | Processing method |
| `--cutoff` | `100` | Filter cutoff frequency (Hz) |
| `--sr` | `16000` | Target sample rate |
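For readers who want to see how such options might be wired up, here is a hypothetical `argparse` sketch that mirrors the flags and defaults listed above. It is not taken from `process.py`, whose actual argument handling is not shown in this README.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Process an audio file.")
    parser.add_argument("--input", required=True, help="Path to the input audio file")
    parser.add_argument("--method", default="hpf", help="Processing method")
    parser.add_argument("--cutoff", type=float, default=100.0,
                        help="Filter cutoff frequency (Hz)")
    parser.add_argument("--sr", type=int, default=16000, help="Target sample rate")
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["--input", "audio.wav", "--cutoff", "120"])
print(args.method, args.cutoff, args.sr)
```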
Copy `.env.example` to `.env` and populate all required values before running.
Advanced-AI-voice/
├── README.md
├── requirements.txt
├── app.py
└── ...
- Deep learning denoising model integration
- Real-time stream processing
- PESQ and STOI evaluation metrics
- Multi-channel audio support
- REST API for audio processing as a service
Contributions, issues, and feature requests are welcome. Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/your-feature`)
- Commit your changes (`git commit -m 'feat: add your feature'`)
- Push to your branch (`git push origin feature/your-feature`)
- Open a Pull Request
Please follow conventional commit messages and ensure any new code is documented.
The sample rate defaults to 16 kHz. Set `--sr` to match your audio files.
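If a recording uses a different rate, it can be converted before processing. The sketch below uses `scipy.signal.resample_poly` for polyphase resampling. The helper name `resample_to` is hypothetical, and the project may instead rely on its audio loader to resample.

```python
from math import gcd

import numpy as np
from scipy import signal

def resample_to(x, orig_sr, target_sr=16000):
    """Polyphase resampling from orig_sr to target_sr."""
    if orig_sr == target_sr:
        return np.asarray(x)
    g = gcd(orig_sr, target_sr)
    # Upsample by target_sr/g, then downsample by orig_sr/g
    return signal.resample_poly(x, target_sr // g, orig_sr // g)

x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # 1 s at 44.1 kHz
y = resample_to(x, 44100)
print(len(y))
```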
Devanik Debnath
B.Tech, Electronics & Communication Engineering
National Institute of Technology Agartala
This project is open source and available under the MIT License.
Crafted with curiosity, precision, and a belief that good software is worth building well.