Code to train a custom time-domain autoencoder to dereverb audio
Updated Nov 30, 2023 - Python
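As a rough illustration of the data side of a dereverb project like this (not this repository's actual code), reverberant/clean training pairs can be synthesised by convolving dry audio with a room impulse response. A minimal numpy sketch, with a toy exponentially decaying noise burst standing in for a measured impulse response:

```python
import numpy as np

def make_reverb_pair(dry: np.ndarray, sr: int, rt60: float = 0.4, seed: int = 0):
    """Return a (reverberant, dry) training pair.

    A synthetic decaying-noise burst stands in for a measured room
    impulse response (an assumption made purely for illustration).
    """
    rng = np.random.default_rng(seed)
    n_ir = int(rt60 * sr)
    t = np.arange(n_ir) / sr
    # exp(-6.9 * t / rt60) gives roughly a 60 dB amplitude decay at t = rt60
    ir = rng.standard_normal(n_ir) * np.exp(-6.9 * t / rt60)
    ir /= np.max(np.abs(ir))
    wet = np.convolve(dry, ir)[: len(dry)]   # truncate to input length
    wet /= max(np.max(np.abs(wet)), 1e-9)    # peak-normalise
    return wet.astype(np.float32), dry.astype(np.float32)

sr = 16000
dry = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
wet, target = make_reverb_pair(dry, sr)
```

The autoencoder itself would then be trained to map `wet` back to `target` sample-by-sample in the time domain.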
ML-based speech emotion recognition system that analyzes audio features to classify emotions with a simple interface for testing.
Key features:
- Simple VAE architecture with encoder/decoder
- Synthetic music data generation for training
- Interactive training with progress tracking
- Music generation from latent-space sampling
- Audio conversion and playback
- Downloadable audio files
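The latent-space sampling step in a VAE like the one described rests on the reparameterisation trick; a minimal numpy sketch of just that step (names hypothetical, unrelated to the repository's code):

```python
import numpy as np

def sample_latent(mu: np.ndarray, log_var: np.ndarray, rng=None) -> np.ndarray:
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I).

    In a real autodiff framework this keeps sampling differentiable
    w.r.t. mu and log_var; plain numpy here only shows the arithmetic.
    """
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(8)        # encoder mean for one example (8-dim latent, assumed)
log_var = np.zeros(8)   # encoder log-variance (sigma = 1)
z = sample_latent(mu, log_var)
```

Generation then amounts to drawing `z` (from the prior or near an encoded point) and passing it through the decoder.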
Audio analysis in JavaScript/TypeScript
Audio file processing pipeline with GPT-4-powered error diagnosis — detects codec issues, sample rate mismatches, and corruption artefacts with automated remediation suggestions.
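A sample-rate mismatch check of the kind described needs no model in the loop; a small stdlib sketch (function name and expected rate are illustrative assumptions, not this repository's API):

```python
import io
import struct
import wave

def check_sample_rate(wav_bytes: bytes, expected_sr: int = 16000) -> dict:
    """Read a WAV header and flag a sample-rate mismatch."""
    with wave.open(io.BytesIO(wav_bytes)) as wf:
        sr = wf.getframerate()
    return {"sample_rate": sr, "mismatch": sr != expected_sr}

# Build a tiny 8 kHz mono 16-bit WAV in memory for the demo.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(struct.pack("<h", 0) * 100)

report = check_sample_rate(buf.getvalue())
```

An LLM layer, as in the description, would then turn findings like `report` into human-readable remediation advice.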
AI-generated audio summarisation pipeline — Whisper transcription, LLM key-insight extraction, and structured spoken summaries with TTS playback and Streamlit interface.
Music harmony AI — chord progression analysis with Roman numeral labelling, voice leading checker, style-conditioned progression generation (Baroque/Jazz/Pop), and MIDI export via music21.
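Roman-numeral labelling like the above can be sketched without music21 for the simplest case, root-position diatonic triads in a major key (a deliberate simplification; the repository presumably covers much more):

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]            # semitones above the tonic
NUMERALS = ["I", "ii", "iii", "IV", "V", "vi", "vii\u00b0"]

def roman_numeral(root_pc: int, tonic_pc: int = 0) -> str:
    """Label a diatonic triad by its root pitch class (0-11).

    Assumes a major key and root-position diatonic triads only;
    case of the numeral encodes the usual triad quality per degree.
    """
    degree = (root_pc - tonic_pc) % 12
    if degree not in MAJOR_SCALE:
        raise ValueError("chromatic root; not handled in this sketch")
    return NUMERALS[MAJOR_SCALE.index(degree)]

# C, Am, F, G in C major
progression = [roman_numeral(pc) for pc in (0, 9, 5, 7)]
```

music21's `roman` module handles inversions, chromatic chords, and arbitrary keys, which this toy lookup does not.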
Neural TTS and voice-cloning application using XTTS/VITS. Supports 3–30 s reference audio for speaker adaptation, real-time pitch/speed control, and WAV/MP3 export.