🎧 🌲 CS 753: Advanced Neural Techniques for Audio Analysis and Processing under the guidance of Professor Preethi Jyothi

Contributors 🧑‍🎓🌟

Meet the brilliant minds behind this project:

Anuj Attri (23M0808) 👨‍🎓
Arnav Attri (23M0811) 👨‍🎓
Pratham Tarjule (20D110029) 👨‍🎓

Introduction 🌟

This repository contains three advanced Jupyter notebooks that demonstrate innovative uses of neural networks for audio analysis, transcription, and processing.

Notebooks Overview 📚

1. Whisper Models with LibriSpeech Dataset 🗣️

This notebook applies OpenAI's Whisper models (Tiny, Base, Small) to transcribe audio from the LibriSpeech dataset. After preprocessing the audio, we run each model and compare the results: the Word Error Rate (WER) gets progressively lower across the three model sizes.

Technologies used: torch, whisper, pandas, torchaudio
Key features:
- Audio data preprocessing 🎚️
- Utilization of different Whisper model sizes 📏
- WER calculation and comparison 📉
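
For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and a model's hypothesis, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration and is not taken from the notebook; the function name `word_error_rate` and the lowercase/whitespace normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption for
    this sketch; real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.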

2. Playing Audio Files and Generating Transcripts with Wav2Vec2 🔊

Dive into the process of playing audio files and converting them into text using the Wav2Vec2 model. This notebook provides a hands-on approach to handling audio data, processing it with sophisticated models, and generating accurate transcripts.

Technologies used: torch, IPython, librosa, transformers
Highlights:
- Audio playback in Jupyter notebooks 🎧
- Detailed transcription process with model insights 📝
- Analysis of transcription accuracy and model warnings ⚠️
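
Wav2Vec2 speech recognition checkpoints are trained with CTC, so the last step of transcription turns per-frame token predictions into text. The following is a rough, library-free sketch of greedy CTC decoding rather than the notebook's own code. The toy vocabulary, the blank id of 0, and the `|` word delimiter are assumptions chosen for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join into text."""
    # Step 1: merge consecutive duplicates (one token may span many frames).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: remove the blank token, which only separates repeated characters.
    chars = [id_to_char[token] for token in collapsed if token != blank_id]
    # Step 3: map the word delimiter to a space.
    return "".join(chars).replace(word_delimiter, " ").strip()


if __name__ == "__main__":
    # Hypothetical toy vocabulary, not the real tokenizer's.
    vocab = {0: "<pad>", 1: "|", 2: "H", 3: "I", 4: "O"}
    frames = [2, 2, 0, 3, 3, 1, 2, 0, 4, 4]
    print(ctc_greedy_decode(frames, vocab))
```

The blank token is what lets the model emit a doubled letter: `[4, 0, 4]` decodes to two characters, while `[4, 4]` collapses to one.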

3. Temporal U-Net with Squeezeformer Blocks ⏳

Explore a sophisticated architecture that combines Temporal U-Net with Squeezeformer blocks for enhanced sequence processing. This notebook introduces a powerful model for handling complex sequential data, demonstrating the integration of convolution and attention mechanisms.

Technologies used: torch
Special features:
- Encoder-decoder architecture with Squeezeformer blocks 🔧
- Depthwise separable convolution subsampling 🌀
- Application in sequential data processing and analysis 🔍
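
To give a feel for why depthwise separable convolution is attractive for subsampling, the sketch below compares parameter counts against a standard 2D convolution and tracks how two stride-2 layers shrink the time axis. It is plain arithmetic, not the notebook's implementation; the kernel size of 3, stride of 2, zero padding, and the channel width of 144 are assumptions picked for the example.

```python
def conv_output_length(length, kernel_size=3, stride=2, padding=0):
    """Number of output frames along one axis of a strided convolution."""
    return (length + 2 * padding - kernel_size) // stride + 1


def standard_conv_params(c_in, c_out, kernel_size=3):
    """Weights plus biases of a standard 2D convolution."""
    return kernel_size * kernel_size * c_in * c_out + c_out


def depthwise_separable_params(c_in, c_out, kernel_size=3):
    """Depthwise (one filter per input channel) followed by a 1x1 pointwise conv."""
    depthwise = kernel_size * kernel_size * c_in + c_in
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise


if __name__ == "__main__":
    channels = 144  # hypothetical width, chosen only for illustration
    print("standard :", standard_conv_params(channels, channels))
    print("separable:", depthwise_separable_params(channels, channels))

    frames = 1000
    after_first = conv_output_length(frames)
    after_second = conv_output_length(after_first)
    print("frames   :", frames, "->", after_first, "->", after_second)
```

Under these assumptions the separable layer needs roughly an eighth of the parameters, and two stride-2 layers reduce the sequence length by about a factor of four.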

What was your assigned hacker paper and how does what you've implemented relate to it?

Paper Assigned: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Our Squeezeformer.ipynb implements several components related to the Temporal U-Net architecture, which is a deep learning model designed for sequence-to-sequence tasks, such as audio source separation or speech enhancement.
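
The core idea of a Temporal U-Net can be sketched in a few lines: reduce the frame rate, process the shorter sequence, restore the original rate, and add a skip connection from the input. The example below is a deliberately simplified illustration on scalar frames, not the notebook's code. Pair averaging for downsampling, frame repetition for upsampling, and the identity `bottleneck` are all assumptions standing in for learned layers.

```python
def downsample(frames):
    """Halve the frame rate by averaging adjacent pairs (stand-in for a strided conv)."""
    return [(frames[i] + frames[i + 1]) / 2 for i in range(0, len(frames) - 1, 2)]


def upsample(frames, target_length):
    """Restore the frame rate by repeating each frame, then fit to target_length."""
    repeated = [frame for frame in frames for _ in range(2)]
    while len(repeated) < target_length:  # odd-length inputs lose one frame above
        repeated.append(repeated[-1])
    return repeated[:target_length]


def temporal_unet(frames, bottleneck=lambda xs: xs):
    """Downsample, apply the bottleneck, upsample, and add the skip connection."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to downsample")
    skip = list(frames)
    low_rate = bottleneck(downsample(frames))
    restored = upsample(low_rate, len(skip))
    return [r + s for r, s in zip(restored, skip)]


if __name__ == "__main__":
    print(temporal_unet([1.0, 3.0, 5.0, 7.0]))
```

In a real model the bottleneck would be a stack of attention and convolution blocks running at the reduced frame rate, which is where the compute savings come from.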

The notebook covers the following components:
- Squeezeformer Block
- Squeezeformer
- Depthwise Separable Convolution Subsampling
- Unified Activations with Squeezeformer Block

What did you newly implement, where was the code mainly derived from?

Because no training script was available for Squeezeformer, we used the open-source Whisper models to calculate WER on the test-clean split of LibriSpeech.

Our code draws on multiple sources, including GitHub, but most of it was written by us.

A demo is shown in Wav2Vec.ipynb.

Getting Started 🚀

Ready to Dive In? These notebooks are fully implemented and ready to run, offering a seamless experience right out of the box. Simply download, open, and execute to explore the cutting-edge techniques in audio processing:

git clone https://github.com/arnavcse/CS753-Hacker.git
cd CS753-Hacker
jupyter lab

With everything set up for you, getting started is as easy as pie! 🥧 Enjoy experimenting with advanced neural network techniques for audio analysis and processing.

License 📄
This project is licensed under the MIT License - see the LICENSE.md file for details.

Show Your Support 💖
Give a ⭐️ if this project helped you!
