
arnavcse/CS753-Hacker


🎧 🌲 CS 753: Advanced Neural Techniques for Audio Analysis and Processing under the guidance of Professor Preethi Jyothi


Contributors 🧑‍🎓🌟

Meet the brilliant minds behind this project:

  • Anuj Attri (23M0808) 👨‍🎓
  • Arnav Attri (23M0811) 👨‍🎓
  • Pratham Tarjule (20D110029) 👨‍🎓

Introduction 🌟

This repository contains three Jupyter notebooks that apply neural networks to audio analysis, transcription, and processing.


Notebooks Overview 📚

1. Whisper Models with LibriSpeech Dataset 🗣️

This notebook applies OpenAI's Whisper models (Tiny, Base, Small) to transcribe audio from the LibriSpeech dataset. After preprocessing the audio, we compute the Word Error Rate (WER) for each model size; WER decreases progressively from Tiny to Base to Small.

  • Technologies used: torch, whisper, pandas, torchaudio
  • Key features:
    • Audio data preprocessing 🎚️
    • Use of different Whisper model sizes 📏
    • WER calculation and comparison 📉
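A minimal sketch of the WER computation, assuming a word-level Levenshtein distance and a hypothetical normalizer (`normalize`) that lowercases text and strips punctuation. Some normalization is needed because Whisper returns cased, punctuated text while LibriSpeech transcripts are uppercase without punctuation; the notebook's own scoring may normalize differently.

```python
import re


def normalize(text: str) -> list[str]:
    """Hypothetical minimal normalizer: lowercase, drop everything except
    letters and apostrophes, then split into words."""
    text = re.sub(r"[^a-z' ]", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling two-row Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)
```

For example, `word_error_rate("THE CAT SAT ON THE MAT", "The cat sat on a mat.")` counts one substitution over six reference words, about 0.167. For a dataset-level figure, a common choice is to sum edit distances over all utterances and divide by the total number of reference words, rather than averaging per-utterance WERs.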

2. Playing Audio Files and Generating Transcripts with Wav2Vec2 🔊

This notebook plays audio files inline and converts them into text with the Wav2Vec2 model. It walks through loading the audio, running it through the model, and decoding the model output into transcripts.

  • Technologies used: torch, IPython, librosa, transformers
  • Highlights:
    • Audio playback in Jupyter notebooks 🎧
    • Detailed transcription process with model insights 📝
    • Analysis of transcription accuracy and model warnings ⚠️
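With the Hugging Face `transformers` API, `Wav2Vec2ForCTC` returns per-frame logits, and the ids from `torch.argmax(logits, dim=-1)` are turned into text by the processor's `batch_decode`. The plain-Python sketch below reproduces the greedy CTC collapse step under stated assumptions: a hypothetical five-character vocabulary, blank (padding) id 0, and `|` as the word delimiter. Check the actual model's vocabulary before relying on these ids. Wav2Vec2 checkpoints are typically trained on 16 kHz audio, so the input should be loaded or resampled at that rate (for example `librosa.load(path, sr=16000)`).

```python
from typing import Dict, Iterable


def ctc_greedy_decode(frame_ids: Iterable[int], id_to_token: Dict[int, str],
                      blank_id: int = 0, word_delimiter: str = "|") -> str:
    """Greedy CTC decoding of per-frame argmax ids into text."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not
        # blank; a blank between two identical ids lets a real double letter survive.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    # Character-level vocabularies mark word boundaries with a delimiter token.
    words = "".join(tokens).split(word_delimiter)
    return " ".join(w for w in words if w)


# Hypothetical toy vocabulary for illustration only.
vocab = {0: "<pad>", 1: "|", 2: "H", 3: "E", 4: "L", 5: "O"}
print(ctc_greedy_decode([2, 2, 3, 0, 4, 4, 0, 4, 5, 1, 0], vocab))
```

The blank between the two runs of id 4 is what allows "LL" in "HELLO"; without it, the repeated frames would collapse into a single "L".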

3. Temporal U-Net with Squeezeformer Blocks ⏳

This notebook builds an architecture that combines a Temporal U-Net with Squeezeformer blocks for sequence processing. Each block integrates convolution and attention mechanisms to model sequential audio features.

  • Technologies used: torch
  • Special features:
    • Encoder-decoder architecture with Squeezeformer blocks 🔧
    • Depthwise separable convolution subsampling 🌀
    • Application in sequential data processing and analysis 🔍
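The arithmetic behind depthwise separable convolution subsampling can be sketched as follows. A 4x subsampling front end built from two stride-2 convolutions turns 10 ms input frames into 40 ms frames, and replacing a standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution cuts its parameter count sharply. The kernel size 3, padding 1, and channel width 256 below are hypothetical values chosen for illustration, not settings taken from the notebook.

```python
def conv_out_len(length: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Frames produced along the time axis: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * padding - kernel) // stride + 1


def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Standard k x k convolution: one k*k*c_in filter per output channel, plus biases."""
    return kernel * kernel * c_in * c_out + c_out


def separable_conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Depthwise part (one k x k filter per input channel) plus pointwise 1x1 mixing."""
    depthwise = kernel * kernel * c_in + c_in
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise


frames = 1000                                   # 10 s of audio at a 10 ms hop
subsampled = conv_out_len(conv_out_len(frames))  # two stride-2 layers -> 4x fewer frames
print(subsampled)                                # 250 frames, i.e. one every 40 ms
print(conv2d_params(3, 256, 256), separable_conv2d_params(3, 256, 256))
```

With these illustrative settings the standard layer has 590,080 parameters and the separable one 68,352, roughly 8.6 times fewer.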

What was your assigned hacker paper and how does what you've implemented relate to it?

Paper Assigned: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

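Our Squeezeformer.ipynb implements several components of the Squeezeformer encoder, centered on its Temporal U-Net structure, which downsamples the sequence in the middle of the encoder and restores the original frame rate before the final blocks. U-Net-style encoder-decoder models are also used for sequence-to-sequence audio tasks such as source separation and speech enhancement.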
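A toy sketch of the Temporal U-Net data flow, under heavy simplifying assumptions: each frame is a single float, the learned strided time-reduction layer is replaced by a hypothetical pair-averaging `downsample`, and frames are restored by repetition before a skip connection adds back the full-rate input. Only the sequence-length bookkeeping and the skip connection are meant to carry over to the real model.

```python
from typing import Callable, List

Frames = List[float]


def downsample(frames: Frames) -> Frames:
    """Halve the frame rate. Hypothetical stand-in for a learned stride-2 layer:
    average neighbouring pairs (a trailing odd frame is kept as is)."""
    pairs = [frames[i:i + 2] for i in range(0, len(frames), 2)]
    return [sum(p) / len(p) for p in pairs]


def upsample(coarse: Frames, target_len: int) -> Frames:
    """Restore the frame rate by repeating each frame twice, trimmed to target_len."""
    return [v for v in coarse for _ in range(2)][:target_len]


def temporal_unet(frames: Frames, middle_blocks: List[Callable[[Frames], Frames]]) -> Frames:
    """Run middle_blocks at half resolution, then upsample and add the skip connection."""
    skip = frames
    x = downsample(frames)
    for block in middle_blocks:
        x = block(x)
    x = upsample(x, len(skip))
    return [a + b for a, b in zip(x, skip)]


double = lambda x: [2 * v for v in x]  # hypothetical middle block
print(temporal_unet([1.0, 3.0, 5.0], [double]))
```

Trimming in `upsample` handles odd-length inputs: three frames become two after downsampling, four after repetition, and are cut back to three so they line up with the skip connection.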

  1. Squeezeformer Block

  2. Squeezeformer

  3. Depthwise Separable Convolution Subsampling

  4. Unified Activations with Squeezeformer Block
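The Squeezeformer paper describes a block that orders its modules as MHSA-FFN-Conv-FFN, wraps each in a post-LayerNorm residual with a scaled module input, and uses Swish throughout in place of the GLU found in Conformer's convolution module. The sketch below shows that control flow on one frame's feature vector, assuming hypothetical callables `mhsa`, `ffn1`, `conv`, and `ffn2` (real attention mixes information across frames, which this toy omits), with scalar `scale` and `bias` standing in for learned parameters.

```python
import math
from typing import Callable, List

Vector = List[float]
Module = Callable[[Vector], Vector]


def swish(v: float) -> float:
    """Swish / SiLU: v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a feature vector to zero mean and (almost) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def scaled_post_ln(x: Vector, module: Module, scale: float = 1.0, bias: float = 0.0) -> Vector:
    """LayerNorm(x + module(scale * x + bias)): post-LN residual with input scaling."""
    out = module([scale * v + bias for v in x])
    return layer_norm([a + b for a, b in zip(x, out)])


def squeezeformer_block(x: Vector, mhsa: Module, ffn1: Module, conv: Module, ffn2: Module) -> Vector:
    """Apply the four modules in MHSA-FFN-Conv-FFN order, each as a post-LN residual."""
    for module in (mhsa, ffn1, conv, ffn2):
        x = scaled_post_ln(x, module)
    return x


toy_ffn: Module = lambda x: [swish(v) for v in x]  # hypothetical FFN using Swish
print(squeezeformer_block([1.0, 2.0, 3.0], toy_ffn, toy_ffn, toy_ffn, toy_ffn))
```

Because every module ends in a LayerNorm, the output of each residual step has zero mean regardless of the module, which is the main behavioral difference from the pre-LN arrangement used in Conformer.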

What did you newly implement, and where was the code mainly derived from?

Because no training script was available for Squeezeformer, we instead used the open-source Whisper models to calculate WER on the test-clean split of LibriSpeech.

Our code draws on multiple sources, including GitHub, but was mostly written by us.

A demo is shown in Wav2Vec.ipynb.


Getting Started 🚀

Ready to dive in? The notebooks are fully implemented and ready to run. Clone the repository, open it in JupyterLab, and execute the notebooks to explore the audio processing techniques:

git clone https://github.com/arnavcse/CS753-Hacker.git
cd CS753-Hacker
jupyter lab

With everything set up for you, getting started is as easy as pie! 🥧 Enjoy experimenting with advanced neural network techniques for audio analysis and processing.

License 📄
This project is licensed under the MIT License - see the LICENSE.md file for details.

Show Your Support 💖
Give a ⭐️ if this project helped you!

About

🌲 CS 753: Hacker Implementation of Squeezeformer
