Song Genre Classification

Overview

This project uses NLP and machine learning to classify English song lyrics by genre, without needing any audio data. The dataset was collected from Genius and contains around five million records.

How it Works

1. Data Collection & Cleaning

  • Normalized some unnecessarily repeated characters.
  • Removed stopwords, symbols and punctuation.
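
The cleaning steps above could look roughly like the sketch below. It is an illustration, not the project's actual code: the function name `clean_lyrics`, the tiny `STOPWORDS` set, and the rule of collapsing runs of three or more identical characters down to two are all assumptions.

```python
import re

# Tiny illustrative stopword list (an assumption); a real pipeline would
# likely use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "i", "you", "me", "my", "is", "in", "of", "to", "so"}


def clean_lyrics(text: str) -> str:
    """Lowercase, normalize repeated characters, strip symbols and stopwords."""
    text = text.lower()
    # Collapse any character repeated 3+ times down to two ("sooooo" -> "soo").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Replace punctuation and symbols with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Drop stopwords; split() also squeezes repeated whitespace.
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_lyrics("I looooove you, baby!!! Yeahhhh, the night is young..."))
# prints: loove baby yeahh night young
```

Collapsing to two characters rather than one keeps legitimate double letters such as "good" intact, at the cost of leaving forms like "loove" for a later step to handle.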

2. Feature Extraction

  • Tokenized and lemmatized the words.
  • Used Word2Vec for feature extraction, with TF-IDF scores weighting the words.
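
One common way to combine the two techniques is to average a song's word vectors, weighting each word by its TF-IDF score. The sketch below shows that idea in plain Python. Whether the project combines them exactly this way is an assumption, the toy `embeddings` dictionary stands in for trained Word2Vec vectors, and the helper names `idf_scores` and `weighted_doc_vector` are hypothetical.

```python
import math
from collections import Counter


def idf_scores(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def weighted_doc_vector(tokens, embeddings, idf, dim):
    """Average one document's word vectors, weighted by TF-IDF."""
    counts = Counter(t for t in tokens if t in embeddings and t in idf)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in counts.items():
        weight = (count / len(tokens)) * idf[tok]  # term frequency * idf
        for i, value in enumerate(embeddings[tok]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known words: fall back to a zero vector
    return [v / weight_sum for v in total]


# Toy 2-dimensional vectors standing in for trained Word2Vec embeddings.
embeddings = {"love": [1.0, 0.0], "heart": [0.8, 0.2], "truck": [0.0, 1.0], "road": [0.1, 0.9]}
docs = [["love", "heart", "love"], ["truck", "road", "love"]]

idf = idf_scores(docs)
vectors = [weighted_doc_vector(doc, embeddings, idf, dim=2) for doc in docs]
print(vectors)
```

Words that appear in fewer documents receive a larger weight, so they pull the document vector further toward their own embedding than common words do.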

3. Model Training

  • Experimented with different algorithms.
  • Picked out the best performing ones (LightGBM, Random Forest, XGBoost).
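
As a rough illustration of the training step, the sketch below fits a Random Forest with scikit-learn on synthetic data. The random feature matrix stands in for the real document vectors, and the genre labels, split ratio, and hyperparameters are placeholders rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for document vectors: 3 placeholder genres,
# 100 songs each, 20 dimensions, clustered around separate centers.
centers = rng.normal(scale=3.0, size=(3, 20))
X = np.vstack([center + rng.normal(size=(100, 20)) for center in centers])
y = np.repeat(["pop", "rap", "country"], 100)

# Hold out 20% of the songs, keeping the genre proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {test_accuracy:.2f}")
```

The scikit-learn wrappers shipped with LightGBM (`LGBMClassifier`) and XGBoost (`XGBClassifier`) follow the same `fit`/`predict` pattern, so swapping models mostly means changing the estimator. Note that XGBoost generally expects integer-encoded class labels.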

4. Evaluation

  • Compared the models using accuracy, precision, recall and F1-score.
  • LightGBM --> Accuracy: 68%
  • Random Forest --> Accuracy: 66%
  • XGBoost --> Accuracy: 68%
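
For reference, the four metrics can be computed from true and predicted labels as in the sketch below. Macro averaging (an unweighted mean over genres) is an assumption, since the averaging scheme is not stated here, and the labels are made-up examples.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels for illustration only.
y_true = ["pop", "pop", "rap", "rap", "country", "country"]
y_pred = ["pop", "rap", "rap", "rap", "country", "pop"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can hide weak performance on less frequent genres, which is why the per-class metrics are worth comparing alongside it.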

5. Deployment

  • Integrated the trained models into a Streamlit web application for easy interaction.

Setup Instructions

1. Clone the Repository

git clone https://github.com/rmsaah/song-genre-classification.git
cd song-genre-classification

2. Install Dependencies

pip install -r requirements.txt

Streamlit Application

To run the Streamlit application, navigate to the directory containing GenrePredictionApp.py and run the following command:

streamlit run GenrePredictionApp.py

Future Work

The current approach uses classical machine learning for classification. I plan to explore deep learning algorithms to improve the models' performance in the future.
