A text-mining and sentiment-analysis pipeline that examines emotional patterns in song lyrics and YouTube comments across music genres. The tool scrapes lyrics from Genius, collects YouTube comments via the Data API, runs sentiment analysis with multiple lexicons, and produces static and interactive visualisations of the results.
- Features
- Prerequisites
- Installation
- Configuration
- Usage
- Project Structure
- Output
- Troubleshooting
- License
- Multi-method lyrics retrieval: direct Genius web scraping with a Python `lyricsgenius` fallback via `reticulate`.
- YouTube comment collection: paginated fetching through the YouTube Data API v3 with automatic rate-limit pausing.
- Flexible sentiment analysis: supports the Bing, NRC, AFINN, and Loughran lexicons from the `tidytext` ecosystem.
- Genre-level comparison: aggregates sentiment and emotion scores by majority genre for cross-genre insight.
- Lyrics-versus-comments comparison: measures how listener sentiment in comments aligns with or diverges from lyric sentiment.
- Rich visualisations: word clouds, emotion heatmaps, radar charts, sentiment landscape maps, lexical-diversity plots, and an emotional-impact chart, each saved as both PNG and interactive HTML (Plotly).
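
As a rough illustration of the lexicon-based scoring and the lyrics-versus-comments comparison listed above, the sketch below computes a Bing-style net sentiment (positive matches minus negative matches) in base R. The six-word lexicon and the `score_text` helper are invented for this example; the pipeline itself relies on the full lexicons from `tidytext` and may tokenise and aggregate differently.

```r
# Toy stand-in for a Bing-style lexicon (hypothetical words, illustration only).
toy_lexicon <- data.frame(
  word      = c("love", "happy", "bright", "cry", "lonely", "dark"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: net sentiment = positive matches minus negative matches.
score_text <- function(text, lexicon = toy_lexicon) {
  # Lower-case, then split on anything that is not a letter or apostrophe.
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]
  # Look up each token; unmatched tokens yield NA and are ignored below.
  hits <- lexicon$sentiment[match(tokens, lexicon$word)]
  sum(hits == "positive", na.rm = TRUE) - sum(hits == "negative", na.rm = TRUE)
}

lyric_score   <- score_text("Lonely nights, I cry in the dark")
comment_score <- score_text("I love this song, it makes me so happy")

# A positive gap means listeners sound more upbeat than the lyrics do.
sentiment_gap <- comment_score - lyric_score
```

A simple count like this ignores negation and intensity, which is one reason to compare several lexicons rather than trust a single score.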
| Dependency | Version | Purpose |
|---|---|---|
| R | ≥ 4.0.0 | Core runtime |
| RStudio | latest (recommended) | IDE with .Rproj support |
| Python | 3.x | lyricsgenius lyrics fallback |
| pip package `lyricsgenius` | latest | Python-based Genius scraper |
| Google Cloud project | — | YouTube Data API v3 key and OAuth |
| Genius API client | — | OAuth credentials for lyrics search |
git clone https://github.com/PenSul/Music-Sentiment-Analysis-Tool.git
cd Music-Sentiment-Analysis-Tool

Open R or RStudio and run:
install.packages(c(
"tidyverse", "tidytext", "tuber", "rvest", "httr", "jsonlite",
"textdata", "sentimentr", "wordcloud", "plotly", "knitr",
"rmarkdown", "dplyr", "stringr", "ggplot2", "scales",
"ggrepel", "htmlwidgets", "RColorBrewer", "reticulate"
))

Install the Python dependency:

pip install lyricsgenius

Verify the installation:
pip list | grep lyricsgenius

All credentials and tuneable parameters live in `R/config.R`.
Open R/config.R and fill in the credential placeholders:
kApiCredentials <- list(
genius = list(
client_id = "<YOUR_GENIUS_CLIENT_ID>",
client_secret = "<YOUR_GENIUS_CLIENT_SECRET>",
redirect_uri = "http://localhost:1410/"
),
youtube = list(
app_name = "<YOUR_APP_NAME>",
client_id = "<YOUR_GOOGLE_CLIENT_ID>",
client_secret = "<YOUR_GOOGLE_CLIENT_SECRET>",
api_key = "<YOUR_YOUTUBE_API_KEY>"
)
)

Important: Never commit real credentials. Add `R/config.R` to `.gitignore` or use environment variables in production.
Adjust the settings block in the same file to control comment limits, the sentiment lexicon, and word-cloud thresholds.
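
For the environment-variable option mentioned above, one possible approach is to build the credential list from `Sys.getenv()` instead of hard-coding values. This is only a sketch: the helpers `read_env` and `build_credentials` and every variable name (`GENIUS_CLIENT_ID` and so on) are assumptions for illustration, not conventions used by the project.

```r
# Hypothetical helper: read a required environment variable or fail loudly.
read_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Hypothetical builder that mirrors the kApiCredentials structure shown above.
# The environment variable names are assumptions, not project conventions.
build_credentials <- function() {
  list(
    genius = list(
      client_id     = read_env("GENIUS_CLIENT_ID"),
      client_secret = read_env("GENIUS_CLIENT_SECRET"),
      redirect_uri  = "http://localhost:1410/"
    ),
    youtube = list(
      app_name      = read_env("YOUTUBE_APP_NAME"),
      client_id     = read_env("GOOGLE_CLIENT_ID"),
      client_secret = read_env("GOOGLE_CLIENT_SECRET"),
      api_key       = read_env("YOUTUBE_API_KEY")
    )
  )
}
```

With the variables defined (for example in a user-level `.Renviron` file that stays out of version control), `kApiCredentials <- build_credentials()` would produce the same list shape without secrets appearing in the repository.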
- Open the project in RStudio (double-click `Music Sentiment Analysis.Rproj`).
- Make sure the working directory is the project root.
- Verify that `song_list.csv` is present at the project root.
- Run the entry-point script: `source("main.R")`
- When prompted, authenticate with Google/YouTube and Genius through the browser windows that open automatically.
- Wait for the pipeline to finish. Progress messages are printed to the console at every major step.
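
The working-directory and `song_list.csv` checks in the steps above can be done by hand, or with a small pre-flight helper along these lines. `missing_project_files` is a hypothetical function written for this example; the file names come from the project tree, but the pipeline may perform its own checks differently.

```r
# Hypothetical pre-flight helper: report which required files are missing
# from the given directory (defaults to the current working directory).
missing_project_files <- function(root = getwd(),
                                  required = c("main.R", "song_list.csv",
                                               "R/config.R")) {
  required[!file.exists(file.path(root, required))]
}

missing <- missing_project_files()
if (length(missing) > 0) {
  message("Missing from ", getwd(), ": ", paste(missing, collapse = ", "))
} else {
  message("All required files found; ready to run source(\"main.R\").")
}
```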
.
├── main.R # Entry point — orchestrates the full pipeline
├── song_list.csv # Input song catalogue (CSV)
├── R/
│ ├── config.R # Global constants, API credentials, colours
│ ├── setup.R # Directory creation, package checks, stopwords
│ ├── data_collection.R # CSV reader, API auth, lyrics + comments collection
│ ├── text_processing.R # Tokenisation, stopword removal, TF-IDF, n-grams
│ ├── sentiment_analysis.R # Lexicon-based sentiment scoring and comparison
│ ├── visualization.R # All static and interactive plot generators
│ ├── genius_auth.R # Genius OAuth 2.0 helper
│ ├── lyrics_scraper.R # Web-scraping and Python-fallback lyrics retrieval
│ └── youtube_scraper.R # Direct YouTube Data API comment fetcher
├── Data/ # Created at runtime
│ ├── Lyrics/ # Individual lyrics text files
│ ├── Comments/ # Per-song comment CSVs
│ └── Output/ # Processed CSVs and analysis results
├── Visualizations/ # Created at runtime — PNGs and interactive HTMLs
├── LICENSE
├── README.md
└── .gitignore
After a successful run, the pipeline produces:
| Directory | Contents |
|---|---|
| `Data/Lyrics/` | One `.txt` file per song containing scraped lyrics. |
| `Data/Comments/` | One `_comments.csv` file per song with YouTube comment data. |
| `Data/Output/` | Merged CSVs for lyrics, comments, sentiment scores, TF-IDF, bigrams, genre comparisons, and emotional-impact insights. |
| `Visualizations/` | PNG images and self-contained interactive HTML files for every chart type. |
Open any _interactive.html file in a browser to explore the Plotly
visualisations.
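
To find the interactive files from within R rather than a file manager, something like the following may help. `list_interactive_charts` is a hypothetical helper; it assumes the `_interactive.html` suffix and the `Visualizations/` directory described above.

```r
# Hypothetical helper: list interactive Plotly outputs in a directory.
# Returns an empty character vector if the directory does not exist yet.
list_interactive_charts <- function(dir = "Visualizations") {
  list.files(dir, pattern = "_interactive\\.html$", full.names = TRUE)
}

charts <- list_interactive_charts()
if (interactive() && length(charts) > 0) {
  # Open the first chart in the default browser.
  utils::browseURL(normalizePath(charts[1]))
}
```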
- Confirm credentials in `R/config.R` match those in the Google Cloud Console and Genius API dashboard.
- If the OAuth browser window does not appear, try authenticating manually: `library(tuber); yt_oauth("<client_id>", "<client_secret>")`
- Ensure `python` is on your system `PATH`.
- If `reticulate` cannot find Python, set the path explicitly: `reticulate::use_python("/path/to/python")`
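
When it is unclear whether Python is on the `PATH` at all, a quick check from R can narrow things down before touching `reticulate`. The sketch below uses base R's `Sys.which()`; `find_python` and `diagnose_python` are hypothetical helpers, and the candidate executable names are an assumption that may not suit every platform.

```r
# Look for a Python executable on the PATH using base R only.
# Sys.which() returns "" for names it cannot find.
find_python <- function(candidates = c("python3", "python")) {
  paths <- Sys.which(candidates)
  found <- paths[nzchar(paths)]
  if (length(found) == 0) NA_character_ else unname(found[1])
}

# Hypothetical diagnostic: report the interpreter and, if reticulate is
# installed, whether the lyricsgenius module can be imported from it.
diagnose_python <- function() {
  python_path <- find_python()
  if (is.na(python_path)) {
    message("No Python executable found on PATH.")
    return(invisible(FALSE))
  }
  message("Python found at: ", python_path)
  if (requireNamespace("reticulate", quietly = TRUE)) {
    reticulate::use_python(python_path, required = TRUE)
    message("lyricsgenius available: ",
            reticulate::py_module_available("lyricsgenius"))
  }
  invisible(TRUE)
}
```

Calling `diagnose_python()` in a fresh R session gives the clearest result, since `reticulate` binds to one interpreter per session.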
If R packages are missing, run the built-in check:
source("R/setup.R")
CheckAndInstallPackages()

Lyrics retrieval from Genius may be unreliable for CJK or other non-Latin titles. The pipeline logs each failure and continues to the next song. A manual-lyrics fallback is available for testing purposes.
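
The log-and-continue behaviour described above can be pictured with a `tryCatch()` loop such as the one below. This is a minimal sketch, not the project's implementation: `fetch_lyrics_stub` is a stand-in that deliberately fails on non-ASCII titles to simulate an unreliable lookup, and `collect_lyrics` is a hypothetical wrapper.

```r
# Hypothetical stand-in for a lyrics fetcher: fails for titles containing
# non-ASCII characters, purely to simulate an unreliable lookup.
fetch_lyrics_stub <- function(title) {
  if (any(utf8ToInt(title) > 127)) {
    stop("no match found")
  }
  paste("lyrics for", title)
}

# Try every title; log failures and carry on instead of aborting the run.
collect_lyrics <- function(titles, fetch = fetch_lyrics_stub) {
  lyrics <- vapply(titles, function(title) {
    tryCatch(
      fetch(title),
      error = function(e) {
        message("Skipping '", title, "': ", conditionMessage(e))
        NA_character_
      }
    )
  }, character(1))
  list(
    lyrics = lyrics[!is.na(lyrics)],        # named by title
    failed = names(lyrics)[is.na(lyrics)]   # titles that need manual lyrics
  )
}

# The second title is written with \u escapes so the example is encoding-safe.
result <- collect_lyrics(c("Yesterday", "\u591c\u306b\u99c6\u3051\u308b"))
```

The `failed` vector gives a ready list of songs for which manually supplied lyrics might be worth adding.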
This project is released under the MIT License.