jpmdb

A personalized movie database for my friend Juan.

Cleaning Process

The original source data was a .txt file listing the movies and TV shows watched each year, the order in which they were watched, and a rating out of 10.
The .txt file was parsed by create_silver_jpmdb.py, which extracts the ratings, seasons, watch order, year specifiers, and other metadata.
IMDb data was downloaded from IMDb Datasets, and create_silver_imdb.py converted the .gz files into silver/imdb/title_basics and silver/imdb/title_ratings.
The jpmdb and IMDb datasets were initially joined into stg_jpmdb_combined by create_silver_stg_jpmdb_combined.py, using standard string cleaning and fuzzy matching.
Entries were manually reviewed with a small CLI tool, review_combined_jpmdb.py, which made it possible to correct fuzzy-matching errors and manually add missing entries.
After all entries were validated, create_gold_jpmdb.py moved the data to the gold table gold/jpmdb.
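The parsing step can be sketched roughly as follows. The layout of the source .txt file is not documented in this README, so the line format, the regex, and the names WatchEntry, parse_line, and parse_text are all hypothetical; this only illustrates pulling watch order, title, season, year specifier, and rating out of one line, not what create_silver_jpmdb.py actually does.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical line layout (the real .txt format is not documented here):
#   "<order>. <title> [S<season>] [(<year>)] - <rating>/10"
LINE_RE = re.compile(
    r"^\s*(?P<order>\d+)\.\s+"                    # watch order, e.g. "12."
    r"(?P<title>.+?)"                             # title (non-greedy)
    r"(?:\s+S(?P<season>\d+))?"                   # optional season, e.g. "S2"
    r"(?:\s+\((?P<year>\d{4})\))?"                # optional year specifier, e.g. "(2019)"
    r"\s+-\s+(?P<rating>\d+(?:\.\d+)?)/10\s*$"    # rating out of 10
)


@dataclass
class WatchEntry:
    order: int
    title: str
    season: Optional[int]
    year: Optional[int]
    rating: float


def parse_line(line: str) -> Optional[WatchEntry]:
    """Parse one line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return WatchEntry(
        order=int(m["order"]),
        title=m["title"].strip(),
        season=int(m["season"]) if m["season"] else None,
        year=int(m["year"]) if m["year"] else None,
        rating=float(m["rating"]),
    )


def parse_text(text: str) -> list[WatchEntry]:
    """Parse every matching line, skipping headers, blank lines, and anything else."""
    return [e for e in map(parse_line, text.splitlines()) if e is not None]
```

Lines that fail to match are returned as None rather than raising, which leaves room for a later manual review pass.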
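The string-cleaning and fuzzy-matching join can be approximated with the standard library alone. The column names and the \N null marker follow the published IMDb title.basics.tsv.gz schema; normalize_title, read_title_basics, build_index, match_title, and the 0.85 cutoff are hypothetical stand-ins, and difflib is just one example of a fuzzy matcher, not necessarily what create_silver_stg_jpmdb_combined.py uses.

```python
import csv
import difflib
import re
import unicodedata
from collections.abc import Iterable
from typing import Optional

IMDB_NULL = r"\N"  # IMDb's TSV files use \N for missing values


def normalize_title(title: str) -> str:
    """Basic string cleaning: strip accents, lowercase, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def read_title_basics(lines: Iterable[str]) -> list[dict]:
    """Read title.basics rows, mapping IMDb's \\N marker to None.

    QUOTE_NONE is used because IMDb titles can contain stray quote characters.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [{k: (None if v == IMDB_NULL else v) for k, v in row.items()} for row in reader]


def build_index(basics: list[dict]) -> dict[str, list[dict]]:
    """Group IMDb rows by normalized primaryTitle (remakes share a key)."""
    index: dict[str, list[dict]] = {}
    for row in basics:
        index.setdefault(normalize_title(row["primaryTitle"]), []).append(row)
    return index


def match_title(
    title: str,
    year: Optional[int],
    index: dict[str, list[dict]],
    cutoff: float = 0.85,  # hypothetical similarity threshold
) -> Optional[dict]:
    """Exact match on the normalized title, else the closest difflib match above cutoff."""
    key = normalize_title(title)
    if key not in index:
        close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
        if not close:
            return None  # unmatched: leave for manual review
        key = close[0]
    candidates = index[key]
    if year is not None:
        # Use the year specifier to pick between titles that share a name.
        same_year = [r for r in candidates if r["startYear"] == str(year)]
        if same_year:
            candidates = same_year
    return candidates[0]
```

Against the real download, the rows could be read with `gzip.open("title.basics.tsv.gz", "rt", encoding="utf-8", newline="")` (the local path is an assumption). title.basics is large, so filtering to the relevant titleType values before building the index keeps both memory use and fuzzy-search time down.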

Dashboard

The dashboard is built using Dash and Plotly. It currently includes 4 visualizations:

A virtualized table of all entries in the database
A scatter plot of ratings against watch order, showing how ratings change over time
A scatter plot comparing ratings to IMDb ratings
A box plot showing the distribution of ratings per IMDb genre
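The per-genre box plot needs one row per (genre, rating) pair, because IMDb's genres field packs up to three genres into a single comma-separated string. A minimal sketch of that reshaping, assuming each entry is a dict with hypothetical rating and genres keys (explode_genres and genre_order_by_median are illustrative names, not functions from this repo):

```python
from collections import defaultdict
from statistics import median


def explode_genres(entries: list[dict]) -> list[dict]:
    """Turn each entry into one {"genre", "rating"} row per IMDb genre."""
    rows = []
    for entry in entries:
        # genres may be None when IMDb has no value (\N in the source TSV)
        for genre in (entry.get("genres") or "").split(","):
            genre = genre.strip()
            if genre:
                rows.append({"genre": genre, "rating": entry["rating"]})
    return rows


def genre_order_by_median(rows: list[dict]) -> list[str]:
    """Sort genres by median rating, highest first, for a readable axis order."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row["genre"]].append(row["rating"])
    return sorted(groups, key=lambda g: median(groups[g]), reverse=True)
```

With pandas and Plotly Express installed, the result could then be drawn with something like `px.box(pd.DataFrame(rows), x="genre", y="rating", category_orders={"genre": order})`. A title with several genres appears in each of its genre boxes, which is usually what you want for this kind of comparison.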

TODOs

[ ] Incorporate scraped poster images into the dashboard
[ ] Cross-visualization filtering by genre
[ ] A short summary of the top 10 titles per year
