A personalized movie database for my friend Juan.

- The original source data was a .txt file containing a list of movies/TV shows, the order they were watched that year, and a rating out of 10
- The .txt file was parsed in `create_silver_jpmdb.py`, including parsing the ratings, seasons, watch order, year specifiers and other metadata
- IMDb data was downloaded from IMDb Datasets and the .gz files were converted into `silver/imdb/title_basics` and `silver/imdb/title_ratings` using `create_silver_imdb.py`
- The jpmdb and IMDb datasets were initially joined into `stg_jpmdb_combined` by `create_silver_stg_jpmdb_combined.py`, using standard string cleaning and fuzzy matching approaches
- Entries were manually reviewed using a small CLI tool, `review_combined_jpmdb.py`, giving an opportunity to correct fuzzy matching errors and manually add missing entries
- After all entries were validated, the data was moved to the gold table `gold/jpmdb` in `create_gold_jpmdb.py`

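
As a rough illustration of the string cleaning and fuzzy matching step, here is a minimal sketch using only the Python standard library. It is not the code in `create_silver_stg_jpmdb_combined.py`: the function names `clean_title` and `best_match`, the cleaning rules, and the 0.85 cutoff are all assumptions made for the example.

```python
import difflib
import re
import unicodedata


def clean_title(title: str) -> str:
    """Normalize a title: strip accents, lowercase, drop punctuation, collapse spaces."""
    # Decompose accented characters, then drop anything that is not plain ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Replace runs of non-alphanumeric characters with a single space.
    no_punct = re.sub(r"[^a-z0-9 ]+", " ", ascii_title.lower())
    return " ".join(no_punct.split())


def best_match(title, candidates, cutoff=0.85):
    """Return (candidate, score) for the closest candidate, or None if below cutoff.

    The cutoff of 0.85 is an arbitrary value chosen for this sketch.
    """
    cleaned = clean_title(title)
    best = None
    for cand in candidates:
        score = difflib.SequenceMatcher(None, cleaned, clean_title(cand)).ratio()
        if score >= cutoff and (best is None or score > best[1]):
            best = (cand, score)
    return best
```

A call such as `best_match("Spirited Awya", ["Spirited Away", "Alien"])` would pick `"Spirited Away"`, while a title with no close candidate returns `None`. Those unmatched or borderline entries are the ones a manual review pass is meant to catch.
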
The dashboard is built using Dash and Plotly. It currently includes 4 visualizations:
- A virtualized table of all entries in the database
- A scatter plot of ratings against watch order, showing how ratings change over time
- A scatter plot comparing ratings to IMDb ratings
- A box plot showing the distribution of ratings per IMDb genre
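
For the per-genre box plot, note that IMDb stores `genres` as a comma-separated string and uses `\N` for missing values, so a title with several genres has to be counted once per genre. The sketch below shows one way to do that grouping in plain Python. The row layout (dicts with `genres` and `rating` keys) and the function name `ratings_by_genre` are hypothetical and may not match the columns in `gold/jpmdb`.

```python
from collections import defaultdict
from statistics import median


def ratings_by_genre(rows):
    """Group ratings by genre; a multi-genre title is counted once per genre."""
    grouped = defaultdict(list)
    for row in rows:
        genres = row["genres"]
        # Skip titles with no genre information (IMDb uses the literal \N).
        if not genres or genres == r"\N":
            continue
        for genre in genres.split(","):
            grouped[genre].append(row["rating"])
    return dict(grouped)


# Hypothetical rows, for illustration only.
rows = [
    {"genres": "Drama,Crime", "rating": 9},
    {"genres": "Drama", "rating": 7},
    {"genres": r"\N", "rating": 5},
]
grouped = ratings_by_genre(rows)
drama_median = median(grouped["Drama"])
```

Each list in `grouped` can then be passed to a box trace, one per genre.
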
- [ ] incorporate scraped poster images into the dashboard
- [ ] cross visualization filtering by genre
- [ ] short summary of top 10 titles per year