- Task 1 - Install Spark on Google Colab and load datasets in PySpark
- Task 2 - Change column data types, remove whitespace and drop duplicates
- Task 3 - Remove columns whose share of null values exceeds a threshold
- Task 4 - Group, aggregate and create pivot tables
- Task 5 - Rename categories and impute missing numeric values
- Task 6 - Create visualizations to gather insights
AndreBluhm/Project_Data-Analysis-PySpark