This project demonstrates how I cleaned the messy Atlantic Hurricanes (Data Cleaning Challenge) dataset (1920–2020) and transformed it into a structured, query-ready SQL view.
- `data/raw/Hurricanes.csv` — original Kaggle CSV (CC0).
- `sql/Hurricanes final clean.sql` — end-to-end T-SQL script that cleans the raw data and builds the `dbo.Hurricanes_Final` view.
- `data/clean/Hurricanes_Final.csv` — optional export of the final cleaned dataset.
- Create a database `HurricanesDB` and import `data/raw/Hurricanes.csv` into `dbo.Hurricanes`.
- Run `sql/Hurricanes final clean.sql`.
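The import step can be sketched with `BULK INSERT`, staging every field as text so the cleaning script can do the typing. The file path and column names below are assumptions for illustration, not the actual schema; adjust them to match the CSV (note that `FORMAT = 'CSV'` requires SQL Server 2017 or later).

```sql
-- Hypothetical setup sketch: create the database and stage the raw CSV.
-- Path and column list are assumptions; adjust to the actual file.
CREATE DATABASE HurricanesDB;
GO
USE HurricanesDB;
GO
-- Stage every field as text first; the cleaning script handles typing.
CREATE TABLE dbo.Hurricanes (
    Name      NVARCHAR(100),
    StartDate NVARCHAR(50),
    EndDate   NVARCHAR(50),
    WindSpeed NVARCHAR(50),
    Pressure  NVARCHAR(50),
    Damage    NVARCHAR(100)
);
GO
BULK INSERT dbo.Hurricanes
FROM 'C:\data\raw\Hurricanes.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);
```

The SQL Server Import Wizard works just as well; the script only assumes the raw rows land in `dbo.Hurricanes` as text.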
- Normalize messy text, whitespace, and quotes.
- Parse and standardize dates into `StartDate` and `EndDate`.
- Convert wind speeds into both mph and km/h.
- Normalize pressure into both hPa and inHg.
- Convert damage estimates (million/billion/k) into numeric USD.
- Derive `DurationDays` and Saffir-Simpson category.
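Assuming the raw columns are staged as text, the core transformations can be sketched as below. Column names and the exact damage formats handled are illustrative assumptions, not necessarily what the actual script uses; `TRY_CONVERT` is used so that messy values become `NULL` instead of raising errors.

```sql
-- Illustrative cleaning sketch; column names are assumptions.
SELECT
    -- Dates: TRY_CONVERT yields NULL on unparseable values.
    TRY_CONVERT(date, LTRIM(RTRIM(StartDate)))                        AS StartDate,
    TRY_CONVERT(date, LTRIM(RTRIM(EndDate)))                          AS EndDate,
    -- Wind: keep mph, derive km/h (1 mph = 1.609344 km/h).
    TRY_CONVERT(int, WindSpeed)                                       AS WindMph,
    CAST(TRY_CONVERT(int, WindSpeed) * 1.609344 AS decimal(6,1))      AS WindKmh,
    -- Pressure: keep hPa, derive inHg (1 hPa = 0.02953 inHg).
    TRY_CONVERT(decimal(6,1), Pressure)                               AS PressureHpa,
    CAST(TRY_CONVERT(decimal(6,1), Pressure) * 0.02953
         AS decimal(6,2))                                             AS PressureInHg,
    -- Damage: map "1.5 billion" / "300 million" / "500k" to numeric USD;
    -- text such as 'Unknown' or 'Minimal' falls through to NULL.
    CASE
        WHEN Damage LIKE '%billion%'
            THEN TRY_CONVERT(decimal(18,2), REPLACE(Damage, ' billion', '')) * 1000000000
        WHEN Damage LIKE '%million%'
            THEN TRY_CONVERT(decimal(18,2), REPLACE(Damage, ' million', '')) * 1000000
        WHEN Damage LIKE '%k'
            THEN TRY_CONVERT(decimal(18,2), REPLACE(Damage, 'k', '')) * 1000
        ELSE NULL
    END                                                               AS DamageUSD,
    -- Saffir-Simpson category from sustained wind in mph.
    CASE
        WHEN TRY_CONVERT(int, WindSpeed) >= 157 THEN 5
        WHEN TRY_CONVERT(int, WindSpeed) >= 130 THEN 4
        WHEN TRY_CONVERT(int, WindSpeed) >= 111 THEN 3
        WHEN TRY_CONVERT(int, WindSpeed) >= 96  THEN 2
        WHEN TRY_CONVERT(int, WindSpeed) >= 74  THEN 1
        ELSE 0
    END                                                               AS Category,
    -- Duration in whole days from the parsed dates.
    DATEDIFF(day,
             TRY_CONVERT(date, StartDate),
             TRY_CONVERT(date, EndDate))                              AS DurationDays
FROM dbo.Hurricanes;
```

The category thresholds follow the standard Saffir-Simpson wind-scale bands in mph (74–95, 96–110, 111–129, 130–156, 157+).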
- Dates are exported as `YYYY-MM-DD`.
- `DurationDays` = difference between `StartDate` and `EndDate`.
- Wind speeds and pressures are standardized for consistent analysis.
- Damage text like “Unknown”, “Minimal”, etc. is set to `NULL`.
- If opening the CSV in Excel, use Delimited → Comma import (not Fixed Width).
Valery Liamtsau, Atlantic Hurricanes (Data Cleaning Challenge), Kaggle, CC0 1.0.
https://www.kaggle.com/datasets/valerylia/atlantic-hurricanes-data-cleaning-challenge

