All data is from this Kaggle competition. Download and extract the files into the data/ directory.
Run this inside the setup.ipynb notebook.
Start with the eda.ipynb notebook. Besides producing some EDA charts, its main purpose is to combine Kaggle's train and test data into a single dataset, which is split into train/test again later.
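The combine-then-resplit idea can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses only the standard library, the function name `combine_and_split`, the `test_fraction` and `seed` parameters, and the toy records are all hypothetical, and it assumes both Kaggle files carry labels.

```python
import random

def combine_and_split(train_rows, test_rows, test_fraction=0.2, seed=0):
    """Pool two lists of records, shuffle them, and split into train/test."""
    combined = list(train_rows) + list(test_rows)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(combined)
    n_test = int(len(combined) * test_fraction)
    # First n_test shuffled rows become the new test set, the rest train
    return combined[n_test:], combined[:n_test]

# Hypothetical toy records standing in for rows read from the files in data/
kaggle_train = [{"text": f"train example {i}", "label": i % 2} for i in range(8)]
kaggle_test = [{"text": f"test example {i}", "label": i % 2} for i in range(2)]

train, test = combine_and_split(kaggle_train, kaggle_test)
print(len(train), len(test))
```

Pooling first and splitting afterwards means the new train and test sets are drawn from the same shuffled pool, rather than inheriting whatever split Kaggle chose.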
Next is the prep.ipynb notebook. This cleans and encodes the text for input into the model.
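As a rough picture of what "clean and encode" can mean, here is a small standard-library sketch. The cleaning rules, the `<pad>`/`<unk>` tokens, the `max_len` value, and the sample sentences are assumptions made for illustration; the notebook may use different rules or a library tokenizer.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def build_vocab(texts, min_count=1):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown tokens."""
    counts = Counter(tok for t in texts for tok in t.split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab, max_len=6):
    """Convert cleaned text to a fixed-length list of ids, truncating or padding as needed."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in text.split()][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Hypothetical sample texts
raw = ["Great movie!!", "Not great... not good."]
cleaned = [clean(t) for t in raw]
vocab = build_vocab(cleaned)
encoded = [encode(t, vocab) for t in cleaned]
print(cleaned)
print(encoded)
```

Fixed-length integer sequences like these are a common input format for text models, which is why padding and an unknown-token id are included in the sketch.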
Then use train.ipynb to train the model.