- Create a `.venv` virtual environment in the `src` directory
- Install the dependencies: `py -m pip install -r requirements.txt`
- Move the cleaned datasets from the KNNFeatureCreation Colab notebook into the `data` folder
- Run the `main.py` file
- Move into the `src` directory: `cd src`
- Install the dependencies: `uv sync`
- Run the `main.py` file: `uv run main.py`
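`uv sync` resolves the environment from the project's `pyproject.toml`. A minimal sketch of what such a file might contain (the project name and the dependency list are assumptions inferred from the models described below, not taken from the repository):

```toml
[project]
name = "revpar-models"            # placeholder project name
version = "0.1.0"
requires-python = ">=3.10"
# Assumed dependencies, inferred from the random forest / XGBoost models below
dependencies = [
    "pandas",
    "scikit-learn",
    "xgboost",
]
```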
Models are stored in the `models` dictionary:
- `rf_post`: a random forest model trained on the top 20 most important features; used to predict post-COVID RevPAR.
- `xgb_post`: an XGBoost model trained on the entire post-COVID dataset; used to predict post-COVID RevPAR.
- `rf_pre`: a random forest model trained on the top 20 most important features; used to predict pre-COVID RevPAR.
- `xgb_pre`: an XGBoost model trained on the entire pre-COVID dataset; used to predict pre-COVID RevPAR.

Note that the random forest models are trained only on the top 20 features, so trim the features DataFrame to those columns before calling them.
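A minimal sketch of the trimming step, using a throwaway random forest on synthetic data (the column names and the `top_20` list are placeholders; in the project the list comes from the feature-importance step):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
all_cols = [f"feat_{i}" for i in range(30)]      # placeholder feature names
X = pd.DataFrame(rng.normal(size=(100, 30)), columns=all_cols)
y = rng.normal(size=100)

top_20 = all_cols[:20]                           # stand-in for the real top-20 list

# The rf models were fit on the top 20 features only, so fit/predict on that subset
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X[top_20], y)
models = {"rf_post": rf}                         # mirrors the models dictionary

preds = models["rf_post"].predict(X[top_20])     # trim before predicting
print(preds.shape)                               # (100,)
```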
The notebooks below should be run in Google Colab, in the following order, after running `clean_and_preprocess()` from `cleaning_pre_processing_and_trees/main.py`. Before running them, upload to the Colab working directory: the cleaned train and test data as `train.csv` and `test.csv`; the original drive-time data as `master_panel_drv10.csv`, `master_panel_drv15.csv`, and `master_panel_drv30.csv`; and the original scoring data as `scoring.csv`.
1. Data Exploration.ipynb
2. KNNFeatureCreation.ipynb
3. NeuralNet.ipynb
4. Ensembling.ipynb
5. Visualizations2.ipynb
6. Visualizations.ipynb
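As a quick sanity check before running the notebooks, something like the following (file names taken from this README) can confirm the uploads are in place in the Colab working directory:

```python
from pathlib import Path

# Files the notebooks expect to find, per this README
expected = [
    "train.csv", "test.csv",
    "master_panel_drv10.csv", "master_panel_drv15.csv", "master_panel_drv30.csv",
    "scoring.csv",
]
missing = [name for name in expected if not Path(name).exists()]
print("missing:", missing)   # an empty list means everything is uploaded
```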
`filled_scoring.csv` contains the ensemble model's predictions for `scoring.csv`.