This repository contains a multi-page Streamlit web application that demonstrates a complete end-to-end machine learning workflow. The application handles everything from rate-limited API data collection and CSV storage to interactive data exploration, feature selection, model training, and live prediction.
The primary goal of this project is to build a classification model capable of predicting weather conditions (e.g., 'Rain', 'Clouds', 'Clear') based on various meteorological features collected from the OpenWeatherMap API.
- Rate-Limited Data Collection: Collects weather data for thousands of cities asynchronously, throttling requests to stay within the API's rate limits.
- Interactive Data Exploration: A dedicated page to view the raw dataset, analyze column statistics, and visualize feature distributions with interactive Plotly charts.
- Dynamic Feature & Target Selection: Interactively select which columns to use as input features and which to set as the prediction target for the model.
- ML Model Training: Train classification models (like Random Forest and Logistic Regression) with a single click.
- Clear Model Evaluation: View model performance as table-based metrics, including accuracy, precision, recall, a confusion matrix, and a full classification report.
- Manual & Live Prediction: Make predictions by manually entering feature values or by using real-time weather data fetched directly from the API for any city.
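
The rate-limited collection in the first feature can be pictured as a small throttle placed in front of each request. This is a minimal synchronous sketch, not the repository's implementation (which is described as asynchronous). `RateLimiter`, `collect`, and the `fetch` callable are hypothetical names introduced here for illustration.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Optional


class RateLimiter:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(
        self,
        max_calls_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls_per_minute <= 0:
            raise ValueError("max_calls_per_minute must be positive")
        self.min_interval = 60.0 / max_calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last_call = self._clock()
        return delay


def collect(
    cities: Iterable[str],
    fetch: Callable[[str], Dict[str, Any]],
    limiter: RateLimiter,
) -> List[Dict[str, Any]]:
    """Fetch one record per city, throttled by the limiter.

    `fetch` is a hypothetical callable standing in for the real API request.
    Cities whose request fails are skipped so one error does not end the run.
    """
    rows: List[Dict[str, Any]] = []
    for city in cities:
        limiter.wait()
        try:
            rows.append(fetch(city))
        except Exception as exc:
            print(f"Skipping {city}: {exc}")
    return rows
```

In practice `fetch` would wrap an HTTP call to the weather API. Passing it in as an argument keeps the throttle testable without network access. Choose `max_calls_per_minute` to match the quota of your own API plan.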
- Application Framework: Streamlit
- Data Manipulation: Pandas, NumPy
- Machine Learning: Scikit-learn
- Data Visualization: Plotly
- API Requests: Requests
.
├── pages/
│ ├── 02_Data_Exploration.py
│ ├── 02_Data_Collection.py
│ ├── 03_Feature_Selection.py
│ ├── 04_Model_Training.py
│ ├── 05__Manual_Prediction.py
│ └── 06_Live_API_Prediction.py
├── Home.py # Main entry point of the app
├── README.md # You are here <──
├── current_city_list.json # Default city list used for randomization
├── data_utils.py # Helper functions for data loading & API calls
├── derived_cities_for_collection.csv # Randomized city list
├── model_utils.py # Helper functions for preprocessing & model training
├── pexels-jplenio-1118873.jpg # Photo assets
├── rate_limited_weather_data.csv # Weather dataset
└── requirements.txt # Python package dependencies
Live demo: https://weather-classification-ml.streamlit.app
Follow these steps to get the application running on your local machine.
- Python 3.8 - 3.11
- An API Key from OpenWeatherMap (the free tier is sufficient).
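
To check that a key is usable before starting a collection, it can help to see what a request looks like. The sketch below builds a current-weather request URL using only the standard library. The endpoint and the `q`, `appid`, and `units` parameters come from OpenWeatherMap's public API. The `build_weather_url` helper and the `OPENWEATHER_API_KEY` environment variable are assumptions made for this example and are not part of the app, which takes the key through its configuration section.

```python
import os
from urllib.parse import urlencode

# Current-weather endpoint from the public OpenWeatherMap API.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city: str, api_key: str, units: str = "metric") -> str:
    """Return the request URL for a city's current weather."""
    if not api_key:
        raise ValueError("An OpenWeatherMap API key is required")
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # OPENWEATHER_API_KEY is an assumed variable name, not one the app defines.
    key = os.environ.get("OPENWEATHER_API_KEY", "")
    if key:
        print(build_weather_url("London", key))
    else:
        print("Set OPENWEATHER_API_KEY to build a request URL.")
```

Opening the printed URL in a browser should return a JSON document if the key is active.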
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
It is highly recommended to use a virtual environment to manage dependencies.
# For Windows
python -m venv venv
venv\Scripts\activate
# For macOS/Linux
python3 -m venv venv
source venv/bin/activate
Run this command to install all required libraries from the requirements.txt file.
pip install -r requirements.txt
With your virtual environment activated, run the following command from the project's root directory:
streamlit run Home.py
Your web browser should automatically open a new tab with the running application.
- Data Collection:
  - Navigate to the Rate Limited Data Collection page (the home page).
  - You can load an existing weather data CSV file or start a new collection process.
  - For a new collection, enter your API key in the configuration section, set the number of cities to sample from the source file (derived_cities_for_collection.csv), and click "Prepare Randomized City List".
  - Once the city list is ready, click "Collect Weather Data" to begin.
- Dataset Exploration:
  - Go to the Dataset Exploration page from the sidebar.
  - View the complete dataset, see detailed info on each column, and use the interactive charts to analyze feature distributions.
- Feature Selection:
  - On the Feature Selection page, choose your prediction target (e.g., General Weather Category).
  - Select the features you believe will be useful for the prediction. Irrelevant features are automatically filtered out.
- Model Training:
  - On the Model Training page, select a model from the dropdown menu.
  - Click the "Train Model" button to start the training and evaluation process.
  - The results, including performance metrics and a confusion matrix, will be displayed and will persist even if you navigate to other pages.
- Prediction:
  - Use the Manual Prediction page to manually input feature values and get a prediction.
  - Use the Live API Prediction page to enter a city name and get a prediction based on real-time weather data.
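
For the live prediction step, the API response has to be flattened into the same kind of feature row the model was trained on. The sketch below shows one way that mapping could look. The field paths (`main.temp`, `wind.speed`, `clouds.all`, `weather[0].main`) follow OpenWeatherMap's current-weather JSON response. The output column names and the `extract_features` helper are hypothetical and may differ from the columns in this repository's dataset, and the sample payload uses made-up values.

```python
from typing import Any, Dict


def extract_features(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a current-weather response into a single feature row.

    Missing sections yield None rather than raising, so a partial response
    can still be inspected. Column names here are illustrative assumptions.
    """
    main = payload.get("main", {})
    wind = payload.get("wind", {})
    clouds = payload.get("clouds", {})
    weather = payload.get("weather") or [{}]
    return {
        "temperature": main.get("temp"),
        "humidity": main.get("humidity"),
        "pressure": main.get("pressure"),
        "wind_speed": wind.get("speed"),
        "cloudiness": clouds.get("all"),
        # Reported condition, e.g. 'Rain', 'Clouds', 'Clear'.
        "condition": weather[0].get("main"),
    }


# Illustrative payload with made-up values, shaped like an API response.
sample = {
    "name": "London",
    "main": {"temp": 14.2, "humidity": 81, "pressure": 1012},
    "wind": {"speed": 4.6},
    "clouds": {"all": 75},
    "weather": [{"main": "Clouds", "description": "broken clouds"}],
}

row = extract_features(sample)
print(row)
```

The `condition` field is kept only so that a prediction can be compared with what the API reports. It would be dropped from the row before the remaining values are passed to a trained model.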