# 🩺 Diabetes Risk Predictor (SDG 3 – Good Health & Well-being)

This project is a simple offline health risk prediction tool that estimates the risk of type 2 diabetes using basic health parameters. It uses a Logistic Regression model trained on the Pima Indians Diabetes Dataset and a Streamlit web app for interaction.

The app supports Sustainable Development Goal 3 (SDG 3): Good Health & Well-being by:

- Raising awareness about diabetes risk
- Encouraging early screening and preventive behaviour
- Demonstrating how low-cost, offline AI tools can support health decision-making

⚠️ Disclaimer: This project is for education and demonstration only.
It is NOT a medical device and must not be used for real diagnosis or treatment.

PitchDeck

## 🖼️ Screenshots

1. Risk Prediction Tab

2. About & SDG 3 Tab

3. Usage Stats Tab

## 🗂️ Project Structure

    diabetes_project/
      ├── screenshots/
      │   ├── prediction_tab.png
      │   ├── about_tab.png
      │   └── usage_stats_tab.png
      ├── app.py
      ├── train_model.py
      ├── diabetes_model.pkl
      ├── diabetes.csv
      └── README.md

---

## 💻 1. Requirements

You need:

* **Python 3.8+**
* Recommended: a virtual environment (optional, but it keeps the project's dependencies isolated)

Install the required Python packages:

```bash
pip install streamlit scikit-learn pandas numpy
```

## 📥 2. Dataset

This project uses the Pima Indians Diabetes Dataset. Make sure you have a file named `diabetes.csv` in the project folder with the following columns:

- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome (0 = no diabetes, 1 = diabetes)

If you downloaded the dataset from Kaggle or GitHub, just rename the CSV to diabetes.csv and place it in the same folder as train_model.py.
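
Before training, it can help to confirm that the CSV really has the columns listed above. The following is a minimal sketch, not part of the project's scripts; the helper name `missing_columns` is hypothetical.

```python
import pandas as pd

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Usage (assumes diabetes.csv is in the current folder):
#   missing = missing_columns(pd.read_csv("diabetes.csv"))
#   if missing:
#       print("Missing columns:", missing)
```

An empty list means the file matches the expected layout.
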

## 🧠 3. Training the Model

Run the training script to:

- Load and clean the dataset
- Split it into train/test sets
- Train a Logistic Regression model
- Save the model and scaler into `diabetes_model.pkl`

From the project folder, run:

    python train_model.py

You should see output including:

- Accuracy on the test set
- Classification report
- Confusion matrix
- A message like: `Model saved to diabetes_model.pkl`

After this, `diabetes_model.pkl` should be in your project folder.

## 🌐 4. Running the Streamlit App

Once the model file (`diabetes_model.pkl`) exists, start the app with:

    streamlit run app.py

This will open the app in your browser at:

http://localhost:8501

The app has three main tabs:

### 🔮 Risk Prediction

Enter your health parameters in the sidebar:

- Pregnancies
- Glucose (mg/dL)
- Blood Pressure (mm Hg)
- Skin Thickness (mm)
- Insulin (mu U/ml)
- BMI (kg/m²)
- Diabetes Pedigree Function
- Age (years)

Click “Predict Risk”.

The app will display:

- A prediction (low diabetes risk vs diabetes risk)
- Estimated probability of diabetes (0–1)
- A risk level: Low / Medium / High
- General, non-medical health tips and a disclaimer

All predictions are made locally and offline using the saved model.
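
The mapping from a probability to a Low / Medium / High label could look like the sketch below. The cut-offs 0.33 and 0.66 are illustrative assumptions only; this README does not state the thresholds that `app.py` actually uses, and the function name `risk_level` is hypothetical.

```python
def risk_level(probability: float,
               low_cutoff: float = 0.33,
               high_cutoff: float = 0.66) -> str:
    """Map a probability in [0, 1] to a risk label.

    The default cut-offs are assumptions for illustration, not the app's values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cutoff:
        return "Low"
    if probability < high_cutoff:
        return "Medium"
    return "High"
```

Keeping the cut-offs as parameters makes it easy to try different thresholds without touching the rest of the app.
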

### ℹ️ About & SDG 3

This tab explains:

- The goal of the project
- Connection to SDG 3: Good Health & Well-being
- Basic information about the dataset and model
- Important disclaimer: the app is not a diagnostic tool

This is useful for reports, presentations, or anyone viewing the project for the first time.

### 📈 Usage Stats

Each time the user makes a prediction, the app logs it to `usage_log.csv` (created automatically). The log does not contain any names or IDs, just:

- Timestamp
- Input health parameters
- Predicted class
- Probability
- Risk level (Low/Medium/High)
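
Appending one row per prediction can be done with the standard library alone. This is a rough sketch of the idea, not the code in `app.py`; the function name `log_prediction` and the CSV column names (`timestamp`, `predicted_class`, `probability`, `risk_level`) are assumptions.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_prediction(log_path, inputs: dict, predicted_class: int,
                   probability: float, risk: str) -> None:
    """Append one prediction to a CSV log, writing the header on first use."""
    row = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        **inputs,  # the health parameters entered by the user
        "predicted_class": predicted_class,
        "probability": probability,
        "risk_level": risk,
    }
    path = Path(log_path)
    write_header = not path.exists()  # the file is created automatically
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

The sketch assumes every call passes the same input keys in the same order, so all rows line up with the header.
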

The Usage Stats tab shows:

- Total number of predictions
- Average predicted probability of diabetes
- A bar chart of risk levels (Low/Medium/High)
- A table showing the last 10 predictions

If usage_log.csv does not exist yet, the tab will show a message asking you to make some predictions first.
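
The statistics shown in this tab can be derived from the log with a few pandas calls. The sketch below assumes the log has columns named `probability` and `risk_level`; the real column names in `usage_log.csv` may differ, and `summarize_log` is a hypothetical helper.

```python
import pandas as pd


def summarize_log(log: pd.DataFrame) -> dict:
    """Compute the summary values described for the Usage Stats tab."""
    return {
        "total_predictions": len(log),
        "mean_probability": float(log["probability"].mean()),
        # Counts per risk level, suitable as input for a bar chart.
        "risk_counts": log["risk_level"].value_counts().to_dict(),
        "last_10": log.tail(10),
    }


# Usage (assumes the log file already exists):
#   stats = summarize_log(pd.read_csv("usage_log.csv"))
```
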

## 🧪 5. How It Works (Technical Overview)

### Data Preprocessing

- Load `diabetes.csv` using pandas.
- Treat zeros in Glucose, BloodPressure, SkinThickness, Insulin, and BMI as missing values, since a zero is not a realistic measurement for these columns.
- Impute these missing values with the median of each column.
- Separate features (X) and target (y = Outcome).
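
A minimal sketch of the zero-handling step, assuming the median is computed after the zeros have been marked as missing (the README does not say whether `train_model.py` computes it before or after). The function name is hypothetical.

```python
import numpy as np
import pandas as pd

# Columns where a zero is treated as a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros_with_median(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with zeros in selected columns replaced by the column median."""
    cleaned = df.copy()
    # Mark zeros as NaN so they do not pull the median down.
    cleaned[ZERO_AS_MISSING] = cleaned[ZERO_AS_MISSING].replace(0, np.nan)
    # median() skips NaN values by default.
    medians = cleaned[ZERO_AS_MISSING].median()
    cleaned[ZERO_AS_MISSING] = cleaned[ZERO_AS_MISSING].fillna(medians)
    return cleaned


# Usage (assumes diabetes.csv is present):
#   df = impute_zeros_with_median(pd.read_csv("diabetes.csv"))
#   X = df.drop(columns="Outcome")
#   y = df["Outcome"]
```
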

### Train–Test Split

- Split data into 80% training and 20% testing.
- Use stratified sampling to maintain class balance.
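
With scikit-learn, an 80/20 stratified split is a single call to `train_test_split`. The sketch below uses synthetic stand-in data so that it runs on its own; `random_state=42` is an arbitrary choice for reproducibility, not a value taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 8 features, 35% positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = np.array([0] * 65 + [1] * 35)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of rows go to the test set
    stratify=y,        # keep the class ratio the same in both sets
    random_state=42,   # assumption: any fixed seed works
)

print("Positive share (train):", y_train.mean())
print("Positive share (test): ", y_test.mean())
```
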

### Scaling

- Use StandardScaler to scale the features.
- Fit the scaler on the training data only, then use it to transform the training set, the test set, and later the user input.
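
The fit-on-train, transform-everything pattern looks roughly like this. The tiny arrays are made up purely to keep the example self-contained.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the real feature matrices.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the same statistics for the test set (and later for user input).
X_test_scaled = scaler.transform(X_test)
```

Fitting on the training data alone keeps information from the test set out of the preprocessing step.
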

### Model

- Use Logistic Regression (`sklearn.linear_model.LogisticRegression`).
- Train on the scaled training data.
- Evaluate on the test set (accuracy, precision, recall, etc.).
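
Training and evaluation could be sketched as follows. The data here is synthetic and already on a standard scale, so the printed numbers say nothing about the real dataset; `max_iter=1000` is an assumption, not a setting confirmed by the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Synthetic stand-in for the scaled train/test features and labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8))
y_train = (X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(50, 8))
y_test = (X_test[:, 1] > 0).astype(int)

model = LogisticRegression(max_iter=1000)  # assumption: generous iteration cap
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
print(cm)
```
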

### Saving Model

- Save a Python dictionary containing:
  - `model`: the trained Logistic Regression model
  - `scaler`: the fitted StandardScaler
  - `feature_names`: list of feature names
- Use `pickle.dump()` to store it in `diabetes_model.pkl`.
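
Saving and reloading the dictionary might look like the sketch below. The dictionary keys follow the list above; the helper names `save_bundle` and `load_bundle` are hypothetical.

```python
import pickle


def save_bundle(model, scaler, feature_names, path="diabetes_model.pkl"):
    """Pickle the model, scaler, and feature names together in one file."""
    bundle = {
        "model": model,
        "scaler": scaler,
        "feature_names": list(feature_names),
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_bundle(path="diabetes_model.pkl"):
    """Load the dictionary written by save_bundle."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Storing the scaler next to the model means the app applies exactly the same scaling that was used during training. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.
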

### Streamlit App

- Load `diabetes_model.pkl` at startup.
- Collect user input via Streamlit widgets in the sidebar.
- Convert the input to a NumPy array and scale it with the loaded scaler.
- Call `model.predict()` and `model.predict_proba()` to generate predictions.
- Display the results, risk level, and health tips.
- Log each prediction to `usage_log.csv` for stats.
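
The prediction step itself does not depend on Streamlit and can be sketched as a plain function. The function name `predict_risk` is hypothetical, and the bundle built at the bottom is a stand-in fitted on random data so that the example runs without `diabetes_model.pkl`; in the app, the bundle would come from the pickle file and the input values from the sidebar widgets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

FEATURE_NAMES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def predict_risk(bundle: dict, user_input: dict):
    """Scale one row of user input and return (predicted class, probability of class 1)."""
    # Order the values the same way as during training.
    row = np.array([[user_input[name] for name in bundle["feature_names"]]], dtype=float)
    scaled = bundle["scaler"].transform(row)
    predicted_class = int(bundle["model"].predict(scaled)[0])
    probability = float(bundle["model"].predict_proba(scaled)[0, 1])
    return predicted_class, probability


# Stand-in bundle fitted on random data (NOT the real trained model).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 8))
y_demo = (X_demo[:, 1] > 0).astype(int)
demo_scaler = StandardScaler().fit(X_demo)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_demo), y_demo)
demo_bundle = {"model": demo_model, "scaler": demo_scaler, "feature_names": FEATURE_NAMES}

example_input = {name: 0.0 for name in FEATURE_NAMES}
print(predict_risk(demo_bundle, example_input))
```
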

## 📊 6. Example Output / Metrics

When you run `train_model.py`, you’ll get evaluation metrics on the test set, such as:

- Test accuracy (e.g., ~0.75–0.80)
- Precision and recall for both classes
- Confusion matrix

You can copy these results into your report or presentation.

## 🌍 7. SDG 3 – Good Health & Well-being

This project demonstrates how AI and simple web tools can support:

- Awareness of non-communicable diseases (like diabetes)
- An early-screening mindset, by prompting users to think about their risk
- Low-cost, offline tools that can be run anywhere with a laptop

Even though this tool is not clinically validated, it illustrates important ideas behind using data and machine learning to support public health, linking directly to SDG 3.

## ⚠️ 8. Limitations

- The dataset is limited to a specific population (Pima Indian women).
- Only 8 numerical features are used. Real diagnosis requires more data and clinical judgement.
- Logistic Regression is simple and may not be the most accurate possible model.
- The model is not clinically validated and must not be used for real patient care.

## 🔮 9. Possible Future Improvements (Optional)

These are ideas only and are not implemented in this version:

- Try and compare other models (e.g., Random Forest, Gradient Boosting).
- Add more visual explanations for how each feature affects the prediction.
- Extend the app to predict risk for other diseases (e.g., heart disease).
- Improve content and explanations for different audiences.
- Add support for multiple languages.

## ✅ 10. How to Run (Quick Steps)

Clone or copy this project folder.

Place `diabetes.csv` in the project folder.

Install dependencies:

    pip install streamlit scikit-learn pandas numpy

Train the model:
```
python train_model.py
```
Run the app:
```
streamlit run app.py
```
Open your browser at http://localhost:8501 and start exploring.
