This project uses deep neural networks to predict the Water Quality Index (WQI) and the water quality class of a sample from environmental monitoring data provided by the Central Pollution Control Board (CPCB), India.
- Project Overview
- Dataset Description
- Workflow
- Installation and Setup
- Predictive Models
- Evaluation Metrics
- Results
- License
- Contact
Access to clean water is a fundamental human necessity, yet water quality varies widely with environmental, geographical, and human-induced factors. This project aims to predict water quality metrics accurately from chemical and physical parameters measured at locations across India between 2019 and 2022.
Using deep learning, we address two predictive tasks:
- Regression Analysis: Predicting the numerical Water Quality Index (WQI).
- Multi-class Classification: Categorizing samples into qualitative labels (e.g., Excellent, Good, Poor, Unsuitable).
The dataset contains chemical and physical measurements of water samples collected from wells across India. Its columns fall into the following groups:
- Geographical: Well_ID, State, District, Block, Village, Latitude, Longitude.
- Temporal: Year (2019, 2020, 2021, 2022).
- Indicators: pH, Electrical Conductivity (EC), Carbonates (CO3), Bicarbonates (HCO3), Chlorides (Cl), Sulfates (SO4), Nitrates (NO3), Total Hardness (TH), Calcium (Ca), Magnesium (Mg), Sodium (Na), Potassium (K), Fluoride (F), Total Dissolved Solids (TDS).
Target variables:
- WQI: Continuous numerical value.
- Water Quality Classification: Categorical (Excellent, Good, Poor, Very Poor yet Drinkable, Unsuitable for Drinking).
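Because the classification model is trained with sparse categorical crossentropy, the category strings have to become integer ids before training. The sketch below shows one way to do that; it is a minimal, dependency-free illustration, and the names CLASS_NAMES, encode_labels and decode_labels are hypothetical rather than taken from the notebook. The ordering from best to worst quality is an assumption chosen for readability.

```python
# The five categories from the dataset description, ordered best to worst.
CLASS_NAMES = [
    "Excellent",
    "Good",
    "Poor",
    "Very Poor yet Drinkable",
    "Unsuitable for Drinking",
]
CLASS_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}


def encode_labels(labels):
    """Map category strings to integer ids, ignoring surrounding whitespace."""
    return [CLASS_TO_ID[label.strip()] for label in labels]


def decode_labels(ids):
    """Map integer ids (e.g. the argmax of a softmax output) back to names."""
    return [CLASS_NAMES[i] for i in ids]
```

A scikit-learn LabelEncoder would also work, but it sorts the classes alphabetically, so its integer ids would not follow the quality order.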
The following diagram illustrates the data processing and modeling pipeline:
graph TD
A[Data Acquisition: CPCB Water Quality Dataset] --> B[Data Preprocessing]
B --> B1[Locate Header & Clean Garbage Text]
B1 --> B2[Handle Missing Values: Median Filling]
B2 --> B3[Feature Type Conversion: Numeric Coercion]
B3 --> B4[Target Definition: WQI & Classification]
B4 --> C[Data Splitting: Train/Test]
C --> D[Feature Scaling: StandardScaler]
D --> E1[Deep Learning Regression Model]
D --> E2[Deep Learning Classification Model]
E1 --> F1[WQI Prediction]
E2 --> F2[Water Quality Category Labeling]
F1 --> G1[Evaluation: R2 Score, MAE]
F2 --> G2[Evaluation: Accuracy, F1-Score]
G1 --> H[Model Finalization]
G2 --> H
The Mermaid source for this diagram is also available in Flow/workflow.mmd.
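The preprocessing, splitting and scaling stages of the diagram can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's code: it assumes the raw sheet arrives as rows of strings with title or garbage rows above the real header, and that the header row can be recognised by its pH column. The helpers locate_header, to_float, clean_table, split_train_test, fit_standard_scaler and transform are hypothetical names.

```python
import math
import random
import statistics


def locate_header(raw_rows, marker="pH"):
    """Return the index of the first row that contains the marker column name."""
    for i, row in enumerate(raw_rows):
        if marker in [cell.strip() for cell in row]:
            return i
    raise ValueError(f"no header row containing {marker!r}")


def to_float(value):
    """Numeric coercion: a float for parseable values, None for blanks, text or NaN."""
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return None if math.isnan(number) else number


def clean_table(raw_rows, numeric_columns, marker="pH"):
    """Drop rows above the header and blank rows, coerce numeric columns,
    and fill each column's missing values with that column's median."""
    start = locate_header(raw_rows, marker)
    header = [cell.strip() for cell in raw_rows[start]]
    records = [dict(zip(header, row)) for row in raw_rows[start + 1:]
               if any(cell.strip() for cell in row)]
    for col in numeric_columns:
        values = [to_float(record.get(col)) for record in records]
        observed = [v for v in values if v is not None]
        fill = statistics.median(observed) if observed else None
        for record, value in zip(records, values):
            record[col] = fill if value is None else value
    return header, records


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows reproducibly and hold out test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def fit_standard_scaler(matrix):
    """Per-column mean and population standard deviation, from training rows only."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return means, stds


def transform(matrix, means, stds):
    """Standardize each value as (x - mean) / std using the fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in matrix]
```

The diagram names StandardScaler for the scaling step, presumably scikit-learn's. The usual pattern there is fit_transform on the training features and transform on the test features, which mirrors the fit/transform separation above and keeps test-set statistics out of training.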
To run this project locally, ensure you have Python 3.10+ installed.
- Clone the repository:
  git clone https://github.com/SANJAI-s0/WQI-WQP_using_DL_Neural_Network.git
  cd WQI-WQP_using_DL_Neural_Network
- Install the dependencies:
  pip install -r requirements.txt
- Run the analysis by opening the Jupyter notebook, which contains the full pipeline and metrics:
  jupyter notebook Water_Quality_Prediction.ipynb
The project implements two separate deep neural networks (DNNs) using Keras/TensorFlow.
Regression model:
- Architecture: Sequential API with Dense layers of 64, 32, and 16 units followed by a single-unit output layer (64 -> 32 -> 16 -> 1).
- Optimizer: Adam.
- Loss Function: Mean Squared Error (MSE).
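To make the 64 -> 32 -> 16 -> 1 stack concrete, the sketch below counts its trainable parameters with the standard Dense-layer rule (one weight per input-output pair plus one bias per unit) and spells out the MSE loss. The input width of 14 is an assumption, one input per chemical/physical indicator in the dataset description; the notebook may use a different feature set, and dense_param_count is a hypothetical helper.

```python
def dense_param_count(n_inputs, layer_units):
    """Trainable parameters of a stack of fully connected (Dense) layers.

    A Dense layer with `units` outputs fed by `fan_in` inputs holds a
    fan_in x units weight matrix plus one bias per unit.
    """
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units
        fan_in = units
    return total


def mean_squared_error(y_true, y_pred):
    """MSE: average squared difference between true and predicted WQI values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumption: 14 input features, one per indicator in the dataset description.
N_FEATURES = 14
REGRESSION_UNITS = [64, 32, 16, 1]

print(dense_param_count(N_FEATURES, REGRESSION_UNITS))  # 3585
print(mean_squared_error([50.0, 80.0], [45.0, 90.0]))  # 62.5
```

Keras reports the same per-layer totals in model.summary(), which is a quick way to check the architecture if the actual input width differs.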
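Classification model:
- Architecture: Sequential API with the same hidden layers (64 -> 32 -> 16) and an output layer with one unit per water quality class (output_classes).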
- Activation: ReLU for hidden layers, Softmax for the output layer.
- Loss Function: Sparse Categorical Crossentropy.
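The softmax output and the sparse categorical crossentropy loss can be written out in a few lines, which shows why the labels can stay as integer class ids instead of one-hot vectors. This is a plain-Python sketch for illustration; Keras computes the same quantities internally with its own clipping and numerical details, and the logits below are made-up example values.

```python
import math


def softmax(logits):
    """Turn raw output-layer scores into class probabilities that sum to 1.

    Subtracting the largest logit first avoids overflow in math.exp.
    """
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(labels, probabilities, eps=1e-7):
    """Mean of -log(p[true class]) over a batch.

    `labels` are integer class ids (e.g. 0..4 for five categories), so no
    one-hot encoding is needed; each probability is floored at `eps` to avoid log(0).
    """
    losses = [-math.log(max(probs[label], eps))
              for label, probs in zip(labels, probabilities)]
    return sum(losses) / len(losses)


# Two samples, five classes; illustrative logits, not model output.
batch_logits = [[2.0, 1.0, 0.1, -1.0, -2.0], [0.0, 0.5, 3.0, 0.2, -0.5]]
probs = [softmax(row) for row in batch_logits]
predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
loss = sparse_categorical_crossentropy([0, 2], probs)
```

The predicted class is the index of the largest probability, which can then be mapped back to a category name.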
The models are evaluated with the following metrics:
- Regression: $R^2$ score (coefficient of determination) and Mean Absolute Error (MAE).
- Classification: Accuracy and weighted F1-score.
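For reference, the four metrics can be computed as below. This is a dependency-free sketch following the standard formulas; the notebook may instead use library implementations such as scikit-learn's r2_score, mean_absolute_error, accuracy_score and f1_score with average="weighted", which this README does not confirm.

```python
from collections import Counter


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; assumes y_true is not constant (SS_tot > 0)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 = 2TP / (2TP + FP + FN), averaged with weights equal
    to each class's support (its count in y_true)."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        total += n * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)
```

Weighting each class's F1 by its support makes the score reflect the class distribution of the test set; a macro average would instead give every category equal weight.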
Both models perform reliably on the dataset; the exact metric values, confusion matrices, and loss curves are reported in the Water_Quality_Prediction.ipynb notebook.
This project is licensed under the MIT License - see the LICENSE file for details.
Sanjai - GitHub Profile
Project Link: https://github.com/SANJAI-s0/WQI-WQP_using_DL_Neural_Network