This repository implements a linear regression model from scratch using NumPy to predict house prices. The model is trained with batch gradient descent, and the script plots the loss over training epochs to show how it converges.
The dataset is loaded from an Excel file (Cleaned_Data.xlsx) and contains multiple features relevant to housing prices. The script assumes:
- The Id column is dropped as it's not useful for prediction.
- The last column is the target: house price.
- All other columns are considered features.
- Data is read using pandas.
- Manual standardization (z-score normalization) is applied to both features and target.
- An intercept term (bias) is added as a column of ones.
- Data is randomly split into 80% training and 20% testing.
- The model uses gradient descent to minimize the Mean Squared Error (MSE) loss.
- Learning rate: 0.01
- Epochs: 1000
- Weights are initialized to zeros.
- After training, the model makes predictions on the test set.
- Predicted and true values are rescaled back to original units.
- Final MSE is printed as the performance metric.
- A line plot displays the MSE loss over training epochs.
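
The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the contents of linear_regression.py: the helper names (`standardize`, `train_test_split`, `fit_gd`), the random seeds, and the synthetic data standing in for Cleaned_Data.xlsx are all assumptions made for the example, as is the choice to standardize before splitting (taken from the order of the list). Because the data is synthetic, its numbers will not match the repository's results.

```python
import numpy as np

def standardize(a):
    """Z-score normalize along axis 0; return the scaled array, mean and std."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    return (a - mean) / std, mean, std

def train_test_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle the row indices and hold out `test_ratio` of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def fit_gd(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent on the MSE loss, starting from zero weights."""
    n = len(X)
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(epochs):
        err = X @ w - y                    # residuals over the full batch
        losses.append(np.mean(err ** 2))   # MSE before this update
        w -= lr * (2.0 / n) * (X.T @ err)  # gradient of the MSE w.r.t. w
    return w, losses

# Hypothetical synthetic data: 200 rows, 3 features, a noisy linear target.
rng = np.random.default_rng(42)
scales = np.array([50.0, 2.0, 10.0])
offsets = np.array([120.0, 3.0, 30.0])
raw_X = rng.normal(size=(200, 3)) * scales + offsets
raw_y = (raw_X @ np.array([1500.0, 20000.0, -800.0]) + 50000.0
         + rng.normal(scale=5000.0, size=200))

# Standardize features and target, then prepend the intercept column of ones.
Xs, _, _ = standardize(raw_X)
ys, y_mean, y_std = standardize(raw_y)
Xb = np.hstack([np.ones((len(Xs), 1)), Xs])

# 80/20 split, training, and evaluation in the original units.
X_tr, X_te, y_tr, y_te = train_test_split(Xb, ys)
w, losses = fit_gd(X_tr, y_tr)
pred = (X_te @ w) * y_std + y_mean
true = y_te * y_std + y_mean
mse = np.mean((pred - true) ** 2)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(f"test MSE in original units: {mse:.2f}")
```

The `losses` list holds one value per epoch, so passing it to matplotlib's `plt.plot(losses)` would give a convergence curve of the kind described in the last bullet.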
Epoch 0 - Loss: 1.1035
Epoch 100 - Loss: 0.1712
...
Epoch 900 - Loss: 0.0649
MSE final: 13582.43
├── Cleaned_Data.xlsx # Input dataset
├── linear_regression.py # Main Python script
└── README.md # Documentation
Make sure the required packages are installed:

    pip install pandas numpy matplotlib openpyxl

Then run the script:

    python linear_regression.py

Ensure the path to Cleaned_Data.xlsx is valid, or adjust it in the script.
- This implementation does not use machine learning libraries such as scikit-learn or TensorFlow; everything is implemented from scratch.
- Suitable as an educational example for understanding linear regression and gradient descent.