GitHub - hardiktrehan1/BAC-Stock-Predictive-Analysis-using-Linear-Regression: This repository contains a predictive modeling project focused on forecasting Bank of America's (BAC) stock prices using linear regression. It includes data preprocessing, feature engineering (including lag variables and moving averages), model training, evaluation using RMSE and MSE, and visualizations to assess model performance.

📈 BAC Stock Predictive Analysis using Linear Regression

This project presents a comprehensive analysis and predictive modeling approach to forecasting the stock price of Bank of America (BAC) using Linear Regression. The repository covers the full data science workflow — from data acquisition to feature engineering, model building, evaluation, and diagnostic testing of regression assumptions.

Project Overview

The primary objective of this project is to build an interpretable regression model to predict BAC's stock price based on historical market data. This is done using lag features, moving averages, and relevant index indicators. The model’s performance is evaluated using error metrics like Root Mean Squared Error (RMSE) and Mean Squared Error (MSE) to assess its predictive accuracy.

Key Features

Data pulled using yfinance for BAC and major market indices.
Feature engineering includes:
Lag variables (e.g., BAC(t-1), QQQ(t-1), ^GSPC(t-1))
Technical indicators like 5-day Moving Average
Feature selection techniques were applied to reduce multicollinearity, improving model stability and interpretability.
Linear Regression modeling with statsmodels.api

Model evaluation using:

RMSE: 0.7358
MSE: 0.5414

These values indicate a moderate prediction error, with the model deviating by approximately 0.73 units from actual stock prices on average.

Regression Assumptions Checked

To ensure model robustness and statistical validity, all five key assumptions of linear regression were tested:

Linearity

Verified through residual vs. fitted plots to confirm linear relationship between predictors and response.

Homoskedasticity

Checked using residual plots to ensure constant variance across predictions.

Multicollinearity

Tested using Variance Inflation Factor (VIF) from statsmodels.stats.outliers_influence to identify and remove highly correlated predictors.

Normality of Residuals

Assessed using Q-Q plots and histograms of residuals to confirm approximate normal distribution.

Autocorrelation of Residuals

Evaluated with the Durbin-Watson statistic to detect serial correlation in residuals.

Tools & Libraries Used

numpy
pandas
yfinance – for pulling stock and index data
matplotlib.pyplot & seaborn – for visualizations
statsmodels.api – for building and interpreting the regression model
statsmodels.stats.outliers_influence – for VIF and multicollinearity analysis

Results & Next Steps

The model currently exhibits an RMSE of 0.7358, indicating a moderate average prediction error. While linear regression offers simplicity and interpretability, future enhancements could include:
Feature selection or dimensionality reduction
Applying regularized models like Ridge or Lasso
Exploring nonlinear models such as Random Forest or XGBoost

Disclaimer

This project is for educational and illustrative purposes only and does not constitute financial advice.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Code		Code
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📈 BAC Stock Predictive Analysis using Linear Regression

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

📈 BAC Stock Predictive Analysis using Linear Regression

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages