Time Series Analysis: Airline Passengers Forecasting


Project Overview

A comprehensive time series analysis and forecasting project using classical time series methods on airline passenger data (1949-1960).

Dataset: Airline Passengers (International Airline Passengers, 1949-1960)
Time Period: 12 years (144 monthly observations)
Forecast Horizon: 24 months (1961-1962)
Best Model: SARIMA(0,1,1)(0,1,1,12) - "Airline Model"
Model Accuracy: 0.48% MAPE on test set
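
For reference, the "Airline Model" named above can be written in backshift notation. This is the textbook form, given under a few assumptions: $y_t$ is the series being modeled (possibly after a log transform, which the overview does not specify), $B$ is the backshift operator with $B y_t = y_{t-1}$, and the moving-average terms use the plus-sign convention found in statsmodels. The symbols $\theta_1$ and $\Theta_1$ are labels chosen here, not names taken from the project.

```latex
(1 - B)(1 - B^{12})\, y_t = (1 + \theta_1 B)(1 + \Theta_1 B^{12})\, \varepsilon_t,
\qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2)
```

The two difference operators on the left correspond to d = 1 and D = 1; the two moving-average factors on the right correspond to q = 1 and Q = 1 with a seasonal period of 12 months.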

Project Structure

time-series-analysis/
├── data/                               # Data files
│   ├── airline_passengers.csv          # Original dataset
│   ├── full_processed_data.csv         # Fully processed data
│   ├── train_data.csv                  # Training data (80%)
│   ├── test_data.csv                   # Testing data (20%)
│   ├── decomposition_results.csv       # Trend/seasonal/residual
│   ├── seasonal_pattern.csv            # Monthly seasonal effects
│   ├── model_comparison.csv            # Model performance comparison
│   ├── best_model_forecast.csv         # Test set forecasts
│   ├── best_model_summary.txt          # Model statistics
│   ├── future_forecasts.csv            # 24-month predictions
│   └── forecast_summary.csv            # Forecast statistics
├── notebooks/                          # Analysis scripts
│   ├── 01_exploration.py               # Data exploration
│   ├── 02_preprocessing.py             # Data preprocessing
│   ├── 03_decomposition.py             # Time series decomposition
│   ├── 04_model_selection.py           # ARIMA/SARIMA model selection
│   ├── 05_forecasting.py               # Future predictions
│   └── 06_executive_summary.py         # Executive summary
├── reports/                            # Analysis reports
│   ├── images/                         # Visualizations (12 images)
│   ├── data_exploration_report.txt     # Initial data analysis
│   ├── preprocessing_report.txt        # Data preparation summary
│   ├── decomposition_report.txt        # Time series decomposition
│   ├── model_selection_report.txt      # Model evaluation
│   ├── forecasting_report.txt          # Future predictions
│   └── executive_summary_report.txt    # Complete executive summary
├── requirements.txt                    # Python dependencies
├── LICENSE                             # MIT License
└── README.md                           # This file

Quick Start

# Clone repository
git clone https://github.com/Awande07/time-series-analysis.git
cd time-series-analysis

# Install dependencies
pip install -r requirements.txt

# Run analysis (in order)
python notebooks/01_exploration.py
python notebooks/02_preprocessing.py
python notebooks/03_decomposition.py
python notebooks/04_model_selection.py
python notebooks/05_forecasting.py
python notebooks/06_executive_summary.py
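
The six scripts are listed in run order, presumably because later steps read files written by earlier ones (an assumption; the README does not state the dependency explicitly). A minimal sketch of a runner that enforces the order and stops at the first failure could look like this. The script names come from the list above; the helper names `pipeline_commands` and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in Quick Start, in the order they should run.
SCRIPTS = [
    "01_exploration.py",
    "02_preprocessing.py",
    "03_decomposition.py",
    "04_model_selection.py",
    "05_forecasting.py",
    "06_executive_summary.py",
]


def pipeline_commands(notebooks_dir="notebooks", python=sys.executable):
    """Return one command (as an argument list) per script, in run order."""
    return [[python, str(Path(notebooks_dir) / name)] for name in SCRIPTS]


def run_pipeline(notebooks_dir="notebooks"):
    """Run each script in order; raise CalledProcessError on the first failure."""
    for cmd in pipeline_commands(notebooks_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would then replace the six separate commands.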

Key Results

Model Performance

    Best Model: SARIMA(0,1,1)(0,1,1,12)

    Test Accuracy: 0.48% MAPE (Mean Absolute Percentage Error)

    AIC Score: -323.7

    Residuals: Stationary with mean close to zero
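
The MAPE figure above is a mean of absolute errors expressed as a percentage of the actual values. The following is a minimal sketch of that metric in plain Python; the function name `mape` and the sample numbers are illustrative assumptions, not the project's code or results.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Illustrative numbers only: relative errors of 10% and 5% average to 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
```

MAPE depends on the scale it is computed on: a value measured on log-transformed data is not directly comparable to one measured in passenger counts, so it helps to note which scale a reported figure refers to.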

Forecast Results (1961-1962)

    Average Forecast: 547,000 passengers/month

    Peak Forecast: 739,000 passengers (July 1962)

    Trough Forecast: 426,000 passengers (February 1961)

    Total Growth: 16.6% over 2 years

    Monthly Growth Rate: 0.6%

    Uncertainty: ±16.8% (95% confidence interval)
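
The total and monthly growth figures above can be related through compounding. The sketch below assumes a constant monthly rate over the 24-month horizon, which is a simplification made here (the README does not say how the monthly rate was derived); `monthly_rate` is a hypothetical helper name.

```python
def monthly_rate(total_growth, months):
    """Constant monthly rate that compounds to total_growth over `months`."""
    return (1.0 + total_growth) ** (1.0 / months) - 1.0


# 16.6% total growth over 24 months, taken from the figures above.
rate = monthly_rate(0.166, 24)
print(f"{rate:.2%}")  # prints 0.64%
```

Under this assumption the implied rate is about 0.64% per month, which is consistent with the reported 0.6% after rounding.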

Seasonal Insights

    Peak Month: July (+63.83 thousand passengers, additive seasonal effect)

    Trough Month: November (-53.59 thousand passengers, additive seasonal effect)

    Seasonal Amplitude: about 117 thousand passengers (peak minus trough)

    Pattern: Strong 12-month cycle with summer peaks
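
Seasonal effects like the ones listed above are typically obtained by removing a centered moving-average trend and averaging what remains by calendar month. The sketch below shows that classical additive procedure in plain Python on synthetic data. It is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the series is generated rather than loaded from the airline data, and the period is assumed to be even.

```python
import math


def centered_ma(series, period=12):
    """2 x period centered moving average (even period); None at the edges."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # End points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def additive_seasonal_effects(series, period=12):
    """Mean detrended value per cycle position, centered to sum to zero."""
    trend = centered_ma(series, period)
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    raw = [sum(b) / len(b) for b in buckets]
    mean = sum(raw) / period
    return [r - mean for r in raw]


# Synthetic data (NOT the airline series): linear trend plus a sine cycle.
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
effects = additive_seasonal_effects(series)
amplitude = max(effects) - min(effects)  # peak minus trough
```

Applied to the reported figures, the same peak-minus-trough calculation gives 63.83 - (-53.59) = 117.42, which matches the stated amplitude of 117. The `seasonal_decompose` function in statsmodels implements a comparable procedure.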

Methodology

    Data Exploration & Preprocessing

        Loaded and explored 144 monthly observations

        Applied transformations (log, differencing)

        Tested for stationarity (ADF test)

        Created train/test split (80%/20%)

    Time Series Decomposition

        Applied additive and multiplicative decomposition

        Identified trend, seasonal, and residual components

        Multiplicative model preferred (335,541x better variance ratio)

    Model Selection

        Tested multiple ARIMA/SARIMA models

        Selected based on AIC/BIC criteria

        Validated with residual diagnostics

        Achieved 0.48% MAPE on test set

    Forecasting

        Generated 24-month forecasts

        Calculated confidence intervals

        Analyzed uncertainty and growth patterns

        Created business-ready insights
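
The preprocessing steps in the first stage (log transform, differencing, and a chronological 80%/20% split) can be sketched in plain Python as follows. This is a minimal illustration on synthetic data, not the project's implementation: the helper names are hypothetical, the order of differencing is an assumption, and the README does not say whether the split was made before or after the transformations.

```python
import math


def difference(series, lag=1):
    """Return series[t] - series[t - lag]; the result is `lag` items shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so the test set is strictly later in time."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Synthetic stand-in for 144 monthly values (NOT the airline data):
# exponential growth multiplied by a 12-month cycle.
series = [
    100.0 * 1.01 ** t * (1 + 0.1 * math.sin(2 * math.pi * t / 12))
    for t in range(144)
]

logged = [math.log(y) for y in series]           # stabilizes growing variance
stationary = difference(difference(logged, 12))  # seasonal, then first difference
train, test = chronological_split(series)
```

With 144 observations, truncating at 80% gives 115 training points and 29 test points; the project's exact counts are not stated. On this synthetic series the log turns the multiplicative cycle into an additive one, so the two differences remove both the trend and the seasonality.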

Visualizations

The project includes 12 comprehensive visualizations:

    01_data_exploration.png - Initial data exploration

    02_seasonality_analysis.png - Seasonal patterns

    03_transformations.png - Time series transformations

    04_acf_pacf.png - Autocorrelation analysis

    05_train_test_split.png - Train/test split

    06_decomposition.png - Time series decomposition

    07_multiplicative_decomposition.png - Multiplicative model

    08_model_performance.png - Model performance

    09_residuals_analysis.png - Residuals analysis

    10_forecast_analysis.png - Forecast analysis

    11_forecast_growth.png - Forecast growth

    12_executive_dashboard.png - Executive dashboard

Technical Stack

    Python 3.9

    pandas - Data manipulation

    numpy - Numerical computations

    matplotlib & seaborn - Visualization

    statsmodels - Time series analysis

    scikit-learn - Model evaluation metrics

Files Generated

Data Files (8 files)

    Original and processed datasets

    Decomposition results

    Model comparisons

    Future forecasts

Reports (6 files)

    Detailed analysis reports

    Executive summary

    Business recommendations

Visualizations (12 images)

    Comprehensive charts and dashboards

    Model diagnostics

    Forecast visualizations

Business Applications

    Capacity Planning: Plan for 547K avg passengers with 739K summer peaks

    Resource Allocation: Increase resources by 16.6% over 2 years

    Risk Management: Account for ±16.8% forecast uncertainty

    Strategic Planning: Support investment decisions with data-driven forecasts
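
To make the risk-management point concrete, a relative uncertainty can be turned into a planning range around a point forecast. The sketch below assumes the ±16.8% figure applies symmetrically to the 547K average forecast. That is a simplification made here for illustration: the README does not say how the percentage was computed, and forecast intervals usually widen with the horizon. `planning_band` is a hypothetical helper name.

```python
def planning_band(point_forecast, relative_uncertainty):
    """Return (low, high) bounds at +/- relative_uncertainty around a forecast."""
    half_width = point_forecast * relative_uncertainty
    return point_forecast - half_width, point_forecast + half_width


# Average monthly forecast and uncertainty, taken from Key Results.
low, high = planning_band(547_000, 0.168)
```

Under that assumption, capacity planning around the average month would cover roughly 455,000 to 639,000 passengers.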

Contributing

Contributions are welcome! Please follow these steps:

    Fork the repository

    Create a feature branch (git checkout -b feature/improvement)

    Commit changes (git commit -am 'Add improvement')

    Push to branch (git push origin feature/improvement)

    Create Pull Request

License

This project is licensed under the MIT License; see the LICENSE file for details.

Author

Awande Gcabashe

    GitHub: Awande07

    Project: Time Series Analysis & Forecasting

Acknowledgments

    Dataset: International Airline Passengers (Box & Jenkins, 1976)

    Methodology: Classical time series analysis techniques

    Tools: Python, pandas, statsmodels, matplotlib

    Inspiration: Real-world business forecasting challenges
