
pradeep221b/energy-anomaly-intelligence


Energy Anomaly & Demand Intelligence

Energy utilities lose money when meter anomalies go unnoticed and when demand peaks are under-anticipated. This project monitors smart-meter behavior to flag operationally meaningful anomalies while forecasting 7-day and 30-day regional electricity demand with production-style validation, explainability, and uncertainty bands. It is built for the kind of analyst role that sits between data science, asset operations, and the grid control room.

At A Glance

  • Detects smart-meter anomalies with Isolation Forest + rolling z-score confirmation
  • Forecasts regional demand with XGBoost, TimeSeriesSplit, and walk-forward backtesting
  • Explains forecast drivers with native Tree SHAP
  • Adds forecast uncertainty with conformal-style interval bands
  • Simulates a 3-household portfolio on top of the real UCI baseline for portfolio-level anomaly analysis
  • Serves the results in a four-page Streamlit dashboard backed by DuckDB and Parquet

Visual Tour

Generated from the latest local run in this workspace.

[Image: Grid overview heatmap] [Image: Demand forecast chart]

[Image: Anomaly explorer timeline] [Image: Portfolio comparison chart]

Current Workspace Results

  • Household anomaly baseline: 2,075,259 cleaned minute-level readings
  • Confirmed anomalies on the real UCI household run: 366
  • PJM forecast validation MAPE on the latest trained run: 1.21%
  • Walk-forward backtest windows completed: 12
  • Forecast horizon generated: 720 future hourly predictions
  • Simulated portfolio households: 3
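MAPE averages |actual - forecast| / |actual| over every scored hour and reports the result as a percentage. A MAPE of 1.21% therefore means hourly forecasts missed actual load by about 1.2% on average. The sketch below is a minimal illustration. The function name mape and the sample loads are hypothetical and not taken from the repo, which may compute the metric with a library helper instead.

```python
from collections.abc import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a percentage."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Relative error per hour, averaged, then scaled to percent
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Three illustrative hourly loads in MW (made-up numbers, not project data)
score = mape([100_000, 90_000, 80_000], [101_000, 89_100, 80_800])
print(f"{score:.2f}%")  # 1.00%
```

Because every hour is weighted by its own actual load, MAPE is scale-free. That makes it convenient for comparing regions, but it becomes unstable whenever actual values approach zero.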

Reviewer Quickstart

If you are reviewing this repo quickly and do not want to run the full pipeline:

  1. Read the visual tour above
  2. Skim the architecture diagram below
  3. Run the tests
  4. Read the model card and limitations section
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
pytest tests/ -q

If you want to reproduce the full pipeline, follow the dataset and run instructions in the sections below.

Do You Need API Keys?

No API key is required for the core project flow.

  • UCI household data: public download, no login required
  • EIA demand data: works with the public bulk operating-data file fallback
  • Weather history: uses Meteostat first, Open-Meteo fallback when needed
  • .env.example: included only for optional direct EIA API access

Do not commit a real .env file or any live API keys.

Architecture

flowchart TD
    A["UCI Household Meter Data"] --> B["src.ingest"]
    C["EIA PJM Demand Data"] --> B
    D["Historical Weather"] --> B
    B --> E["src.preprocess"]
    E --> F["Household Hourly Features"]
    E --> G["Regional Demand Feature Frame"]
    F --> H["Isolation Forest + z-score validation"]
    G --> I["XGBoost Forecaster"]
    I --> J["Tree SHAP + Walk-forward Backtest + Interval Bands"]
    F --> K["simulate_households.py"]
    K --> L["Portfolio anomaly scoring"]
    H --> M["anomaly_labels.parquet"]
    J --> N["forecast_results.parquet"]
    M --> O["DuckDB + Streamlit"]
    N --> O
    L --> O

What Each Part Does

src/ingest

  • Validates the raw UCI household file
  • Pulls EIA demand and forecast data
  • Pulls and caches weather history
  • Stores raw payloads locally for reproducibility
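Here is a minimal sketch of the cache-first, fallback-second pattern described above. It assumes each provider is wrapped in a zero-argument function that returns JSON-serializable data. The names load_or_fetch, meteostat_stub and open_meteo_stub and the cache path are hypothetical. The stubs stand in for real Meteostat and Open-Meteo calls, which are not shown.

```python
import json
import tempfile
from collections.abc import Sequence
from pathlib import Path
from typing import Any, Callable

Fetcher = Callable[[], Any]


def load_or_fetch(cache_path: Path, fetchers: Sequence[tuple[str, Fetcher]]) -> tuple[str, Any]:
    """Return (source, payload), reading the local cache before calling any provider."""
    if cache_path.exists():
        cached = json.loads(cache_path.read_text())
        return cached["source"], cached["payload"]
    errors: list[str] = []
    for name, fetch in fetchers:  # providers are tried in priority order
        try:
            payload = fetch()
        except Exception as exc:  # provider down, rate-limited, etc.
            errors.append(f"{name}: {exc}")
            continue
        # Persist the raw payload so later runs are reproducible without the network
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps({"source": name, "payload": payload}))
        return name, payload
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def meteostat_stub():  # hypothetical stand-in for a primary provider that is unavailable
    raise ConnectionError("Meteostat unavailable")


def open_meteo_stub():  # hypothetical stand-in for the fallback provider
    return [{"time": "2024-01-01T00:00", "temp_c": 3.2}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "weather" / "pjm_2024.json"
    source, payload = load_or_fetch(cache, [("meteostat", meteostat_stub), ("open-meteo", open_meteo_stub)])
    # The second call is served from the cache, so no provider is needed
    cached_source, _ = load_or_fetch(cache, [])
print(source, cached_source)  # open-meteo open-meteo
```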

src/preprocess

  • Cleans timestamps and missing values
  • Resamples household data to a continuous minute-level timeline
  • Builds hourly household rollups
  • Produces the clean PJM demand table used by the forecaster
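The household cleaning steps can be sketched with the standard library. The sketch assumes rows have already been split on ';' from the UCI file, which stores Date as dd/mm/yyyy, Time as hh:mm:ss and missing readings as '?'. Only Global_active_power is kept here. The function names are hypothetical. A pandas implementation would typically use reindex and resample instead of explicit loops.

```python
from datetime import datetime, timedelta


def to_minute_timeline(rows):
    """rows: (date, time, global_active_power) string triples as found in the UCI file."""
    readings = {}
    for date_s, time_s, value_s in rows:
        ts = datetime.strptime(f"{date_s} {time_s}", "%d/%m/%Y %H:%M:%S")
        readings[ts] = None if value_s == "?" else float(value_s)  # '?' marks a missing reading
    timeline, ts = [], min(readings)
    while ts <= max(readings):
        timeline.append((ts, readings.get(ts)))  # minutes absent from the file also become None
        ts += timedelta(minutes=1)
    return timeline


def hourly_means(timeline):
    """Average the non-missing minutes in each clock hour; fully missing hours are omitted."""
    buckets = {}
    for ts, value in timeline:
        if value is not None:
            buckets.setdefault(ts.replace(minute=0, second=0), []).append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


rows = [("16/12/2006", "17:24:00", "4.216"), ("16/12/2006", "17:25:00", "?"), ("16/12/2006", "17:27:00", "5.000")]
timeline = to_minute_timeline(rows)
hourly = hourly_means(timeline)
print(len(timeline), hourly)
```

Making the timeline continuous first means gaps show up as explicit None values. Downstream steps, such as a data_gap rule, can then count them instead of silently skipping over them.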

src/anomaly_detector

  • Engineers contextual anomaly features
  • Trains Isolation Forest
  • Confirms an anomaly only when the rolling_168h_zscore also exceeds a threshold
  • Classifies each confirmed event into business-readable types such as demand_surge, voltage_irregularity, or data_gap
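The two-stage confirmation rule can be sketched as follows. The sketch makes three assumptions. First, the Isolation Forest labels come from scikit-learn's IsolationForest.predict, which returns -1 for outliers and 1 for inliers. Second, the 168-hour window covers the hours before the one being scored. Third, the threshold is 3.0. The window alignment, the threshold and the function names are assumptions, not values read from the repo.

```python
from statistics import mean, pstdev


def rolling_zscores(values, window=168):
    """z-score of each hour against the preceding `window` hours (None until enough history)."""
    scores = []
    for i, x in enumerate(values):
        if i < window:
            scores.append(None)
            continue
        history = values[i - window:i]  # trailing week, excluding the current hour
        sd = pstdev(history)
        scores.append(None if sd == 0 else (x - mean(history)) / sd)
    return scores


def confirm(if_labels, values, window=168, threshold=3.0):
    """Keep an Isolation Forest outlier (label -1) only if |z| also exceeds the threshold."""
    zs = rolling_zscores(values, window)
    return [label == -1 and z is not None and abs(z) > threshold for label, z in zip(if_labels, zs)]


values = [1.0, 1.2] * 84 + [1.1, 5.0]  # one week of history, then a normal hour and a spike
if_labels = [1] * 168 + [-1, -1]       # pretend the forest flagged both final hours
confirmed = confirm(if_labels, values)
print(confirmed[-2:])  # [False, True]
```

The first flagged hour is rejected because it sits at the weekly mean. Requiring both signals to agree filters out points the forest finds unusual in feature space but that are ordinary for that week.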

src/demand_forecaster

  • Builds leakage-safe lag, rolling, calendar, and weather features
  • Trains XGBoost with TimeSeriesSplit(n_splits=5, gap=24)
  • Produces walk-forward backtests, Tree SHAP importance, and interval bands
  • Saves forecast artifacts for the Streamlit app
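Leakage-safe here means that every feature for hour t is computed only from load observed before t. Below is a stdlib sketch built on that assumption. The lag and window choices (24 h and 168 h lags, a 24 h rolling mean) are hypothetical, and the repo's actual feature list and weather inputs are not reproduced.

```python
from datetime import datetime, timedelta


def build_features(timestamps, load, lags=(24, 168), roll=24):
    """One feature dict per hour, using only load values strictly before that hour."""
    rows = []
    for i, ts in enumerate(timestamps):
        if i < max(max(lags), roll):
            continue  # not enough history yet; drop the row rather than peek ahead
        row = {"hour": ts.hour, "dayofweek": ts.weekday(), "is_weekend": int(ts.weekday() >= 5)}
        for lag in lags:
            row[f"lag_{lag}h"] = load[i - lag]
        window = load[i - roll:i]  # ends at i - 1, so the target load[i] is excluded
        row[f"roll_mean_{roll}h"] = sum(window) / roll
        row["target"] = load[i]
        rows.append(row)
    return rows


start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(200)]
load = [float(h) for h in range(200)]  # steadily rising load makes leakage easy to spot
rows = build_features(timestamps, load)
print(rows[0]["lag_24h"], rows[0]["roll_mean_24h"], rows[0]["target"])  # 144.0 155.5 168.0
```

With a rising series, a rolling mean that accidentally included the target hour could no longer stay strictly below it, so the check is cheap to automate in a test. For a 720-hour horizon, short lags such as lag_24h are only observed for the first day ahead. A multi-step forecaster therefore has to either use lags at least as long as the horizon or feed its own predictions back in. This sketch does not assume either approach.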

src/simulate_households

  • Scales the real hourly household baseline into house_A, house_B, and house_C
  • Injects differentiated anomaly patterns
  • Enables a more realistic portfolio-style anomaly explorer
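Here is one way to turn a single real baseline into a small portfolio. Each house gets a scale factor, mild multiplicative noise and its own anomaly signature: spikes for house_A and house_B, sharp drops for house_C. The profile values, seed and function name are hypothetical and may not match what simulate_households actually does.

```python
import random


def simulate_portfolio(baseline, profiles, seed=42):
    """Return {house: (series, injected_anomaly_indices)} built from one hourly baseline."""
    rng = random.Random(seed)  # fixed seed keeps the synthetic portfolio reproducible
    portfolio = {}
    for house, (scale, anomaly_factor, n_anomalies) in profiles.items():
        # Scale the real baseline and add +/-5% noise so houses are not exact copies
        series = [round(v * scale * rng.uniform(0.95, 1.05), 3) for v in baseline]
        injected = sorted(rng.sample(range(len(series)), n_anomalies))
        for i in injected:
            series[i] *= anomaly_factor  # >1 injects a surge, <1 injects a drop
        portfolio[house] = (series, injected)
    return portfolio


baseline = [1.0 + 0.5 * (h % 24 >= 18) for h in range(168)]  # toy week with an evening peak
profiles = {
    "house_A": (0.8, 3.0, 2),  # smaller home, occasional large surges
    "house_B": (1.0, 2.5, 3),  # baseline-sized home, more frequent surges
    "house_C": (1.3, 0.1, 2),  # larger home, sudden drops (e.g. outage-like)
}
portfolio = simulate_portfolio(baseline, profiles)
print({house: injected for house, (_, injected) in portfolio.items()})
```

Returning the injected indices alongside each series gives a ground truth for checking how many planted anomalies the detector recovers per house.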

app/streamlit_app.py

  • Serves four views:
    • Grid Overview
    • Anomaly Explorer
    • Demand Forecast
    • Meter Deep Dive

Project Structure

energy-anomaly-intelligence/
|-- README.md
|-- LICENSE
|-- requirements.txt
|-- .gitignore
|-- .env.example
|-- assets/
|-- data/
|   |-- raw/
|   `-- processed/
|-- models/
|-- notebooks/
|-- scripts/
|-- sql/
|-- src/
|-- app/
|   `-- pages/
`-- tests/

Data Sources

UCI Household Meter Data

EIA Regional Demand

  • Public docs:
  • This project can use:
    • direct API access with EIA_API_KEY, or
    • the official bulk operating-data fallback

Weather

  • Meteostat first
  • Open-Meteo fallback when Meteostat is unavailable

What Is Committed Vs. Not Committed

This repo is intentionally set up like a real project, not a “dump everything into Git” repo.

Committed:

  • source code
  • tests
  • SQL
  • notebooks
  • README visuals
  • CI workflow

Not committed:

  • raw datasets
  • large processed artifacts
  • local DuckDB database
  • trained model binaries
  • .env secrets

This keeps the public repo lightweight and safe while still making the project understandable.

Full Reproduction

1. Create an environment

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

2. Add data

  • Place the UCI file at data/raw/household_power.txt
  • Optionally copy .env.example to .env if you want direct EIA API calls
  • If no EIA key is present, ingestion falls back to the EIA bulk operating-data path
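The key-or-fallback decision can be sketched like this. The sketch assumes the key is read from the process environment first and then from a local .env file. The minimal .env parser stands in for whatever loader the project uses, and the function names are hypothetical. The actual API and bulk-file download calls are out of scope.

```python
import os
import tempfile
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def read_dotenv(path: Path) -> dict:
    """Minimal KEY=VALUE parser that skips blank lines and comments."""
    values = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def choose_eia_source(env: Optional[Mapping] = None, dotenv_path: Path = Path(".env")) -> str:
    """Return 'api' when an EIA_API_KEY is available, otherwise 'bulk'."""
    env = os.environ if env is None else env
    key = env.get("EIA_API_KEY") or read_dotenv(dotenv_path).get("EIA_API_KEY")
    return "api" if key else "bulk"


with tempfile.TemporaryDirectory() as tmp:
    dotenv = Path(tmp) / ".env"
    dotenv.write_text('# copied from .env.example\nEIA_API_KEY="placeholder"\n')
    from_file = choose_eia_source({}, dotenv)
    without_key = choose_eia_source({}, Path(tmp) / "absent.env")
print(from_file, without_key)  # api bulk
```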

3. Run the full pipeline

One-command helper:

.\scripts\run_pipeline.ps1 -GenerateReadmeAssets

Manual steps:

.venv\Scripts\python -m src.ingest --uci-path data/raw/household_power.txt --region PJM
.venv\Scripts\python -m src.preprocess
.venv\Scripts\python -m src.anomaly_detector --train
.venv\Scripts\python -m src.demand_forecaster --train --horizon-hours 720
.venv\Scripts\python -m src.simulate_households
.venv\Scripts\python -m src.anomaly_detector --train --input-path data/processed/meter_hourly_portfolio.parquet --output-path data/processed/anomaly_labels_portfolio.parquet --model-path models/isolation_forest_portfolio_v1.pkl --table-name anomaly_scores_portfolio

4. Launch the dashboard

.venv\Scripts\streamlit run app/streamlit_app.py

5. Regenerate README images only

.\scripts\generate_readme_assets.ps1

Dashboard Views

Grid Overview

  • KPI cards
  • 30-day consumption heatmap
  • load timeline with anomaly overlay

Anomaly Explorer

  • event timeline
  • sortable anomaly table
  • 24-hour context window
  • anomaly type distribution
  • business impact summary
  • simulated household selector when portfolio artifacts exist
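The anomaly type distribution depends on mapping each confirmed event to a business-readable label such as demand_surge, voltage_irregularity or data_gap. Below is a rule-based sketch. The event fields, the 240 V nominal voltage, the 6% tolerance and the unclassified bucket are all assumptions, not the repo's actual taxonomy rules.

```python
from collections import Counter


def classify_event(event, nominal_voltage=240.0, voltage_tolerance=0.06):
    """Map one confirmed event to a readable type. Rules and thresholds are illustrative only."""
    if event.get("missing_fraction", 0.0) > 0.5:
        return "data_gap"  # mostly missing readings: a telemetry problem, not a load problem
    voltage = event.get("voltage")
    if voltage is not None and abs(voltage - nominal_voltage) / nominal_voltage > voltage_tolerance:
        return "voltage_irregularity"
    if event.get("zscore", 0.0) > 0:
        return "demand_surge"  # load well above its trailing-week norm
    return "unclassified"


events = [
    {"missing_fraction": 0.9},
    {"voltage": 220.0, "zscore": 4.1},
    {"voltage": 241.0, "zscore": 5.2},
    {"voltage": 239.0, "zscore": -3.5},
]
types = [classify_event(e) for e in events]
distribution = Counter(types)  # feeds an "anomaly type distribution" style chart
print(distribution)
```

Checking the rules in a fixed order (data quality first, then power quality, then consumption) ensures that a gap is never mislabelled as a demand event.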

Demand Forecast

  • actual vs forecast
  • 80% and 95% interval bands
  • hour-of-day error chart
  • Tree SHAP demand driver chart
  • walk-forward backtest trend
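The 80% and 95% bands follow the conformal-style, per-hour-of-day residual approach listed in the model card. Below is a minimal split-conformal sketch, assuming held-out (hour, actual, forecast) triples are available for calibration. The function names are hypothetical, and the repo's exact quantile rule may differ.

```python
import math
from collections import defaultdict


def conformal_quantile(abs_residuals, level):
    """Split-conformal quantile: the ceil((n + 1) * level)-th smallest absolute residual."""
    ordered = sorted(abs_residuals)
    n = len(ordered)
    k = math.ceil((n + 1) * level)
    # When k > n the strict guarantee needs an unbounded band; this sketch caps at the max
    return ordered[min(k, n) - 1]


def hourly_bands(calibration, levels=(0.8, 0.95)):
    """calibration: iterable of (hour_of_day, actual, forecast) from held-out data."""
    by_hour = defaultdict(list)
    for hour, actual, forecast in calibration:
        by_hour[hour].append(abs(actual - forecast))
    return {hour: {lvl: conformal_quantile(res, lvl) for lvl in levels} for hour, res in by_hour.items()}


def interval(forecast, hour, bands, level):
    """Symmetric band around a point forecast using that hour's calibrated half-width."""
    half_width = bands[hour][level]
    return forecast - half_width, forecast + half_width


calibration = [(0, 100 + r, 100) for r in range(1, 11)] + [(18, 150 + 2 * r, 150) for r in range(1, 11)]
bands = hourly_bands(calibration)
print(interval(200, 0, bands, 0.8))    # (191, 209)
print(interval(300, 18, bands, 0.95))  # (280, 320)
```

Grouping residuals by hour of day lets the band widen at hours where the forecaster has historically been less accurate, instead of applying one global width to every hour.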

Meter Deep Dive

  • sub-meter stacked area chart
  • correlation heatmap
  • weekday vs weekend profile
  • narrative insights

Model Card

| Property | Anomaly Detector | Demand Forecaster |
|---|---|---|
| Algorithm | Isolation Forest | XGBoost Regressor |
| Training data | UCI household hourly rollups (2006-2010) | PJM demand + weather (rolling 2-year window) |
| Validation method | Rolling z-score confirmation | TimeSeriesSplit (5 folds, 24h gap) + walk-forward backtest |
| Explainability | Rule-based anomaly taxonomy | Tree SHAP mean absolute contribution |
| Key metric | Confirmed anomaly counts by severity | MAPE: 1.21% on latest trained run |
| Interval logic | Severity rules | Conformal-style residual bands by hour-of-day |
| Retraining cadence | Monthly recommended | Monthly recommended |
| Known limitation | Single-household real baseline | Extreme shock performance depends on available exogenous signals |
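To make the "TimeSeriesSplit (5 folds, 24h gap)" row concrete, the sketch below reproduces the fold layout of scikit-learn's TimeSeriesSplit(n_splits=5, gap=24). It uses the default test_size and no max_train_size, with plain ranges instead of NumPy arrays. It is meant for inspection only. The function name is hypothetical, and in practice scikit-learn's own class would be used.

```python
def time_series_splits(n_samples, n_splits=5, gap=24):
    """Expanding-window folds following scikit-learn's default TimeSeriesSplit rule."""
    test_size = n_samples // (n_splits + 1)
    if n_samples - n_splits * test_size - gap <= 0:
        raise ValueError("not enough samples for this many splits and this gap")
    folds = []
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        train_end = test_start - gap  # the `gap` rows before each test block are dropped
        folds.append((range(0, train_end), range(test_start, test_start + test_size)))
    return folds


folds = time_series_splits(1000)  # e.g. 1,000 hourly rows
for train, test in folds:
    # first line: train 0..145  gap 24h  test 170..335
    print(f"train 0..{train.stop - 1}  gap {test.start - train.stop}h  test {test.start}..{test.stop - 1}")
```

The training window grows with each fold, and validation always lies strictly in the future. The 24-row gap keeps training rows out of the day just before each validation block, which limits leakage through 24-hour lag and rolling features.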

Validation

Latest local checks:

  • pytest tests/ -q
  • full CLI pipeline run through ingest, preprocess, anomaly scoring, forecast training, and portfolio scoring

Notes

  • Raw datasets and secrets are gitignored
  • The README visuals are generated from local pipeline outputs
  • The project is ready to publish once you connect a GitHub repository URL

About

Production-grade energy analytics system that detects meter anomalies using Isolation Forest, forecasts 30-day demand using XGBoost, and surfaces operational insights through an interactive Streamlit dashboard — built on real smart meter data.
