Energy utilities lose money when meter anomalies go unnoticed and when demand peaks are under-anticipated. This project monitors smart-meter behavior to flag operationally meaningful anomalies while forecasting 7-day and 30-day regional electricity demand with production-style validation, explainability, and uncertainty bands. It is built for the kind of analyst role that sits between data science, asset operations, and the grid control room.
- Detects smart-meter anomalies with Isolation Forest + rolling z-score confirmation
- Forecasts regional demand with XGBoost, `TimeSeriesSplit`, and walk-forward backtesting
- Explains forecast drivers with native Tree SHAP
- Adds forecast uncertainty with conformal-style interval bands
- Simulates a 3-household portfolio on top of the real UCI baseline for portfolio-level anomaly analysis
- Serves the results in a four-page Streamlit dashboard backed by DuckDB and Parquet
Generated from the latest local run in this workspace:

- Household anomaly baseline: 2,075,259 cleaned minute-level readings
- Confirmed anomalies on the real UCI household run: 366
- PJM forecast validation MAPE on the latest trained run: 1.21%
- Walk-forward backtest windows completed: 12
- Forecast horizon generated: 720 future hourly predictions (30 days)
- Simulated portfolio households: 3
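To make the backtest and MAPE figures above concrete, here is a minimal sketch of an expanding-window walk-forward split and the MAPE formula. It assumes one-week (168-hour) test windows and a one-year initial training span; the README does not state the real window length, and `walk_forward_windows` and `mape` are hypothetical helper names, not the project's API.

```python
from typing import List, Tuple

import numpy as np


def walk_forward_windows(
    n_obs: int, initial_train: int, horizon: int, n_windows: int
) -> List[Tuple[range, range]]:
    """Expanding-window splits: each window trains on all data before its
    origin and tests on the next `horizon` observations."""
    windows = []
    for k in range(n_windows):
        train_end = initial_train + k * horizon  # forecast origin moves forward
        test_end = train_end + horizon
        if test_end > n_obs:
            break  # not enough data left for a full test window
        windows.append((range(0, train_end), range(train_end, test_end)))
    return windows


def mape(actual, forecast) -> float:
    """Mean absolute percentage error in percent (assumes no zero actuals)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)


# Hypothetical setup: two years of hourly data, one year of initial training,
# and 12 one-week test windows.
windows = walk_forward_windows(n_obs=2 * 365 * 24, initial_train=365 * 24, horizon=168, n_windows=12)
print(len(windows))  # 12
print(round(mape([100.0, 200.0], [99.0, 202.0]), 6))
```

Because every test window starts exactly where its training range ends, no window ever scores a model on hours it was trained on.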
If you are reviewing this repo quickly and do not want to run the full pipeline:
- Read the visual tour above
- Skim the architecture diagram below
- Run the tests
- Read the model card and limitations section
```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
pytest tests/ -q
```

If you want to reproduce the full pipeline, follow the dataset and run instructions in the sections below.
No API key is required for the core project flow.
- UCI household data: public download, no login required
- EIA demand data: works with the public bulk operating-data file fallback
- Weather history: uses Meteostat first, with Open-Meteo as a fallback when needed
- `.env.example`: included only for optional direct EIA API access
Do not commit a real .env file or any live API keys.
```mermaid
flowchart TD
A["UCI Household Meter Data"] --> B["src.ingest"]
C["EIA PJM Demand Data"] --> B
D["Historical Weather"] --> B
B --> E["src.preprocess"]
E --> F["Household Hourly Features"]
E --> G["Regional Demand Feature Frame"]
F --> H["Isolation Forest + z-score validation"]
G --> I["XGBoost Forecaster"]
I --> J["Tree SHAP + Walk-forward Backtest + Interval Bands"]
F --> K["simulate_households.py"]
K --> L["Portfolio anomaly scoring"]
H --> M["anomaly_labels.parquet"]
J --> N["forecast_results.parquet"]
M --> O["DuckDB + Streamlit"]
N --> O
L --> O
```
`src.ingest`:

- Validates the raw UCI household file
- Pulls EIA demand and forecast data
- Pulls and caches weather history
- Stores raw payloads locally for reproducibility
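Caching raw payloads and falling back between weather providers can be expressed as one cache-then-fallback helper. This is a minimal sketch, assuming JSON-serializable payloads; `fetch_with_fallback` and the two stub providers are hypothetical stand-ins for the real Meteostat and Open-Meteo calls, which this README does not show.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Sequence, Tuple


def fetch_with_fallback(
    cache_path: Path, providers: Sequence[Tuple[str, Callable[[], Any]]]
) -> Tuple[str, Any]:
    """Return (source_name, payload), serving from the local cache when present."""
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        return cached["source"], cached["payload"]

    errors = []
    for name, fetch in providers:  # tried in priority order
        try:
            payload = fetch()
        except Exception as exc:  # network errors, outages, empty responses...
            errors.append(f"{name}: {exc}")
            continue
        # Persist the raw payload so later runs are reproducible and work offline.
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps({"source": name, "payload": payload}), encoding="utf-8")
        return name, payload
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real Meteostat / Open-Meteo client calls.
def meteostat_stub():
    raise ConnectionError("Meteostat unavailable")


def open_meteo_stub():
    return {"temperature_2m": [21.5, 22.0]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "raw" / "weather_pjm.json"
    first = fetch_with_fallback(cache, [("meteostat", meteostat_stub), ("open_meteo", open_meteo_stub)])
    second = fetch_with_fallback(cache, [("meteostat", meteostat_stub)])  # served from cache

print(first)  # ('open_meteo', {'temperature_2m': [21.5, 22.0]})
```

Recording which provider answered alongside the payload makes it possible to audit later which weather source a given forecast run actually used.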
`src.preprocess`:

- Cleans timestamps and missing values
- Resamples household data to a continuous minute-level timeline
- Builds hourly household rollups
- Produces the clean PJM demand table used by the forecaster
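The minute-level and hourly steps can be sketched in a few lines of pandas. This assumes the public UCI file's layout (semicolon-separated, `dd/mm/yyyy` dates, `?` for missing values). The helper names and the `minutes_observed` coverage column are illustrative, not necessarily what `src.preprocess` does.

```python
import io

import pandas as pd

# A few rows in the UCI file's layout (the real file has seven measurement columns).
RAW = """Date;Time;Global_active_power;Voltage
16/12/2006;17:24:00;4.216;234.840
16/12/2006;17:25:00;5.360;233.630
16/12/2006;17:27:00;?;?
16/12/2006;17:28:00;3.666;235.680
"""


def load_minutes(text: str) -> pd.DataFrame:
    df = pd.read_csv(io.StringIO(text), sep=";", na_values="?")
    stamps = df.pop("Date") + " " + df.pop("Time")
    df.index = pd.DatetimeIndex(pd.to_datetime(stamps, format="%d/%m/%Y %H:%M:%S"))
    # Reindex onto a continuous 1-minute grid: absent minutes become explicit NaN rows.
    return df.asfreq("1min")


def hourly_rollup(minutes: pd.DataFrame) -> pd.DataFrame:
    hourly = minutes.resample("1h").mean()  # NaNs are skipped by mean()
    # Coverage count so later steps can tell a quiet hour from a data gap.
    hourly["minutes_observed"] = minutes["Global_active_power"].resample("1h").count()
    return hourly


minutes = load_minutes(RAW)
hourly = hourly_rollup(minutes)
print(len(minutes), int(hourly["minutes_observed"].iloc[0]))  # 5 3
```

Here the missing 17:26 row and the `?` row at 17:27 both end up as NaN, so the hourly mean is taken over the three real readings.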
`src.anomaly_detector`:

- Engineers contextual anomaly features
- Trains Isolation Forest
- Confirms an anomaly only when the `rolling_168h_zscore` also exceeds its threshold
- Classifies each confirmed event into a business-readable type such as `demand_surge`, `voltage_irregularity`, or `data_gap`
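The confirmation rule amounts to a logical AND between the Isolation Forest flag and a rolling z-score test. This minimal sketch assumes a 168-hour trailing baseline that excludes the current hour and a threshold of 3.0. The project's actual threshold is not stated here, `confirm_anomalies` is a hypothetical name, and the Isolation Forest output is represented by a precomputed boolean column.

```python
import numpy as np
import pandas as pd


def confirm_anomalies(
    hourly_kw: pd.Series, iforest_flag: pd.Series, window: int = 168, z_threshold: float = 3.0
) -> pd.DataFrame:
    """Keep an Isolation Forest flag only where the rolling z-score agrees."""
    # Trailing baseline over the previous `window` hours; shift(1) keeps the
    # current (possibly anomalous) hour out of its own mean and std.
    baseline = hourly_kw.shift(1).rolling(window, min_periods=window)
    zscore = (hourly_kw - baseline.mean()) / baseline.std()
    # NaN z-scores (warm-up period) compare as False, so they are never confirmed.
    confirmed = iforest_flag.astype(bool) & (zscore.abs() > z_threshold)
    return pd.DataFrame({"rolling_168h_zscore": zscore, "confirmed": confirmed})


idx = pd.date_range("2007-01-01", periods=400, freq="h")
load = pd.Series(1.0 + 0.1 * np.sin(2 * np.pi * np.arange(400) / 24), index=idx)  # daily cycle
load.iloc[300] = 5.0  # injected spike
flags = pd.Series(False, index=idx)
flags.iloc[[250, 300]] = True  # pretend the Isolation Forest flagged these two hours
result = confirm_anomalies(load, flags)
print(result.index[result["confirmed"]].tolist())  # only the spike at hour 300 survives
```

Hour 250 is an ordinary point on the daily cycle, so its z-score stays small and the model's flag is discarded. Requiring both signals to agree is what keeps false positives out of the confirmed set.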
`src.demand_forecaster`:

- Builds leakage-safe lag, rolling, calendar, and weather features
- Trains XGBoost with `TimeSeriesSplit(n_splits=5, gap=24)`
- Produces walk-forward backtests, Tree SHAP importance, and interval bands
- Saves forecast artifacts for the Streamlit app
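"Leakage-safe" here means that every feature value at hour t is computed only from hours before t. This minimal sketch assumes 24-hour and 168-hour lags and a 24-hour rolling mean; the column names are illustrative. A frame like this is the kind of input that would then be split with `TimeSeriesSplit(n_splits=5, gap=24)`, where `gap=24` drops 24 samples between each training fold and its validation fold.

```python
import numpy as np
import pandas as pd


def build_features(demand: pd.Series, temperature: pd.Series) -> pd.DataFrame:
    """Features for hour t built only from information available before t."""
    X = pd.DataFrame(index=demand.index)
    X["lag_24h"] = demand.shift(24)    # same hour yesterday
    X["lag_168h"] = demand.shift(168)  # same hour last week
    # shift(1) first so the 24h window ends at t-1 and never includes the target.
    X["rolling_mean_24h"] = demand.shift(1).rolling(24).mean()
    X["hour"] = demand.index.hour
    X["dayofweek"] = demand.index.dayofweek
    X["is_weekend"] = (X["dayofweek"] >= 5).astype(int)
    # Assumes weather for hour t is known at prediction time (e.g. a weather forecast).
    X["temperature"] = temperature
    return X


idx = pd.date_range("2024-01-01", periods=200, freq="h")  # 2024-01-01 is a Monday
demand = pd.Series(np.arange(200, dtype=float), index=idx)
temperature = pd.Series(20.0, index=idx)
X = build_features(demand, temperature)
print(X["rolling_mean_24h"].iloc[30])  # mean of hours 6..29 -> 17.5
```

A common leak is calling `rolling(24)` directly on the target. The window would then include hour t itself, and validation MAPE would look better than anything achievable in production.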
`src.simulate_households`:

- Scales the real hourly household baseline into `house_A`, `house_B`, and `house_C`
- Injects differentiated anomaly patterns
- Enables a more realistic portfolio-style anomaly explorer
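One way to turn a single real baseline into a small portfolio is sketched below. It uses hypothetical per-house scale factors and anomaly recipes (a demand surge, a reporting gap, and a "stuck" flatline meter). The real script's parameters and anomaly types are not documented here, so treat `HOUSE_PROFILES` and `simulate_portfolio` as illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical per-house scale factors and anomaly recipes (illustration only).
HOUSE_PROFILES = {
    "house_A": {"scale": 0.8, "anomaly": "surge"},
    "house_B": {"scale": 1.0, "anomaly": "gap"},
    "house_C": {"scale": 1.3, "anomaly": "flatline"},
}


def simulate_portfolio(
    baseline: pd.Series, seed: int = 42, n_events: int = 3, event_hours: int = 6
) -> pd.DataFrame:
    rng = np.random.default_rng(seed)  # seeded for reproducible portfolios
    frames = []
    for house_id, profile in HOUSE_PROFILES.items():
        # Scale the real baseline and add small multiplicative noise per house.
        kw = baseline * profile["scale"] * (1 + 0.05 * rng.standard_normal(len(baseline)))
        injected = np.zeros(len(kw), dtype=bool)
        starts = rng.choice(len(kw) - event_hours, size=n_events, replace=False)
        for s in starts:
            window = slice(s, s + event_hours)
            if profile["anomaly"] == "surge":
                kw.iloc[window] = kw.iloc[window].to_numpy() * 3.0  # sudden demand spike
            elif profile["anomaly"] == "gap":
                kw.iloc[window] = np.nan  # meter stops reporting
            else:
                kw.iloc[window] = kw.iloc[s]  # "stuck" meter repeats one value
            injected[window] = True
        frames.append(pd.DataFrame({"house_id": house_id, "kw": kw, "injected": injected}))
    return pd.concat(frames)


idx = pd.date_range("2008-01-01", periods=240, freq="h")
baseline = pd.Series(1.0 + 0.5 * np.sin(2 * np.pi * np.arange(240) / 24), index=idx)
portfolio = simulate_portfolio(baseline)
print(portfolio.groupby("house_id")["injected"].sum())
```

Keeping an `injected` ground-truth column lets the portfolio anomaly scores be checked against known events, which the single real household cannot offer.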
`app/streamlit_app.py`:

- Serves four views: Grid Overview, Anomaly Explorer, Demand Forecast, and Meter Deep Dive
```text
energy-anomaly-intelligence/
|-- README.md
|-- LICENSE
|-- requirements.txt
|-- .gitignore
|-- .env.example
|-- assets/
|-- data/
|   |-- raw/
|   `-- processed/
|-- models/
|-- notebooks/
|-- scripts/
|-- sql/
|-- src/
|-- app/
|   `-- pages/
`-- tests/
```
UCI household data:

- Dataset page:
- Direct zip used in this workspace:
- Expected local file: `data/raw/household_power.txt`
EIA demand data:

- Public docs:
- This project can use:
  - direct API access with `EIA_API_KEY`, or
  - the official bulk operating-data fallback
Weather history:

- Meteostat first
- Open-Meteo fallback when Meteostat is unavailable
This repo is intentionally set up like a real project, not a “dump everything into Git” repo.
Committed:
- source code
- tests
- SQL
- notebooks
- README visuals
- CI workflow
Not committed:
- raw datasets
- large processed artifacts
- local DuckDB database
- trained model binaries
- `.env` secrets
This keeps the public repo lightweight and safe while still making the project understandable.
```powershell
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
```

- Place the UCI file at `data/raw/household_power.txt`
- Optionally copy `.env.example` to `.env` if you want direct EIA API calls
- If no EIA key is present, ingestion falls back to the EIA bulk operating-data path
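The key-or-fallback rule in the last bullet reduces to a small check. This sketch assumes the key is read from the process environment (for example after `.env` has been loaded). `resolve_eia_source` is a hypothetical helper, not part of `src.ingest`'s documented interface.

```python
import os
from typing import Mapping, Optional


def resolve_eia_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick 'api' when a non-blank EIA_API_KEY is available, else the bulk-file fallback."""
    env = os.environ if env is None else env
    key = (env.get("EIA_API_KEY") or "").strip()
    return "api" if key else "bulk"


print(resolve_eia_source({}))                           # bulk
print(resolve_eia_source({"EIA_API_KEY": "demo-key"}))  # api
```

Treating a blank key the same as a missing one avoids failed API calls when `.env.example` is copied without filling in a value.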
One-command helper:

```powershell
.\scripts\run_pipeline.ps1 -GenerateReadmeAssets
```

Manual steps:

```powershell
.venv\Scripts\python -m src.ingest --uci-path data/raw/household_power.txt --region PJM
.venv\Scripts\python -m src.preprocess
.venv\Scripts\python -m src.anomaly_detector --train
.venv\Scripts\python -m src.demand_forecaster --train --horizon-hours 720
.venv\Scripts\python -m src.simulate_households
.venv\Scripts\python -m src.anomaly_detector --train --input-path data/processed/meter_hourly_portfolio.parquet --output-path data/processed/anomaly_labels_portfolio.parquet --model-path models/isolation_forest_portfolio_v1.pkl --table-name anomaly_scores_portfolio
```

Launch the dashboard:

```powershell
.venv\Scripts\streamlit run app/streamlit_app.py
```

Regenerate the README visuals:

```powershell
.\scripts\generate_readme_assets.ps1
```

The dashboard pages include:

- KPI cards
- 30-day consumption heatmap
- load timeline with anomaly overlay
- event timeline
- sortable anomaly table
- 24-hour context window
- anomaly type distribution
- business impact summary
- simulated household selector when portfolio artifacts exist
- actual vs forecast
- 80% and 95% interval bands
- hour-of-day error chart
- Tree SHAP demand driver chart
- walk-forward backtest trend
- sub-meter stacked area chart
- correlation heatmap
- weekday vs weekend profile
- narrative insights
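An hour-of-day error chart like the one listed above can be backed by a simple aggregation. This minimal sketch assumes a frame indexed by hourly timestamps with `actual` and `forecast` columns; the column names and `hourly_error_profile` are hypothetical.

```python
import pandas as pd


def hourly_error_profile(df: pd.DataFrame) -> pd.Series:
    """Mean absolute percentage error (%) for each hour of the day."""
    ape = (df["actual"] - df["forecast"]).abs() / df["actual"] * 100.0
    return ape.groupby(df.index.hour).mean().rename("mape_pct")


idx = pd.date_range("2024-06-01", periods=48, freq="h")
frame = pd.DataFrame({"actual": 100.0, "forecast": 100.0}, index=idx)
frame.loc[frame.index.hour == 18, "forecast"] = 110.0  # forecast misses the evening peak
profile = hourly_error_profile(frame)
print(profile.idxmax())  # 18
```

Grouping by hour of day surfaces systematic misses (such as evening ramps) that a single overall MAPE figure hides.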
| Property | Anomaly Detector | Demand Forecaster |
|---|---|---|
| Algorithm | Isolation Forest | XGBoost Regressor |
| Training data | UCI household hourly rollups (2006-2010) | PJM demand + weather (rolling 2-year window) |
| Validation method | Rolling z-score confirmation | TimeSeriesSplit (5 folds, 24h gap) + walk-forward backtest |
| Explainability | Rule-based anomaly taxonomy | Tree SHAP mean absolute contribution |
| Key metric | Confirmed anomaly counts by severity | MAPE: 1.21% on latest trained run |
| Interval logic | Severity rules | Conformal-style residual bands by hour-of-day |
| Retraining cadence | Monthly recommended | Monthly recommended |
| Known limitation | Single-household real baseline | Extreme shock performance depends on available exogenous signals |
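The "conformal-style residual bands by hour-of-day" row can be sketched as split-conformal quantiles of absolute residuals. The quantiles are computed separately for each hour of the day on a held-out calibration set. The column names, the finite-sample rank correction, and the choice to cap at the largest residual when a group is too small are assumptions for illustration, not the forecaster's exact code.

```python
import numpy as np
import pandas as pd


def conformal_width(abs_resid: np.ndarray, level: float) -> float:
    """Split-conformal quantile: the ceil((n+1)*level)-th smallest absolute residual."""
    n = len(abs_resid)
    k = int(np.ceil((n + 1) * level - 1e-9))  # tiny epsilon guards float round-up
    k = min(k, n)  # "conformal-style": cap at the largest residual instead of infinity
    return float(np.sort(abs_resid)[k - 1])


def hourly_bands(calib: pd.DataFrame, levels=(0.80, 0.95)) -> pd.DataFrame:
    """Band half-widths per hour of day from a held-out calibration frame
    with columns 'hour', 'actual', 'forecast'."""
    abs_resid = (calib["actual"] - calib["forecast"]).abs()
    rows = {
        hour: {f"width_{round(lv * 100)}": conformal_width(r.to_numpy(), lv) for lv in levels}
        for hour, r in abs_resid.groupby(calib["hour"])
    }
    return pd.DataFrame.from_dict(rows, orient="index")


def add_intervals(future: pd.DataFrame, bands: pd.DataFrame) -> pd.DataFrame:
    out = future.copy()
    for col in bands.columns:
        level = col.split("_")[1]
        width = out["hour"].map(bands[col])  # look up the band for each row's hour
        out[f"lower_{level}"] = out["forecast"] - width
        out[f"upper_{level}"] = out["forecast"] + width
    return out


# Calibration residuals of 1..9 MW in every hour, purely for illustration.
calib = pd.DataFrame({
    "hour": np.repeat(np.arange(24), 9),
    "forecast": 100.0,
    "actual": 100.0 + np.tile(np.arange(1, 10), 24),
})
bands = hourly_bands(calib)
future = pd.DataFrame({"hour": [0, 13], "forecast": [120.0, 150.0]})
print(add_intervals(future, bands))
```

Conditioning on hour of day lets the bands widen during volatile hours, such as the evening peak, instead of applying one global width everywhere.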
Latest local checks:

- `pytest tests/ -q`
- full CLI pipeline run through ingest, preprocess, anomaly scoring, forecast training, and portfolio scoring
- Raw datasets and secrets are gitignored
- The README visuals are generated from local pipeline outputs
- The project is ready to publish once you connect a GitHub repository URL



