The first weekly, spatial fire prediction system for Indian forests.
India's forest fire apparatus detects 3.5 lakh fires per year. It predicts zero. EcoFire changes that — predicting which 1km grid cells will ignite next week using satellite vegetation indices, weather reanalysis, fire danger indices, terrain, and historical fire patterns.
40,000 km² of forest → 50 km² shortlist. 800x spatial reduction.
Evaluated on 26 weeks of 2025 validation data (Karnataka, India). Source: eval.py → data/metrics.json.
| Season Band | Weeks | P@50 | Fires Intercepted | Concentration vs Base Rate |
|---|---|---|---|---|
| Overall (W1-26) | 26 | 7.8% | 102 | 13.1x |
| Fire season (W1-20) | 20 | 9.8% | 98 | 12.7x |
| Peak (W5-17, Feb-Apr) | 13 | 10.3% | 67 | 9.9x |
| Hot peak (W6-12, Feb-Mar) | 7 | 16.9% | 59 | 10.8x |
| Monsoon (W21+) | 6 | 1.3% | 4 | — |
- Best single week: W10 = 38% P@50 (19 of the 50 flagged cells burned, in a week with 1,141 fires statewide)
- Naive baseline (historical fire frequency only): 7.5% overall — model adds 4-34% relative lift depending on season band
- 756 experiments across 6 sweeps confirmed the ceiling with offline data
Precision@50 = of the top 50 cells flagged per week, how many actually burned? This is the operationally relevant metric — a forest division can realistically patrol ~50 km² per week.
Concentration = fire probability in the 50 flagged cells vs the forest-wide base rate. At 10.8x during hot peak, the model concentrates fire risk into 0.125% of the search space.
800x spatial reduction = 40,000 km² of forest → 50 km² shortlist (arithmetic, always true regardless of model quality).
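These definitions translate directly to code. A minimal sketch, assuming per-week scored cells with illustrative `score` and `fire` columns (not the actual schema in eval.py):

```python
import pandas as pd

def precision_at_k(week_df, k=50):
    """Fraction of the top-k scored cells that actually burned this week."""
    return week_df.nlargest(k, "score")["fire"].mean()

def concentration(week_df, k=50):
    """Fire rate inside the top-k cells vs the grid-wide base rate."""
    return precision_at_k(week_df, k) / week_df["fire"].mean()

# Deterministic toy week: 100 cells, 5 fires, fires ranked highest
toy = pd.DataFrame({
    "fire":  [1] * 5 + [0] * 95,
    "score": [100, 99, 98, 97, 96] + list(range(95)),
})
print(precision_at_k(toy))   # 0.1  (5 of the top 50 burned)
print(concentration(toy))    # 2.0  (vs a 5% base rate)
```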
| Current State (India) | With EcoFire |
|---|---|
| Zero fire prediction | Weekly 1km predictions |
| FSI FAST alerts detect fires after ignition | Predictions before ignition |
| Rangers patrol 40,000 km² blind | 50 km² shortlist per week |
| FWI pilot (2019) stalled at 2 regions | Statewide coverage, extensible |
India's fire management stack — FSI FAST v1→v3, fire prone mapping, FWI pilot — is entirely reactive or static. EcoFire is the first system that produces dynamic, weekly, spatial fire predictions at 1km resolution. See docs/09-MOEFCC-FIRE-INTELLIGENCE.md for the full gap analysis.
```bash
# Clone
git clone https://github.com/nikhilvelpanur/ecofire.git
cd ecofire

# Install dependencies
pip install -r requirements.txt

# Evaluate with saved model
python eval.py --model best

# Retrain best config and evaluate
python eval.py

# Run full sweep on Modal GPU (requires Modal account)
modal run -d modal_app.py::sweep
```

The processed datasets (data/*.parquet) are too large for GitHub (~3.4 GB). To rebuild from raw sources:
```bash
# 1. Download raw data (requires API keys — see Data Sources below)
python download_firms_batch.py   # NASA FIRMS fire detections
python download_ndvi.py          # Sentinel-2 NDVI (requires GEE service account)
python download_srtm.py          # SRTM elevation
python download_worldcover.py    # ESA WorldCover
python download_era5land.py      # ERA5-Land weather (requires CDS API key)
python download_smap.py          # SMAP soil moisture (requires GEE)
python download_fwi.py           # CEMS FWI fire danger (requires EWDS API key)
python download_ndvi_weekly.py   # Weekly Sentinel-2 composites (requires GEE)

# 2. Build grid + join all features
python prepare.py

# 3. Add Phase 3 features (ERA5-Land, SMAP, FWI)
python rebuild_phase3.py

# 4. Add weekly NDVI/LSWI features
python rebuild_weekly_ndvi.py

# 5. Evaluate
python eval.py
```

XGBoost with a binary:logistic objective, trained on all 40,000 cells per week. The model ranks cells by predicted fire probability; the top 50 per week are flagged.
Key hyperparameters (best config):
- `max_depth=7`, `learning_rate=0.05`, `subsample=0.7`
- `min_child_weight=50`, `gamma=1.0`, `lambda=5.0`
- Negative sampling: 10% of non-fire cells retained during training
- Early stopping on validation AUCPR
| Category | Features | Importance |
|---|---|---|
| Fire history | fire_freq, fire_history_1yr, fire_neighbor_1w/2w, fire_radius_5km_2w | Dominant (fire_freq = #1) |
| Fire danger indices | fwi_mean/max, ffmc_mean, dmc_mean, dc_mean, isi_mean, bui_mean | High (fwi_mean = #2, bui_mean = #3) |
| Weather | vpd_mean, temp_max_c, temp_mean_c, rh_mean, wind_speed_mean, precip_sum_mm, days_no_rain, precip_cumul_30d | Moderate |
| Vegetation | ndvi_mean, ndvi_diff_4w, ndvi_weekly, ndvi_diff_1w, lswi, lswi_diff_*, forest_fraction | Low (redundant with FWI) |
| Terrain | elevation_m, slope_deg, lat, lon | Low-moderate |
| Soil moisture | swvl1-4, stl1, smap_sm_mean/min, lai_hv/lv | Low |
| Human geography | dist_to_settlement/town_km, pop_within_5km, n_settlements_5km, nightlight_mean | Negligible |
| Temporal | week_sin, week_cos | Moderate |
What works: Historical fire frequency + fire danger indices (FWI/BUI) + basic weather. These have been the top 3 features since Sweep 1. No new data source has displaced them.
What doesn't work (confirmed):
- Higher-resolution weather (ERA5-Land 9km vs ERA5 30km) — redundant
- Satellite soil moisture (SMAP) — redundant with days_no_rain
- Weekly vegetation (Sentinel-2 NDVI/LSWI) — redundant with FWI
- Human geography (SHRUG settlements, population, nightlights) — zero lift
- Model architecture changes (LightGBM, ranking objectives, ensembles) — no improvement
- All-cell classification — collapses; only ranking works at 40K-cell scale
Root cause of the ceiling: The model learns where conditions allow fire, but not where someone will light one. Features predict fire weather and vegetation dryness (the "conditions" axis), but ignition in Indian forests is overwhelmingly anthropogenic and essentially random at 1km/1week resolution.
All data is freely available from public sources:
| Source | What | Resolution | API/Access |
|---|---|---|---|
| NASA FIRMS | Fire detections (ground truth) | Point | Free API, CSV download |
| ERA5-Land | Weather + soil + LAI | 9km | CDS API (free registration) |
| ERA5 | Weather reanalysis | 30km | CDS API |
| CEMS FWI | Fire Weather Index components | 8km | EWDS API (free registration) |
| SMAP | Surface soil moisture | 9km | Google Earth Engine |
| Sentinel-2 | NDVI + LSWI | 10m→1km | Google Earth Engine |
| SRTM GL1 | Elevation + slope | 30m | GEE or direct download |
| ESA WorldCover | Land cover / forest fraction | 10m | Direct download |
| SHRUG | Settlements, population | Village | Registration required |
Karnataka state is divided into 40,000 1km grid cells (from 113K total cells, subsampled to 23K fire-history + 17K non-fire cells). The grid is stored in grid/karnataka_grid.parquet.
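The candidate-cell subsampling can be sketched as follows; the cell counts, fire-history fraction, and column names here are illustrative, not the actual grid schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy grid: cell id plus whether the cell has any recorded fire history
grid = pd.DataFrame({
    "cell_id": np.arange(113_000),
    "has_fire_history": rng.random(113_000) < 0.2,
})

fire_cells = grid[grid["has_fire_history"]]          # keep every fire-history cell
no_fire = grid[~grid["has_fire_history"]]
sampled = no_fire.sample(n=17_000, random_state=0)   # random non-fire subset

candidates = pd.concat([fire_cells, sampled])
print(len(candidates))
```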
| Split | Period | Rows | Purpose |
|---|---|---|---|
| Train | 2020 W1 – 2024 W52 | ~10.4M | Model training |
| Val | 2025 W1 – W26 | ~1.04M | Hyperparameter tuning, metric reporting |
| Test | 2025 W27 – W52 | ~1.04M | Held out (monsoon-heavy, lower signal) |
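A strict temporal split like this reduces to simple filtering; a sketch assuming each row carries `year` and `week` columns:

```python
import pandas as pd

# Six toy rows standing in for the ~12.5M weekly cell rows
df = pd.DataFrame({
    "year": [2020, 2023, 2024, 2025, 2025, 2025],
    "week": [10, 30, 52, 5, 26, 40],
})

train = df[df["year"] <= 2024]                          # 2020 W1 – 2024 W52
val   = df[(df["year"] == 2025) & (df["week"] <= 26)]   # 2025 W1 – W26
test  = df[(df["year"] == 2025) & (df["week"] >= 27)]   # 2025 W27 – W52

# The three splits partition the data: no overlap, no leakage
assert len(train) + len(val) + len(test) == len(df)
```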
756 experiments across 6 sweeps. Full details in docs/04-EXPERIMENT-LOG.md.
| Sweep | Experiments | Best P@50 | Key Finding |
|---|---|---|---|
| 1: Baseline | 145 | 9.00% | Established ceiling with base features |
| 2: SHRUG + contagion | 155 | 9.00% | Human geography features = zero lift |
| 3: Architectures | 158 | 9.08% | LightGBM, ranking objectives — model not the bottleneck |
| 4: Phase 3 data | 162 | 9.23% | ERA5-Land + SMAP + FWI — redundant with existing weather |
| 5: All-cell scoring | 68 | 9.00% | Removed candidate filter — classification collapses at scale |
| 6: Weekly NDVI+LSWI | 68 | 8.54% | Weekly vegetation signal redundant with FWI |
This project follows the autoresearch pattern (Karpathy): prepare.py is fixed (data pipeline), while train.py is iterated by an AI research agent guided by program.md. The agent runs experiments autonomously, evaluating against Precision@50 on the validation set.
eval.py is the single source of truth for all metrics. It retrains the best configuration (or loads a saved model), computes per-week P@50 for every validation week, aggregates into season bands, and outputs data/metrics.json.
```bash
python eval.py                 # Retrain + evaluate
python eval.py --model best    # Load saved model + evaluate
python eval.py --split test    # Evaluate on test set
```

This was built after a lesson learned: ad-hoc metric calculations during long research sessions produced inflated numbers that propagated to documentation. All claims now trace back to metrics.json.
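The season-band aggregation step in eval.py can be sketched as follows; the band boundaries come from the results table above, while the per-week P@50 values and output structure here are illustrative:

```python
import json

# Per-week P@50 for validation weeks 1..26 (illustrative values)
weekly_p50 = {w: 0.10 if w <= 20 else 0.01 for w in range(1, 27)}

BANDS = {
    "overall": range(1, 27),       # W1-26
    "fire_season": range(1, 21),   # W1-20
    "peak": range(5, 18),          # W5-17
    "hot_peak": range(6, 13),      # W6-12
    "monsoon": range(21, 27),      # W21+
}

# Mean P@50 per season band
metrics = {
    band: sum(weekly_p50[w] for w in weeks) / len(weeks)
    for band, weeks in BANDS.items()
}

print(json.dumps(metrics, indent=2))  # eval.py writes this to data/metrics.json
```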
```
ecofire/
├── README.md                    # This file
├── eval.py                      # Evaluation pipeline (source of truth)
├── train.py                     # Model training + sweep (iterated by AI)
├── prepare.py                   # Data pipeline: raw → features → parquet
├── modal_app.py                 # Modal serverless GPU runner
├── program.md                   # Autoresearch agent instructions
├── requirements.txt             # Python dependencies
│
├── download_firms_batch.py      # NASA FIRMS fire detections
├── download_ndvi.py             # Monthly Sentinel-2 NDVI (GEE)
├── download_ndvi_weekly.py      # Weekly Sentinel-2 NDVI + LSWI (GEE)
├── download_era5land.py         # ERA5-Land weather (CDS API)
├── download_smap.py             # SMAP soil moisture (GEE)
├── download_fwi.py              # CEMS FWI fire danger (EWDS API)
├── download_srtm.py             # SRTM elevation
├── download_worldcover.py       # ESA WorldCover land use
├── download_shrug.py            # SHRUG settlement data
│
├── rebuild_features.py          # Feature engineering (v1-v7 variants)
├── rebuild_phase3.py            # ERA5-Land + SMAP + FWI integration
├── rebuild_weekly_ndvi.py       # Weekly NDVI/LSWI integration
│
├── data/
│   ├── metrics.json             # Authoritative evaluation metrics
│   ├── best_model.json          # Saved XGBoost model
│   ├── feature_stats.json       # Feature normalization statistics
│   ├── train.parquet            # Training data (not in repo — too large)
│   ├── val.parquet              # Validation data (not in repo)
│   └── test.parquet             # Test data (not in repo)
│
├── grid/
│   └── karnataka_grid.parquet   # 40K cell grid with terrain features
│
├── baselines/
│   └── baselines.json           # Pre-computed baseline results
│
├── docs/
│   ├── 01-PROJECT-OVERVIEW.md           # Motivation, timeline, infrastructure
│   ├── 02-DATA-PIPELINE.md              # All data sources, grid, splits, features
│   ├── 03-ARCHITECTURE.md               # Model evolution, what didn't work
│   ├── 04-EXPERIMENT-LOG.md             # All 756 experiments across 6 sweeps
│   ├── 05-DEPLOYMENT-ROADMAP.md         # Gap analysis, deployment architecture
│   ├── 06-IMPROVEMENT-PLAN.md           # 5-phase improvement plan
│   ├── 07-LANDSCAPE-STUDY.md            # 33 papers/systems literature review
│   ├── 08-COMMERCIAL-DATA-SOURCES.md    # Free vs paid data analysis
│   ├── 09-MOEFCC-FIRE-INTELLIGENCE.md   # India fire management gaps
│   ├── 10-ONLINE-LEARNING-ROADMAP.md    # Path to 30%+ via deployment
│   └── 11-ADJACENT-OPPORTUNITIES.md     # Platform extension opportunities
│
├── sweep_results.txt            # Sweep 1 output
├── sweep_results_v2.txt         # Sweep 2 output
└── sweep_results_v3.txt         # Sweep 3 output
```
A literature review of 33 papers and systems found:
- No weekly fire prediction system exists for India. All Indian studies do static susceptibility mapping.
- ECMWF PoF (Probability of Fire) is the global state-of-the-art — 1km, 10-day forecast. Our work validates XGBoost as competitive with their approach for regional deployment.
- XGBoost matches or beats deep learning for tabular fire prediction (AUC 0.83-0.96 across published studies).
- The biggest gap in Indian fire research is temporal/lagged features — exactly what EcoFire addresses.
The 9% ceiling with offline data is confirmed. The path to 30%+ requires deployment + online learning:
- Ship & observe — deploy model, log predictions, auto-label via FIRMS
- Online recalibration — monthly retrain with fresh FIRMS labels
- Active learning — Thompson sampling to explore under-predicted regions
- Causal/counterfactual — solve the prevention paradox (successful prevention removes positive labels)
- Multi-state transfer — extend beyond Karnataka for more training data diversity
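As a sketch of the active-learning step, Thompson sampling can maintain a Beta posterior over each cell's weekly fire rate and sample from it to pick patrol cells, naturally mixing exploitation with exploration of under-predicted regions. This is purely illustrative and not part of the current codebase:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 1000

# Beta(alpha, beta) posterior per cell over its weekly fire probability,
# updated from FIRMS auto-labels after each deployed week
alpha = np.ones(n_cells)   # prior fire observations
beta = np.ones(n_cells)    # prior no-fire weeks

def select_patrol_cells(k=50):
    """Sample a fire rate from each cell's posterior; flag the top-k draws."""
    draws = rng.beta(alpha, beta)
    return np.argsort(draws)[::-1][:k]

def update(cells, burned):
    """After the week, update posteriors from observed FIRMS labels (0/1)."""
    alpha[cells] += burned
    beta[cells] += 1 - burned

cells = select_patrol_cells()
burned = (rng.random(50) < 0.1).astype(int)   # stand-in for FIRMS auto-labels
update(cells, burned)
```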
See docs/10-ONLINE-LEARNING-ROADMAP.md for the full roadmap.
If you use this work, please cite:
```bibtex
@software{ecofire2026,
  author = {Velpanur, Nikhil},
  title  = {EcoFire: Weekly Forest Fire Prediction for India},
  year   = {2026},
  url    = {https://github.com/nikhilvelpanur/ecofire}
}
```
Apache 2.0. See LICENSE.
Built by Emergent Narrative as part of the Ecological DPI initiative. This work was conducted using the autoresearch methodology with Claude (Anthropic) as the AI research agent.
Data sources: NASA FIRMS, Copernicus Climate Data Store (ERA5, ERA5-Land, CEMS FWI), Google Earth Engine (Sentinel-2, SMAP), USGS (SRTM), ESA (WorldCover), Development Data Lab (SHRUG).