
EcoFire: Weekly Forest Fire Prediction for India

The first weekly, spatial fire prediction system for Indian forests.

India's forest fire apparatus detects 3.5 lakh fires per year. It predicts zero. EcoFire changes that — predicting which 1km grid cells will ignite next week using satellite vegetation indices, weather reanalysis, fire danger indices, terrain, and historical fire patterns.

[Figure: EcoFire concept. 40,000 km² of forest → 50 km² shortlist (800x spatial reduction).]

Key Results

Evaluated on 26 weeks of 2025 validation data (Karnataka, India). All numbers are produced by eval.py and stored in data/metrics.json.

| Season band | Weeks | P@50 | Fires intercepted | Concentration vs base rate |
|---|---|---|---|---|
| Overall (W1-26) | 26 | 7.8% | 102 | 13.1x |
| Fire season (W1-20) | 20 | 9.8% | 98 | 12.7x |
| Peak (W5-17, Feb-Apr) | 13 | 10.3% | 67 | 9.9x |
| Hot peak (W6-12, Feb-Mar) | 7 | 16.9% | 59 | 10.8x |
| Monsoon (W21+) | 6 | 1.3% | 4 | |
  • Best single week: W10 = 38% P@50 (19 of 50 flagged cells had fires, out of 1,141 total fires)
  • Naive baseline (historical fire frequency only): 7.5% overall — model adds 4-34% relative lift depending on season band
  • 756 experiments across 6 sweeps confirmed the ceiling with offline data

What the Numbers Mean

Precision@50 = of the top 50 cells flagged per week, how many actually burned? This is the operationally relevant metric — a forest division can realistically patrol ~50 km² per week.

Concentration = fire probability in the 50 flagged cells vs the forest-wide base rate. At 10.8x during the hot peak, the model concentrates that much fire risk into the flagged cells, which cover just 0.125% of the search space (50 of 40,000 cells).

800x spatial reduction = 40,000 km² of forest → 50 km² shortlist (arithmetic, always true regardless of model quality).
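Both metrics are a few lines of numpy. A minimal sketch on toy data (random scores as a stand-in for model probabilities; the real pipeline lives in eval.py):

```python
import numpy as np

def precision_at_k(scores, burned, k=50):
    """Fraction of the top-k ranked cells that actually burned."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return burned[top_k].mean()

# Toy week: 40,000 cells, 48 fires placed at random
rng = np.random.default_rng(0)
n_cells = 40_000
burned = np.zeros(n_cells, dtype=bool)
burned[rng.choice(n_cells, size=48, replace=False)] = True
scores = rng.random(n_cells)               # stand-in for model probabilities

p50 = precision_at_k(scores, burned)
base_rate = burned.mean()                  # 48 / 40,000 = 0.12%
concentration = p50 / base_rate if p50 > 0 else 0.0
```

With random scores the concentration hovers near 1x; the reported 10-13x is what the trained model adds over this floor.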

Why This Matters

| Current state (India) | With EcoFire |
|---|---|
| Zero fire prediction | Weekly 1km predictions |
| FSI FAST alerts detect fires after ignition | Predictions before ignition |
| Rangers patrol 40,000 km² blind | 50 km² shortlist per week |
| FWI pilot (2019) stalled at 2 regions | Statewide coverage, extensible |

India's fire management stack — FSI FAST v1→v3, fire prone mapping, FWI pilot — is entirely reactive or static. EcoFire is the first system that produces dynamic, weekly, spatial fire predictions at 1km resolution. See docs/09-MOEFCC-FIRE-INTELLIGENCE.md for the full gap analysis.

Quick Start

# Clone
git clone https://github.com/nikhilvelpanur/ecofire.git
cd ecofire

# Install dependencies
pip install -r requirements.txt

# Evaluate with saved model
python eval.py --model best

# Retrain best config and evaluate
python eval.py

# Run full sweep on Modal GPU (requires Modal account)
modal run -d modal_app.py::sweep

Data Preparation (from scratch)

The processed datasets (data/*.parquet) are too large for GitHub (~3.4GB). To rebuild from raw sources:

# 1. Download raw data (requires API keys — see Data Sources below)
python download_firms_batch.py   # NASA FIRMS fire detections
python download_ndvi.py          # Sentinel-2 NDVI (requires GEE service account)
python download_srtm.py          # SRTM elevation
python download_worldcover.py    # ESA WorldCover
python download_era5land.py      # ERA5-Land weather (requires CDS API key)
python download_smap.py          # SMAP soil moisture (requires GEE)
python download_fwi.py           # CEMS FWI fire danger (requires EWDS API key)
python download_ndvi_weekly.py   # Weekly Sentinel-2 composites (requires GEE)

# 2. Build grid + join all features
python prepare.py

# 3. Add Phase 3 features (ERA5-Land, SMAP, FWI)
python rebuild_phase3.py

# 4. Add weekly NDVI/LSWI features
python rebuild_weekly_ndvi.py

# 5. Evaluate
python eval.py

Architecture

Model

XGBoost with a binary:logistic objective, scoring all 40,000 cells each week (training uses negative sampling; see below). The model ranks cells by predicted fire probability; the top 50 per week are flagged.

Key hyperparameters (best config):

  • max_depth=7, learning_rate=0.05, subsample=0.7
  • min_child_weight=50, gamma=1.0, lambda=5.0
  • Negative sampling: 10% of non-fire cells retained during training
  • Early stopping on validation AUCPR
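As a sketch, the config above maps onto xgboost parameter names roughly as follows (the listed `lambda` is `reg_lambda` in the sklearn wrapper), and the weekly flagging step is just an argsort over predicted probabilities:

```python
import numpy as np

# Best-config hyperparameters from the list above, expressed as an
# xgboost-style parameter dict (illustrative mapping, not repo code).
best_params = {
    "objective": "binary:logistic",
    "max_depth": 7,
    "learning_rate": 0.05,
    "subsample": 0.7,
    "min_child_weight": 50,
    "gamma": 1.0,
    "reg_lambda": 5.0,
    "eval_metric": "aucpr",  # early stopping monitors validation AUCPR
}

# Ranking step: after prediction, flag the 50 highest-probability cells.
probs = np.random.default_rng(1).random(40_000)  # stand-in for model output
flagged = np.argsort(probs)[::-1][:50]           # indices of the weekly top 50
```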

Features (46 used, 52 available)

| Category | Features | Importance |
|---|---|---|
| Fire history | fire_freq, fire_history_1yr, fire_neighbor_1w/2w, fire_radius_5km_2w | Dominant (fire_freq = #1) |
| Fire danger indices | fwi_mean/max, ffmc_mean, dmc_mean, dc_mean, isi_mean, bui_mean | High (fwi_mean = #2, bui_mean = #3) |
| Weather | vpd_mean, temp_max_c, temp_mean_c, rh_mean, wind_speed_mean, precip_sum_mm, days_no_rain, precip_cumul_30d | Moderate |
| Vegetation | ndvi_mean, ndvi_diff_4w, ndvi_weekly, ndvi_diff_1w, lswi, lswi_diff_*, forest_fraction | Low (redundant with FWI) |
| Terrain | elevation_m, slope_deg, lat, lon | Low-moderate |
| Soil moisture | swvl1-4, stl1, smap_sm_mean/min, lai_hv/lv | Low |
| Human geography | dist_to_settlement/town_km, pop_within_5km, n_settlements_5km, nightlight_mean | Negligible |
| Temporal | week_sin, week_cos | Moderate |
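For illustration, the dominant fire-history features can be sketched with plain numpy on a gridded fire raster (the repo's exact definitions in rebuild_features.py may differ):

```python
import numpy as np

def neighbor_fires_last_week(fires_prev):
    """Count of fires in the 8 surrounding cells during the previous week
    (an illustrative take on fire_neighbor_1w)."""
    padded = np.pad(fires_prev.astype(int), 1)
    out = np.zeros(fires_prev.shape, dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # exclude the cell itself
            out += padded[1 + dr : padded.shape[0] - 1 + dr,
                          1 + dc : padded.shape[1] - 1 + dc]
    return out

# fire_freq: per-cell share of historical weeks with a fire
# (toy 5-year x 100x100 history; real history comes from FIRMS)
history = np.random.default_rng(2).random((260, 100, 100)) < 0.001
fire_freq = history.mean(axis=0)
```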

What We Learned (756 experiments)

What works: Historical fire frequency + fire danger indices (FWI/BUI) + basic weather. These have been the top 3 features since Sweep 1. No new data source has displaced them.

What doesn't work (confirmed):

  • Higher-resolution weather (ERA5-Land 9km vs ERA5 30km) — redundant
  • Satellite soil moisture (SMAP) — redundant with days_no_rain
  • Weekly vegetation (Sentinel-2 NDVI/LSWI) — redundant with FWI
  • Human geography (SHRUG settlements, population, nightlights) — zero lift
  • Model architecture changes (LightGBM, ranking objectives, ensembles) — no improvement
  • All-cell classification — collapses; only ranking works at 40K-cell scale

Root cause of the ceiling: The model learns where conditions allow fire, but not where someone will light one. Features predict fire weather and vegetation dryness (the "conditions" axis), but ignition in Indian forests is overwhelmingly anthropogenic and essentially random at 1km/1week resolution.

Data Sources

All data is freely available from public sources:

| Source | What | Resolution | API/Access |
|---|---|---|---|
| NASA FIRMS | Fire detections (ground truth) | Point | Free API, CSV download |
| ERA5-Land | Weather + soil + LAI | 9km | CDS API (free registration) |
| ERA5 | Weather reanalysis | 30km | CDS API |
| CEMS FWI | Fire Weather Index components | 8km | EWDS API (free registration) |
| SMAP | Surface soil moisture | 9km | Google Earth Engine |
| Sentinel-2 | NDVI + LSWI | 10m→1km | Google Earth Engine |
| SRTM GL1 | Elevation + slope | 30m | GEE or direct download |
| ESA WorldCover | Land cover / forest fraction | 10m | Direct download |
| SHRUG | Settlements, population | Village | Registration required |

Grid

Karnataka is divided into 1km grid cells; of 113K total cells, 40,000 are retained (23K cells with fire history + 17K sampled non-fire cells). The grid is stored in grid/karnataka_grid.parquet.

Train/Val/Test Split

| Split | Period | Rows | Purpose |
|---|---|---|---|
| Train | 2020 W1 – 2024 W52 | ~10.4M | Model training |
| Val | 2025 W1 – W26 | ~1.04M | Hyperparameter tuning, metric reporting |
| Test | 2025 W27 – W52 | ~1.04M | Held out (monsoon-heavy, lower signal) |
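The split is a straight temporal cut on (year, week). A toy sketch, assuming `year` and `week` columns (the real column names in data/*.parquet may differ):

```python
import pandas as pd

# Toy frame standing in for the full feature table
df = pd.DataFrame({
    "year": [2020, 2023, 2025, 2025, 2025],
    "week": [10, 40, 5, 26, 30],
})

train = df[df["year"] <= 2024]                          # 2020 W1 - 2024 W52
val   = df[(df["year"] == 2025) & (df["week"] <= 26)]   # 2025 W1 - W26
test  = df[(df["year"] == 2025) & (df["week"] >= 27)]   # 2025 W27 - W52
```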

Experiment History

756 experiments across 6 sweeps. Full details in docs/04-EXPERIMENT-LOG.md.

| Sweep | Experiments | Best P@50 | Key finding |
|---|---|---|---|
| 1: Baseline | 145 | 9.00% | Established ceiling with base features |
| 2: SHRUG + contagion | 155 | 9.00% | Human geography features = zero lift |
| 3: Architectures | 158 | 9.08% | LightGBM, ranking objectives — model not the bottleneck |
| 4: Phase 3 data | 162 | 9.23% | ERA5-Land + SMAP + FWI — redundant with existing weather |
| 5: All-cell scoring | 68 | 9.00% | Removed candidate filter — classification collapses at scale |
| 6: Weekly NDVI+LSWI | 68 | 8.54% | Weekly vegetation signal redundant with FWI |

Methodology

This project follows the autoresearch pattern (Karpathy): prepare.py is fixed (data pipeline), while train.py is iterated by an AI research agent guided by program.md. The agent runs experiments autonomously, evaluating against Precision@50 on the validation set.

Evaluation Pipeline

eval.py is the single source of truth for all metrics. It retrains the best configuration (or loads a saved model), computes per-week P@50 for every validation week, aggregates into season bands, and outputs data/metrics.json.

python eval.py                  # Retrain + evaluate
python eval.py --model best     # Load saved model + evaluate
python eval.py --split test     # Evaluate on test set
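The season-band aggregation can be sketched as follows (illustrative per-week values, not real results; eval.py remains the source of truth and writes data/metrics.json):

```python
import json
import statistics

# Hypothetical per-week P@50 values for the 26 validation weeks
weekly_p50 = {w: 0.10 if w <= 20 else 0.01 for w in range(1, 27)}

# Season bands as defined in the results table
bands = {
    "overall":     range(1, 27),
    "fire_season": range(1, 21),
    "monsoon":     range(21, 27),
}
metrics = {
    name: round(statistics.mean(weekly_p50[w] for w in weeks), 4)
    for name, weeks in bands.items()
}

# eval.py writes data/metrics.json; here we write to the cwd
with open("metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)
```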

This was built after a lesson learned: ad-hoc metric calculations during long research sessions produced inflated numbers that propagated to documentation. All claims now trace back to metrics.json.

Project Structure

ecofire/
├── README.md                    # This file
├── eval.py                      # Evaluation pipeline (source of truth)
├── train.py                     # Model training + sweep (iterated by AI)
├── prepare.py                   # Data pipeline: raw → features → parquet
├── modal_app.py                 # Modal serverless GPU runner
├── program.md                   # Autoresearch agent instructions
├── requirements.txt             # Python dependencies
│
├── download_firms_batch.py      # NASA FIRMS fire detections
├── download_ndvi.py             # Monthly Sentinel-2 NDVI (GEE)
├── download_ndvi_weekly.py      # Weekly Sentinel-2 NDVI + LSWI (GEE)
├── download_era5land.py         # ERA5-Land weather (CDS API)
├── download_smap.py             # SMAP soil moisture (GEE)
├── download_fwi.py              # CEMS FWI fire danger (EWDS API)
├── download_srtm.py             # SRTM elevation
├── download_worldcover.py       # ESA WorldCover land use
├── download_shrug.py            # SHRUG settlement data
│
├── rebuild_features.py          # Feature engineering (v1-v7 variants)
├── rebuild_phase3.py            # ERA5-Land + SMAP + FWI integration
├── rebuild_weekly_ndvi.py       # Weekly NDVI/LSWI integration
│
├── data/
│   ├── metrics.json             # Authoritative evaluation metrics
│   ├── best_model.json          # Saved XGBoost model
│   ├── feature_stats.json       # Feature normalization statistics
│   ├── train.parquet            # Training data (not in repo — too large)
│   ├── val.parquet              # Validation data (not in repo)
│   └── test.parquet             # Test data (not in repo)
│
├── grid/
│   └── karnataka_grid.parquet   # 40K cell grid with terrain features
│
├── baselines/
│   └── baselines.json           # Pre-computed baseline results
│
├── docs/
│   ├── 01-PROJECT-OVERVIEW.md   # Motivation, timeline, infrastructure
│   ├── 02-DATA-PIPELINE.md      # All data sources, grid, splits, features
│   ├── 03-ARCHITECTURE.md       # Model evolution, what didn't work
│   ├── 04-EXPERIMENT-LOG.md     # All 756 experiments across 6 sweeps
│   ├── 05-DEPLOYMENT-ROADMAP.md # Gap analysis, deployment architecture
│   ├── 06-IMPROVEMENT-PLAN.md   # 5-phase improvement plan
│   ├── 07-LANDSCAPE-STUDY.md    # 33 papers/systems literature review
│   ├── 08-COMMERCIAL-DATA-SOURCES.md  # Free vs paid data analysis
│   ├── 09-MOEFCC-FIRE-INTELLIGENCE.md # India fire management gaps
│   ├── 10-ONLINE-LEARNING-ROADMAP.md  # Path to 30%+ via deployment
│   └── 11-ADJACENT-OPPORTUNITIES.md   # Platform extension opportunities
│
├── sweep_results.txt            # Sweep 1 output
├── sweep_results_v2.txt         # Sweep 2 output
└── sweep_results_v3.txt         # Sweep 3 output

Landscape Context

A literature review of 33 papers and systems found:

  • No weekly fire prediction system exists for India. All Indian studies do static susceptibility mapping.
  • ECMWF PoF (Probability of Fire) is the global state-of-the-art — 1km, 10-day forecast. Our work validates XGBoost as competitive with their approach for regional deployment.
  • XGBoost matches or beats deep learning for tabular fire prediction (AUC 0.83-0.96 across published studies).
  • The biggest gap in Indian fire research is temporal/lagged features — exactly what EcoFire addresses.

Path Forward

The 9% ceiling with offline data is confirmed. The path to 30%+ requires deployment + online learning:

  1. Ship & observe — deploy model, log predictions, auto-label via FIRMS
  2. Online recalibration — monthly retrain with fresh FIRMS labels
  3. Active learning — Thompson sampling to explore under-predicted regions
  4. Causal/counterfactual — solve the prevention paradox (successful prevention removes positive labels)
  5. Multi-state transfer — extend beyond Karnataka for more training data diversity
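Step 2 above (monthly recalibration) might look like the following skeleton. Every name here is a hypothetical placeholder, not existing repo code:

```python
def monthly_recalibration(model, fetch_firms_labels, retrain, months):
    """Each month: auto-label last month's predictions against FIRMS
    detections, append them to the training set, and retrain.
    All callables are placeholders for illustration."""
    history = []
    for month in months:
        labels = fetch_firms_labels(month)   # FIRMS detections = ground truth
        history.extend(labels)
        model = retrain(model, history)      # warm-start or full retrain
    return model
```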

See docs/10-ONLINE-LEARNING-ROADMAP.md for the full roadmap.

Citation

If you use this work, please cite:

@software{ecofire2026,
  author = {Velpanur, Nikhil},
  title = {EcoFire: Weekly Forest Fire Prediction for India},
  year = {2026},
  url = {https://github.com/nikhilvelpanur/ecofire}
}

License

Apache 2.0. See LICENSE.

Acknowledgments

Built by Emergent Narrative as part of the Ecological DPI initiative. This work was conducted using the autoresearch methodology with Claude (Anthropic) as the AI research agent.

Data sources: NASA FIRMS, Copernicus Climate Data Store (ERA5, ERA5-Land, CEMS FWI), Google Earth Engine (Sentinel-2, SMAP), USGS (SRTM), ESA (WorldCover), Development Data Lab (SHRUG).
