Engineering · 3 May 2026 · 10 min read

How accurate is EnergyMap's LightGBM demand model?

We train a per-state LightGBM nowcast for the 11 Indian states with mature realtime SLDC feeds; six of them currently receive live predictions. This is the verified per-state MAPE, R², and bias against actuals, including the states where the model isn't yet production-ready.

EnergyMap Research Team

India Energy Atlas · CIFR


What this model is for

27 of 36 Indian states and UTs do not publish realtime electricity demand to a stable public API. For grid-wide nowcasting and analytics, we need a uniform 15-minute demand series across every state — even where there is no live SLDC feed. We model demand for those states.

For the 11 states that do publish a usable realtime feed (Andhra Pradesh, Delhi, Gujarat, Himachal Pradesh, Karnataka, Kerala, Punjab, Rajasthan, Tamil Nadu, Telangana, West Bengal), we still maintain a parallel modelled curve. That sounds redundant — until you join across states. Each SLDC publishes at a different cadence, drops samples, and has its own timezone quirks. The modelled layer guarantees a gap-free 96-block grid every day, every state. The actuals stay the canonical source of truth where they exist; the model is the safety net.
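The "actuals win, model fills gaps" rule can be sketched as a simple coalesce onto a fixed 96-block day. This is a minimal illustration, assuming both series are already snapped to 15-minute block starts in UTC. The function name `build_daily_grid` and the `"actual"` / `"missing"` source labels are hypothetical; only `modeled_ml_v1` comes from this post.

```python
import pandas as pd

def build_daily_grid(day: str, actual: pd.Series, modelled: pd.Series) -> pd.DataFrame:
    """Return a 96-row, 15-minute grid for one UTC day.

    actual and modelled are demand_mw series indexed by tz-aware UTC
    timestamps already aligned to block starts (an assumption of this sketch).
    """
    start = pd.Timestamp(day, tz="UTC")
    grid = pd.date_range(start, periods=96, freq="15min")
    a = actual.reindex(grid)
    m = modelled.reindex(grid)
    # Actuals are canonical; the modelled value only fills blocks with no actual.
    out = pd.DataFrame({"demand_mw": a.combine_first(m)}, index=grid)
    out["source"] = "actual"
    out.loc[a.isna() & m.notna(), "source"] = "modeled_ml_v1"
    out.loc[a.isna() & m.isna(), "source"] = "missing"  # should not happen if the model is complete
    return out
```

Keeping a per-block `source` column means downstream joins can still tell a measured value from a modelled one.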

The current production model is a per-state LightGBM gradient boosting regressor with the source label modeled_ml_v1. There is also a 7-day-ahead recursive forecast variant called modeled_ml_forecast_v1 using the same boosters. Code lives in scripts/train_demand_model.py (training, weekly cron) and scripts/run_demand_forecast.py (inference, every 15 minutes).
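The post doesn't spell out how the 7-day recursion works internally. One common approach writes each prediction back into the demand history so that later steps read it as `demand_lag_24h`. The sketch below uses a hypothetical `predict_one` callable in place of the booster and leaves the future national curve and weather inputs to that callable. Within a 7-day horizon, `demand_lag_7d` always falls inside observed history, so only the 24 h lag ever consumes predictions.

```python
import pandas as pd

STEP = pd.Timedelta("15min")
DAY = pd.Timedelta("1D")
WEEK = pd.Timedelta("7D")

def recursive_forecast(history: pd.Series, horizon_steps: int, predict_one) -> pd.Series:
    """Roll a one-step model forward, feeding its own outputs back in as lags.

    history: observed demand_mw on a regular 15-minute UTC index, >= 7 days long.
    predict_one: hypothetical callable (timestamp, lag_24h, lag_7d) -> float that
        stands in for the trained booster plus its other features.
    """
    values = dict(history.items())  # timestamp -> demand (observed or predicted)
    t = history.index[-1]
    forecast = {}
    for _ in range(horizon_steps):
        t = t + STEP
        # Beyond 24 h ahead, this lookup returns an earlier prediction, not an actual.
        lag_24h = values[t - DAY]
        # For horizons up to 7 days, t - 7 d always lands inside the observed history.
        lag_7d = values[t - WEEK]
        y_hat = float(predict_one(t, lag_24h, lag_7d))
        values[t] = y_hat
        forecast[t] = y_hat
    return pd.Series(forecast, name="demand_mw")
```

The design trade-off is that errors compound: a miss on day 1 becomes a lag input for day 2.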

What the LightGBM trains on

One booster per state. Same feature set across states, but each state learns its own response.

Calendar

hour, day_of_week, day_of_year, month
is_weekend, is_holiday (India national plus per-state regional holidays via the holidays Python library; AP gets the regional Telugu calendar, Punjab gets Sikh festivals)
Cyclic encodings: hour_sin/cos, dow_sin/cos, doy_sin/cos, so the gradient-boosted trees can split on features that wrap around smoothly (23:00 sits next to 00:00, 31 December next to 1 January)

Pure structural features the model gets for free.
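A minimal sketch of the calendar block follows. It assumes features are computed in IST, which the post doesn't state. To avoid depending on the holidays library's per-state options, the state's holidays are passed in as a plain set of dates. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def calendar_features(index: pd.DatetimeIndex, holiday_dates: set) -> pd.DataFrame:
    """Calendar + cyclic features for a tz-aware 15-minute index.

    Assumptions: features use IST wall-clock time, and holiday_dates is a
    set of datetime.date objects covering one state.
    """
    local = index.tz_convert("Asia/Kolkata")
    hour_frac = local.hour + local.minute / 60.0  # 0.0, 0.25, ... 23.75
    dow = local.dayofweek                         # Monday = 0
    doy = local.dayofyear
    f = pd.DataFrame(index=index)
    f["hour"] = local.hour
    f["day_of_week"] = dow
    f["day_of_year"] = doy
    f["month"] = local.month
    f["is_weekend"] = (dow >= 5).astype(int)
    f["is_holiday"] = [int(d in holiday_dates) for d in local.date]
    # Map each periodic feature onto a circle so its end wraps to its start.
    for name, value, period in (("hour", hour_frac, 24.0), ("dow", dow, 7.0), ("doy", doy, 365.25)):
        angle = 2 * np.pi * np.asarray(value, dtype=float) / period
        f[f"{name}_sin"] = np.sin(angle)
        f[f"{name}_cos"] = np.cos(angle)
    return f
```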

Demand context

all_india_demand_mw (the NPP MERIT national curve at the same timestamp)
demand_lag_24h (own-state demand 24 h ago)
demand_lag_7d (own-state demand 7 days ago)

all_india_demand_mw is empirically the highest-importance feature — it captures 60–80% of the variance before any state-specific signal is added.
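Because SLDC feeds drop samples, it is safer to build the two lag features by shifting on wall-clock time rather than on row count. This is a minimal sketch, assuming one state's rows sit in a DataFrame with a unique tz-aware DatetimeIndex and a `demand_mw` column. The function name is hypothetical.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add demand_lag_24h and demand_lag_7d by time offset, not row offset."""
    out = df.copy()
    demand = out["demand_mw"]
    # shift(freq=...) moves the timestamps, so a dropped sample yields NaN
    # instead of silently turning a 24 h lag into a 23.75 h one.
    out["demand_lag_24h"] = demand.shift(freq=pd.Timedelta("24h")).reindex(out.index)
    out["demand_lag_7d"] = demand.shift(freq=pd.Timedelta("7D")).reindex(out.index)
    return out
```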

Weather (Open-Meteo, hourly)

temperature_c at the state capital
humidity_pct at the state capital
precipitation_mm at the state capital

Joined to the 15-minute demand rows via merge_asof with a 30-minute tolerance. Weather captures the AC-load spike that calendar and national demand alone cannot.
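The weather join can be sketched with pandas `merge_asof`. The post gives the 30-minute tolerance but not the match direction. This sketch assumes `direction="nearest"`, which lets a :45 demand row pick up the following hour's reading. It also assumes both frames carry a tz-aware `ts` column of the same dtype.

```python
import pandas as pd

def join_weather(demand: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Attach hourly weather columns to 15-minute demand rows.

    Rows with no weather reading within 30 minutes get NaN weather columns.
    """
    return pd.merge_asof(
        demand.sort_values("ts"),
        weather.sort_values("ts"),
        on="ts",
        direction="nearest",             # assumption: not stated in the post
        tolerance=pd.Timedelta("30min"),
    )
```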

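Targets are state demand_mw rows from the official SLDC scrape. The train/test split rolls forward in time, so every test period lies strictly after the data the booster was trained on (no look-ahead leakage). Validation is per-state, with seasonal cross-validation folds cut on month boundaries.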
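One way to cut month-boundary folds without leakage is an expanding window: each fold tests on a single calendar month and trains only on earlier months. This is a sketch under assumptions. The post doesn't say whether its window expands or slides, or whether months are cut in UTC or IST. The function name and `min_train_months` are hypothetical.

```python
import pandas as pd

def month_boundary_folds(index: pd.DatetimeIndex, min_train_months: int = 2):
    """Yield (train_mask, test_mask) boolean arrays, one fold per test month.

    Each fold trains strictly on months before the test month, so no
    future data leaks into training.
    """
    month_id = index.year * 12 + (index.month - 1)  # monotone integer month counter
    months = sorted(set(month_id))
    for m in months[min_train_months:]:
        yield month_id < m, month_id == m
```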

Headline numbers

We pair every 15-min model prediction with the closest actual SLDC reading within ±5 minutes, drop predictions without a paired actual, then aggregate. Last 30 days, all hours.
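The pairing and the four headline metrics can be sketched as follows. It assumes predictions and actuals arrive as frames with a tz-aware `ts` column and a `demand_mw` column; the function name is hypothetical. Residuals are taken as predicted minus actual, so a negative bias means under-prediction.

```python
import numpy as np
import pandas as pd

def pair_and_score(pred: pd.DataFrame, actual: pd.DataFrame) -> dict:
    """Pair each prediction with the nearest actual within +/-5 min, then score."""
    paired = pd.merge_asof(
        pred.sort_values("ts").rename(columns={"demand_mw": "pred_mw"}),
        actual.sort_values("ts").rename(columns={"demand_mw": "actual_mw"}),
        on="ts",
        direction="nearest",
        tolerance=pd.Timedelta("5min"),
    ).dropna(subset=["actual_mw"])  # drop predictions with no paired actual
    y = paired["actual_mw"].to_numpy()
    err = paired["pred_mw"].to_numpy() - y  # residual = predicted minus actual
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "samples": len(paired),
        "mape_pct": float(np.mean(np.abs(err) / np.abs(y)) * 100),
        "rmse_mw": float(np.sqrt(np.mean(err ** 2))),
        "bias_mw": float(np.mean(err)),
        "r2": 1 - ss_res / ss_tot,
    }
```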

Figure: Per-state MAPE across the deployed LightGBM nowcast, last 30 days. Green ≤5%, amber 5–10%, red >10% or insufficient samples. AP, Gujarat, Delhi, and Punjab cluster at production-grade accuracy.

Per-state nowcast accuracy · last 30 days · sorted best-first

State	Samples	MAPE	RMSE	Bias (pred − actual)	R²	Typical demand
Andhra Pradesh	568	2.51%	374 MW	-94 MW	0.907	10.8 GW
Gujarat	942	3.77%	1001 MW	-444 MW	0.836	19.6 GW
Delhi	509	5.79%	371 MW	-278 MW	0.866	5.2 GW
Punjab	828	6.89%	581 MW	+7 MW	0.782	6.5 GW

Four states cluster in the production-quality range: single-digit MAPE, R² above 0.78, and a mean bias no larger than about 5% of typical load (Delhi's −278 MW on 5.2 GW is the largest; the other three are under 2.5%). Andhra Pradesh, at 2.5% MAPE on a 10.8 GW typical load with R² = 0.91, is the best-performing state. Punjab is the best calibrated, with a mean bias of just +7 MW on a ~6.5 GW load; the model is essentially unbiased on average.
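The colour bands on the per-state chart (green ≤5%, amber 5–10%, red >10% or insufficient samples) reduce to a small rule. The post doesn't define "insufficient samples", so the `min_samples` cutoff below is a hypothetical placeholder.

```python
def readiness_band(mape_pct: float, samples: int, min_samples: int = 100) -> str:
    """Map a state's MAPE and paired-sample count to a chart colour.

    min_samples is hypothetical; tune it to whatever "insufficient" means.
    """
    if samples < min_samples or mape_pct > 10:
        return "red"
    if mape_pct > 5:
        return "amber"
    return "green"
```

Applied to the table rows, AP and Gujarat land in green and Delhi and Punjab in amber. Himachal Pradesh and Kerala, discussed further down, land in red.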

Andhra Pradesh deep dive — the 2.5% MAPE state

A single day in three layers (the actual SLDC reading, the LightGBM nowcast for the same instant, and the 7-day-ahead LightGBM forecast issued before the day started):

Figure: AP demand on 2026-04-25 UTC. Solid line: actual SLDC reading. Dashed orange line: LightGBM nowcast (modeled_ml_v1). Dotted green line: the same booster running 7 days ahead recursively (modeled_ml_forecast_v1). The morning peak is where the model is mildly conservative: it doesn't see today's heat early enough.

Daily MAPE for AP, last 21 days — this is the stability story we care about. A model that's 2% one day and 12% the next isn't useful no matter what its mean is.

Figure: Daily MAPE for the AP LightGBM nowcast, last 21 days. AP daily MAPE clusters tightly between 1.5% and 4%. The 5% target line is the bar we want to stay consistently under. Single-day spikes are typically holidays the calendar feature didn't catch (festivals with state-specific dates).
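Measuring day-to-day stability rather than a single mean takes two small steps: MAPE per day, then a count of days over the 5% target. This is a minimal sketch, assuming a paired frame with a tz-aware UTC index and hypothetical `pred_mw` / `actual_mw` columns. Days are grouped by UTC date to match the UTC-dated charts.

```python
import pandas as pd

def daily_mape(paired: pd.DataFrame) -> pd.Series:
    """Mean absolute percentage error per UTC calendar day."""
    ape = (paired["pred_mw"] - paired["actual_mw"]).abs() / paired["actual_mw"].abs() * 100
    return ape.groupby(ape.index.floor("D")).mean()

def days_over_target(daily: pd.Series, target_pct: float = 5.0) -> int:
    """How many days broke the MAPE target line."""
    return int((daily > target_pct).sum())
```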

Predicted vs actual scatter, coloured by hour-of-day — shows where the model tracks the diagonal and where it diverges:

Figure: Predicted vs actual demand for AP, coloured by hour of day. R² = 0.91 means the booster explains 91% of the variance in AP demand. The horizontal banding around 12,500 MW is the morning peak: the model has a soft ceiling there, which is why daytime MAPE is slightly worse than night-time MAPE.

Where it misses

MAPE broken out by hour of day for AP — this answers “when should I trust the model and when should I cross-check?”:

Figure: MAPE by hour of day for the AP LightGBM nowcast. MAPE is lowest from 21:00 to 11:00 IST (~1.5–2%) and rises during the afternoon air-conditioning peak (~4.7% at 15:00 IST). The model can't fully see today's heat ramp because weather is hourly and the lag features are 24 h and 7 days old. This gap is exactly where nowcast weather (every 15 minutes instead of every 60) would help most.
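The samples are timestamped in UTC, but the hour-of-day breakdown is quoted in IST, so the hours need converting before grouping. This is a minimal sketch, assuming the same hypothetical `pred_mw` / `actual_mw` paired frame with a tz-aware UTC index.

```python
import pandas as pd

def mape_by_ist_hour(paired: pd.DataFrame) -> pd.Series:
    """MAPE grouped by IST hour of day (0-23)."""
    ape = (paired["pred_mw"] - paired["actual_mw"]).abs() / paired["actual_mw"].abs() * 100
    # IST is UTC+05:30, so the half-hour offset matters: 09:30 UTC is 15:00 IST.
    ist_hour = ape.index.tz_convert("Asia/Kolkata").hour
    return ape.groupby(ist_hour).mean()
```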

Residual distribution — predicted minus actual:

Figure: Residual distribution (predicted minus actual) for AP. Residuals are roughly Gaussian with a mild negative bias (mean −94 MW on a ~10.8 GW load, about −0.9%). The negative skew is structural: lag features can't see today's surprise (heatwaves, regional outages, or unusual industrial offtake), so the model under-predicts on high-stress days.
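The residual histogram reflects two separate statistics: the mean (bias) and the skew (a long tail of large misses on one side). This is a minimal sketch that computes both, using the population skewness formula. The function name is hypothetical.

```python
import numpy as np

def residual_summary(pred, actual) -> dict:
    """Mean residual, mean as % of load, and population skewness.

    Residual = predicted minus actual, so under-prediction on high-demand
    days shows up as a long negative tail (negative skew).
    """
    actual = np.asarray(actual, dtype=float)
    r = np.asarray(pred, dtype=float) - actual
    mean = r.mean()
    std = r.std()  # population standard deviation (ddof=0)
    return {
        "mean_mw": float(mean),
        "mean_pct_of_load": float(mean / actual.mean() * 100),
        "skew": float(np.mean((r - mean) ** 3) / std ** 3),
    }
```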

States not yet production-ready

Two of the six states where the booster is currently emitting rows are flagged as not production-ready. We're keeping them visible — the data is there if you want to inspect it — but we don't advertise them as a shipped product. The other five OFFICIAL_FEED_STATES (Karnataka, Rajasthan, Tamil Nadu, Telangana, West Bengal) have boosters trained but not yet deployed to production inference.

States flagged as not yet production-ready

State	Samples	MAPE	Why we hold it back
Himachal Pradesh	965	220.8%	Calibration error — small absolute load amplifies any over-prediction. Booster needs per-state target normalisation.
Kerala	11	24.2%	Insufficient paired observations: the upstream feed publishes a daily summary, not 15-min readings.

Himachal Pradesh in particular is interesting: the booster trained, the inference pipeline runs, but the absolute load is so small (~310 MW peak) that any over-prediction looks catastrophic in percentage terms. The fix is per-state target normalisation, queued as part of the next training run.
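The percentage blow-up is plain arithmetic. Here a hypothetical 200 MW over-prediction is applied to the two load levels quoted in this post.

```python
def pct_error(abs_error_mw: float, load_mw: float) -> float:
    """Absolute error as a percentage of load."""
    return abs_error_mw / load_mw * 100

# Same hypothetical 200 MW miss at two load levels from this post.
hp = pct_error(200, 310)     # Himachal Pradesh peak, ~310 MW
ap = pct_error(200, 10_800)  # Andhra Pradesh typical load, ~10.8 GW
print(f"HP: {hp:.1f}%  AP: {ap:.1f}%")  # HP: 64.5%  AP: 1.9%
```

The same absolute miss is roughly 35 times larger in percentage terms on HP's load.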

Reproduce these numbers

The full evaluation script is scripts/eval_demand_models.py in the atlas backend repo. With read-only DB credentials and Python ≥ 3.11:

pip install "psycopg[binary]" sqlalchemy matplotlib pandas numpy
export DATABASE_URL='postgresql://USER:PASS@HOST:25060/grid?sslmode=require'
python scripts/eval_demand_models.py --out eval-output/ --days 30

That writes metrics.json, metrics.csv, and the six PNG plots embedded above. The CSV for the snapshot shown on this page is also available for download.

Limitations & open work

Historical depth. Most clean SLDC data starts September 2025. The booster has seen one summer and one winter. Expect material seasonal-bias correction once we have a full annual cycle plus a year of holdout for validation.
No real-time signal in the model. demand_lag_24h and demand_lag_7d are observations from the past. Anything unusual today (heatwave, regional outage, tariff change) only enters via weather and the calendar — not directly. The model under-predicts on high-stress days and we have an open ticket to add a 1-hour lag with proper feature freshness handling.
Hourly weather resolution. Open-Meteo gives 1-hour temperature; AP morning ramp can swing 500+ MW within a single hour. We're effectively low-pass filtering temperature. A 15-min weather feed would close most of the daytime MAPE gap.
No cross-state spillover signal. Each state's model knows All-India demand and its own lags, but not what's happening live in neighbouring states. Pooled multi-state models are planned for the next iteration.
5 of 11 trained states not yet emitting predictions. Karnataka, Rajasthan, Tamil Nadu, Telangana, and West Bengal have trained boosters but the inference cron is not yet writing rows for them. Tracked internally; expect them in production by the end of the next sprint.
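The post doesn't describe what "feature freshness handling" for the planned 1-hour lag will look like. One possible shape, sketched here under assumptions, uses a reading only if it had actually been ingested by prediction time. The `received_at` ingest-time column, the ±5 min tolerance, and the function name are all hypothetical. Returning NaN rather than a stale value relies on LightGBM treating NaN as missing by default.

```python
import pandas as pd

def lag_1h_as_of(obs: pd.DataFrame, t: pd.Timestamp,
                 tolerance: pd.Timedelta = pd.Timedelta("5min")) -> float:
    """Demand observed ~1 h before t, using only rows already ingested by t.

    obs: hypothetical frame with tz-aware columns ts (reading time) and
    received_at (ingest time), plus demand_mw. Returns NaN when no
    fresh-enough reading exists.
    """
    target = t - pd.Timedelta("1h")
    visible = obs[obs["received_at"] <= t]  # no peeking at late-arriving rows
    gap = (visible["ts"] - target).abs()
    near = visible[gap <= tolerance]
    if near.empty:
        return float("nan")
    # Prefer the reading closest to the 1-hour mark.
    best = gap[gap <= tolerance].idxmin()
    return float(near.loc[best, "demand_mw"])
```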

Computed from 3,823 matched 15-minute samples over the last 30 days. Snapshot generated 3 May 2026. Code: scripts/eval_demand_models.py.


← More from The Atlas Journal