Engineering · 3 May 2026 · 10 min read

How accurate is EnergyMap's LightGBM demand model?

We run a per-state LightGBM nowcast for the 11 Indian states with mature realtime SLDC feeds. This is the verified per-state MAPE, R², and bias against actuals — including the states where the model isn't yet production-ready.

EnergyMap Research Team
India Energy Atlas · CIFR

What this model is for

25 of 36 Indian states and UTs do not publish realtime electricity demand to a stable public API. For grid-wide nowcasting and analytics, we need a uniform 15-minute demand series across every state, even where there is no live SLDC feed. We model demand for those states.

For the 11 states that do publish a usable realtime feed (Andhra Pradesh, Delhi, Gujarat, Himachal Pradesh, Karnataka, Kerala, Punjab, Rajasthan, Tamil Nadu, Telangana, West Bengal), we still maintain a parallel modelled curve. That sounds redundant — until you join across states. Each SLDC publishes at a different cadence, drops samples, and has its own timezone quirks. The modelled layer guarantees a gap-free 96-block grid every day, every state. The actuals stay the canonical source of truth where they exist; the model is the safety net.
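The "actuals first, model as safety net" rule can be sketched in a few lines of pandas. This is a minimal illustration, not the production join: the function name, the column names, and the choice to lay the grid out in IST are all assumptions made for the example.

```python
import pandas as pd

TZ = "Asia/Kolkata"  # assumption: the 96-block grid is laid out in IST


def gap_free_day(actual: pd.Series, modelled: pd.Series, day: str) -> pd.DataFrame:
    """Return 96 rows for `day`: the actual reading where one exists, else the model."""
    grid = pd.date_range(day, periods=96, freq="15min", tz=TZ)
    # Snap irregular SLDC timestamps to their 15-minute block; keep the last reading per block.
    snapped = actual.groupby(actual.index.floor("15min")).last()
    out = pd.DataFrame(
        {"actual_mw": snapped.reindex(grid), "modelled_mw": modelled.reindex(grid)}
    )
    # Actuals stay canonical; the modelled value only fills blocks with no reading.
    out["demand_mw"] = out["actual_mw"].fillna(out["modelled_mw"])
    out["source"] = out["actual_mw"].notna().map({True: "actual", False: "modeled_ml_v1"})
    return out


# Toy data: a flat modelled curve and two off-grid actual readings.
grid = pd.date_range("2026-04-25", periods=96, freq="15min", tz=TZ)
modelled = pd.Series(10_000.0, index=grid)
actual = pd.Series(
    [10_100.0, 10_250.0],
    index=pd.DatetimeIndex(["2026-04-25 00:02", "2026-04-25 00:31"], tz=TZ),
)
day = gap_free_day(actual, modelled, "2026-04-25")
```

Whatever the SLDC cadence, the result always has 96 rows, and the source column records which layer each block came from.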

The current production model is a per-state LightGBM gradient boosting regressor with the source label modeled_ml_v1. There is also a 7-day-ahead recursive forecast variant called modeled_ml_forecast_v1 using the same boosters. Code lives in scripts/train_demand_model.py (training, weekly cron) and scripts/run_demand_forecast.py (inference, every 15 minutes).
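A recursive forecast of this kind reuses a one-step model and feeds its own outputs back in once the horizon passes the shortest lag. The sketch below shows only that mechanism. The predict_one callable is a hypothetical stand-in for a trained booster, and the calendar, weather, and national-demand features are left out for brevity. It assumes at least seven days of gap-free 15-minute history.

```python
from typing import Callable, Dict

import pandas as pd

STEP = pd.Timedelta(minutes=15)


def recursive_forecast(
    history: pd.Series,
    predict_one: Callable[[Dict[str, float]], float],
    steps: int = 7 * 96,
) -> pd.Series:
    """Roll a one-step predictor forward, feeding predictions back in as lags."""
    values = dict(history.items())  # timestamp -> MW, observed or already predicted
    forecast = {}
    t = history.index[-1]
    for _ in range(steps):
        t = t + STEP
        features = {
            # Beyond the first 24 h this lag is itself a prediction, hence "recursive".
            "demand_lag_24h": values[t - pd.Timedelta(hours=24)],
            # Within a 7-day horizon this lag is always an observed value.
            "demand_lag_7d": values[t - pd.Timedelta(days=7)],
        }
        forecast[t] = values[t] = predict_one(features)
    return pd.Series(forecast, name="forecast_mw")


# Toy usage: flat 100 MW history and a made-up linear "model".
idx = pd.date_range("2026-04-18", periods=7 * 96, freq="15min")
history = pd.Series(100.0, index=idx)
fc = recursive_forecast(
    history, lambda f: 0.5 * f["demand_lag_24h"] + 0.5 * f["demand_lag_7d"] + 1.0
)
```

Because day-two inputs are day-one outputs, any bias in the one-step model compounds with horizon, which is one reason to evaluate the nowcast and the forecast separately.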

What the LightGBM trains on

One booster per state. Same feature set across states, but each state learns its own response.

Calendar
  • hour, day_of_week, day_of_year, month
  • is_weekend, is_holiday (India national + per-state regional via the holidays Python lib — AP gets the regional Telugu calendar, Punjab gets Sikh festivals)
  • Cyclic encodings: hour_sin/cos, dow_sin/cos, doy_sin/cos so the gradient-boosted tree can split on smooth wrap-around features

Pure structural features the model gets for free.
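As a rough illustration of the calendar block, the sketch below builds the raw and cyclic columns from a DatetimeIndex. It is not the training script: is_holiday is omitted because it depends on the holidays library and per-state configuration, and the 365.25-day period used for doy_sin/cos is an assumption.

```python
import numpy as np
import pandas as pd


def calendar_features(ts: pd.DatetimeIndex) -> pd.DataFrame:
    """Build the raw and cyclic calendar columns for a DatetimeIndex."""
    f = pd.DataFrame(index=ts)
    f["hour"] = np.asarray(ts.hour)
    f["day_of_week"] = np.asarray(ts.dayofweek)
    f["day_of_year"] = np.asarray(ts.dayofyear)
    f["month"] = np.asarray(ts.month)
    f["is_weekend"] = (f["day_of_week"] >= 5).astype(int)
    # (output prefix, raw column, period); the 365.25-day year is an assumption.
    cycles = [("hour", "hour", 24), ("dow", "day_of_week", 7), ("doy", "day_of_year", 365.25)]
    for prefix, col, period in cycles:
        angle = 2 * np.pi * f[col] / period
        f[f"{prefix}_sin"] = np.sin(angle)
        f[f"{prefix}_cos"] = np.cos(angle)
    return f


feats = calendar_features(pd.date_range("2026-04-25", periods=96, freq="15min"))
```

In the (sin, cos) plane, hour 23 sits as close to hour 0 as hour 1 does, so a tree split can isolate the late-night block without treating midnight as a discontinuity.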

Demand context
  • all_india_demand_mw (the NPP MERIT national curve at the same timestamp)
  • demand_lag_24h (own-state demand 24 h ago)
  • demand_lag_7d (own-state demand 7 days ago)

all_india_demand_mw is empirically the highest-importance feature — it captures 60–80% of the variance before any state-specific signal is added.
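The two lag columns can be derived by time rather than by row position, which keeps them correct even if a block is missing. A minimal sketch, assuming a demand series with a unique DatetimeIndex (the function name is hypothetical):

```python
import pandas as pd


def add_lag_features(demand: pd.Series) -> pd.DataFrame:
    """Attach own-state demand from 24 hours and 7 days earlier."""
    out = demand.to_frame("demand_mw")
    # shift(freq=...) moves the timestamps, not the values, so the column
    # assignment aligns each row with the reading taken exactly that long ago.
    out["demand_lag_24h"] = demand.shift(freq=pd.Timedelta(hours=24))
    out["demand_lag_7d"] = demand.shift(freq=pd.Timedelta(days=7))
    return out


# Toy series: eight days of 15-minute blocks whose value is the block number.
idx = pd.date_range("2026-04-01", periods=96 * 8, freq="15min")
demand = pd.Series(range(len(idx)), index=idx, dtype=float)
lagged = add_lag_features(demand)
```

Rows in the first 24 hours (or first 7 days) have no lag available and come back as NaN; whether to drop or impute them is a separate choice.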

Weather (Open-Meteo, hourly)
  • temperature_c at the state capital
  • humidity_pct at the state capital
  • precipitation_mm at the state capital

Joined via merge_asof with a 30-min tolerance. Captures the AC-load spike that calendar + national demand alone cannot.
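The weather join might look roughly like this. The 30-minute tolerance comes from the text above; the ts column name and direction="nearest" are assumptions, since the match direction is not stated.

```python
import pandas as pd


def join_weather(demand: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Attach the nearest hourly weather row to each 15-minute demand row."""
    return pd.merge_asof(
        demand.sort_values("ts"),
        weather.sort_values("ts"),
        on="ts",
        direction="nearest",  # assumption: nearest match rather than backward-only
        tolerance=pd.Timedelta(minutes=30),
    )


demand = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2026-04-25 00:00", "2026-04-25 00:15", "2026-04-25 00:45", "2026-04-25 02:00"]
        ),
        "demand_mw": [9800.0, 9750.0, 9700.0, 9500.0],
    }
)
# The 02:00 weather row is deliberately missing to show the tolerance at work.
weather = pd.DataFrame(
    {
        "ts": pd.to_datetime(["2026-04-25 00:00", "2026-04-25 01:00", "2026-04-25 03:00"]),
        "temperature_c": [28.0, 27.5, 26.0],
    }
)
joined = join_weather(demand, weather)
```

The 02:00 demand row ends up with a missing temperature because the closest weather rows are 60 minutes away. With a backward-only match, the :45 block of every hour would also fall outside a 30-minute tolerance, which is why the direction matters here.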

Targets are state demand_mw rows from the official SLDC scrape. Train/test split is rolling-time (no leakage). Validation is per-state with seasonal cross-validation on month boundaries.
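One way to read "rolling-time split on month boundaries" is an expanding window: each fold validates on one calendar month and trains only on rows before it. The sketch below follows that reading. It is an interpretation rather than the actual training code, and min_train_months is a hypothetical parameter.

```python
import pandas as pd


def month_boundary_folds(ts: pd.DatetimeIndex, min_train_months: int = 3):
    """Yield (train_mask, valid_mask) pairs, one fold per validation month."""
    months = ts.to_period("M")
    ordered = months.unique().sort_values()
    for month in ordered[min_train_months:]:
        train = months < month   # strictly earlier rows only, so no look-ahead
        valid = months == month
        yield train, valid


# Eight months of 15-minute timestamps, September 2025 to April 2026.
ts = pd.date_range("2025-09-01", "2026-04-30 23:45", freq="15min")
folds = list(month_boundary_folds(ts))
```

With eight months of data and a three-month minimum, this gives five folds, the first training on September to November and validating on December.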

Headline numbers

We pair every 15-min model prediction with the closest actual SLDC reading within ±5 minutes, drop predictions without a paired actual, then aggregate. Last 30 days, all hours.
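The pairing and aggregation step can be sketched as follows. Column names are assumptions, and bias is taken as predicted minus actual, so a negative value means under-prediction; the real evaluation lives in scripts/eval_demand_models.py and may differ in detail.

```python
import numpy as np
import pandas as pd


def paired_metrics(pred: pd.DataFrame, actual: pd.DataFrame) -> dict:
    """Pair each prediction with the closest actual within 5 minutes, then aggregate."""
    pairs = pd.merge_asof(
        pred.sort_values("ts"),
        actual.sort_values("ts"),
        on="ts",
        direction="nearest",
        tolerance=pd.Timedelta(minutes=5),
    ).dropna(subset=["actual_mw"])  # drop predictions without a paired actual
    err = pairs["pred_mw"] - pairs["actual_mw"]
    ss_res = float((err ** 2).sum())
    ss_tot = float(((pairs["actual_mw"] - pairs["actual_mw"].mean()) ** 2).sum())
    return {
        "samples": len(pairs),
        "mape_pct": float((err.abs() / pairs["actual_mw"]).mean() * 100),
        "rmse_mw": float(np.sqrt((err ** 2).mean())),
        "bias_mw": float(err.mean()),
        "r2": 1.0 - ss_res / ss_tot,
    }


pred = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2026-04-25 00:00", "2026-04-25 00:15", "2026-04-25 00:30", "2026-04-25 00:45"]
        ),
        "pred_mw": [100.0, 110.0, 120.0, 130.0],
    }
)
actual = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2026-04-25 00:02", "2026-04-25 00:14", "2026-04-25 00:31", "2026-04-25 00:55"]
        ),
        "actual_mw": [100.0, 100.0, 125.0, 140.0],
    }
)
metrics = paired_metrics(pred, actual)
```

In the toy data the 00:45 prediction has no actual within five minutes, so only three pairs contribute to the metrics.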

[Figure: per-state MAPE for the LightGBM demand nowcast over the last 30 days]
Per-state MAPE across the deployed LightGBM nowcast — last 30 days. Green ≤5%, amber 5–10%, red >10% or insufficient samples. AP, Gujarat, Delhi, and Punjab cluster at production-grade accuracy.
Per-state nowcast accuracy · last 30 days · sorted best-first
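| State | Samples | MAPE | RMSE | Bias | R² | Typical demand |
| --- | --- | --- | --- | --- | --- | --- |
| Andhra Pradesh | 568 | 2.51% | 374 MW | -94 MW | 0.907 | 10.8 GW |
| Gujarat | 942 | 3.77% | 1001 MW | -444 MW | 0.836 | 19.6 GW |
| Delhi | 509 | 5.79% | 371 MW | -278 MW | 0.866 | 5.2 GW |
| Punjab | 828 | 6.89% | 581 MW | +7 MW | 0.782 | 6.5 GW |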

Four states cluster in the production-quality range: single-digit MAPE, R² above 0.78, and mean bias between roughly 0.1% and 5% of typical load. Andhra Pradesh at 2.5% MAPE on a 10.8 GW typical load with R² = 0.91 is the best-performing state. Punjab is the best calibrated, with a mean bias of just +7 MW on a ~6.5 GW load, so the model is essentially unbiased on average.

Andhra Pradesh deep dive — the 2.5% MAPE state

A single day, three layers — actual SLDC reading, the LightGBM nowcast for the same instant, and the LightGBM's 7-day-ahead forecast issued before the day started:

[Figure: Andhra Pradesh electricity demand on 2026-04-25 UTC, showing actual SLDC (solid), LightGBM nowcast (dashed), and LightGBM forecast (dotted)]
AP demand on 2026-04-25 UTC. The dashed orange line is the LightGBM nowcast (modeled_ml_v1); the dotted green line is the same booster running 7 days ahead recursively (modeled_ml_forecast_v1). The morning peak is where the model is mildly conservative — it doesn't see today's heat early enough.

Daily MAPE for AP, last 21 days — this is the stability story we care about. A model that's 2% one day and 12% the next isn't useful no matter what its mean is.

[Figure: daily MAPE for the Andhra Pradesh LightGBM nowcast over the last 21 days]
AP daily MAPE clusters tightly between 1.5% and 4%. The 5% target line is the bar we want to consistently stay under. Single-day spikes are typically holiday days the calendar feature didn't catch (festivals with state-specific dates).

Predicted vs actual scatter, coloured by hour-of-day — shows where the model tracks the diagonal and where it diverges:

[Figure: predicted vs actual demand for Andhra Pradesh, scatter plot coloured by hour of day]
R² = 0.91 means the booster captures 91% of the variance in AP demand. The horizontal banding around 12,500 MW is the morning peak — the model has a soft ceiling there, which is why daytime MAPE is slightly worse than night.

Where it misses

MAPE broken out by hour of day for AP — this answers “when should I trust the model and when should I cross-check?”:

[Figure: MAPE by hour of day for the Andhra Pradesh LightGBM nowcast]
MAPE is lowest from 21:00 to 11:00 IST (~1.5–2%) and rises during the afternoon air-conditioning peak (15:00 IST hits ~4.7%). The model can't fully see today's heat ramp because weather is hourly and lag features are 24h/7d old; this gap is exactly where adding nowcast weather (every 15 min instead of every 60) would help most.
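A by-hour breakdown like this one is a group-by on the local hour. The sketch below assumes paired rows stored with UTC timestamps (the day plot above is labelled UTC) and converts to IST before grouping; column names are hypothetical.

```python
import pandas as pd


def mape_by_hour_ist(pairs: pd.DataFrame) -> pd.Series:
    """Mean absolute percentage error per IST hour of day."""
    hour_ist = pairs["ts"].dt.tz_convert("Asia/Kolkata").dt.hour
    ape_pct = (pairs["pred_mw"] - pairs["actual_mw"]).abs() / pairs["actual_mw"] * 100
    return ape_pct.groupby(hour_ist).mean().rename("mape_pct")


pairs = pd.DataFrame(
    {
        # 09:30 and 09:45 UTC are 15:00 and 15:15 IST; 18:30 UTC is 00:00 IST.
        "ts": pd.to_datetime(
            ["2026-04-25 09:30", "2026-04-25 09:45", "2026-04-25 18:30"], utc=True
        ),
        "pred_mw": [105.0, 95.0, 100.0],
        "actual_mw": [100.0, 100.0, 100.0],
    }
)
by_hour = mape_by_hour_ist(pairs)
```

Grouping on the UTC hour instead would shift every bucket by five and a half hours and smear the afternoon peak across two buckets.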

Residual distribution — predicted minus actual:

[Figure: residual distribution histogram (predicted minus actual demand) for Andhra Pradesh]
Residuals are roughly Gaussian with a mild negative bias (mean −94 MW on a ~10.8 GW load, ~−0.9%). The negative bias is structural: lag features can't see “today's surprise” (heatwaves, regional outages, or unusual industrial offtake), so the model under-predicts on high-stress days.

States not yet production-ready

Two of the six states where the booster is currently emitting rows are flagged as not production-ready. We're keeping them visible — the data is there if you want to inspect it — but we don't advertise them as a shipped product. The other five OFFICIAL_FEED_STATES (Karnataka, Rajasthan, Tamil Nadu, Telangana, West Bengal) have boosters trained but not yet deployed to production inference.

States flagged as not yet production-ready
| State | Samples | MAPE | Why we hold it back |
| --- | --- | --- | --- |
| Himachal Pradesh | 965 | 220.8% | Calibration error: small absolute load amplifies any over-prediction. Booster needs per-state target normalisation. |
| Kerala | 11 | 24.2% | Insufficient paired observations: upstream feed publishes a daily summary, not 15-min readings. |

Himachal Pradesh in particular is interesting: the booster trained, the inference pipeline runs, but the absolute load is so small (~310 MW peak) that any over-prediction looks catastrophic in percentage terms. The fix is per-state target normalisation, queued as part of the next training run.
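The amplification is easy to see with a worked example. The 300 MW error below is a hypothetical figure chosen for illustration; the load levels are the ones quoted on this page.

```python
def ape_pct(pred_mw: float, actual_mw: float) -> float:
    """Absolute percentage error of a single prediction."""
    return abs(pred_mw - actual_mw) / actual_mw * 100


ERROR_MW = 300.0  # hypothetical over-prediction, identical in both states

ap_ape = ape_pct(10_800 + ERROR_MW, 10_800)  # about 2.8% on AP's ~10.8 GW typical load
hp_ape = ape_pct(310 + ERROR_MW, 310)        # about 96.8% on HP's ~310 MW peak
```

The same absolute miss is a rounding error in one state and nearly a doubling in the other, which is why a percentage metric alone is a poor guide for very small loads.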

Reproduce these numbers

The full evaluation script is scripts/eval_demand_models.py in the atlas backend repo. With read-only DB credentials and Python ≥ 3.11:

pip install "psycopg[binary]" sqlalchemy matplotlib pandas numpy
export DATABASE_URL='postgresql://USER:PASS@HOST:25060/grid?sslmode=require'
python scripts/eval_demand_models.py --out eval-output/ --days 30

That writes metrics.json, metrics.csv, and the six PNG plots embedded above. The CSV is downloadable here for the snapshot shown on this page.

Limitations & open work

  • Historical depth. Most clean SLDC data starts September 2025, so the booster has seen one winter and only the start of one summer. Expect material seasonal-bias correction once we have a full annual cycle plus a year of holdout for validation.
  • No real-time signal in the model. demand_lag_24h and demand_lag_7d are observations from the past. Anything unusual today (heatwave, regional outage, tariff change) only enters via weather and the calendar — not directly. The model under-predicts on high-stress days and we have an open ticket to add a 1-hour lag with proper feature freshness handling.
  • Hourly weather resolution. Open-Meteo gives 1-hour temperature; AP morning ramp can swing 500+ MW within a single hour. We're effectively low-pass filtering temperature. A 15-min weather feed would close most of the daytime MAPE gap.
  • No cross-state spillover signal. Each state's model knows All-India demand and its own lags, but not what's happening live in neighbouring states. Pooled multi-state models are planned for the next iteration.
  • 5 of 11 trained states not yet emitting predictions. Karnataka, Rajasthan, Tamil Nadu, Telangana, and West Bengal have trained boosters but the inference cron is not yet writing rows for them. Tracked internally; expect them in production by the end of the next sprint.

Computed from 3,823 matched 15-minute samples over the last 30 days. Snapshot generated 3 May 2026. Code: scripts/eval_demand_models.py.
