Benchmarking Deep RL for Off-Grid Hybrid Microgrids in Sub-Saharan Africa
Jordan Leis
CUCAI 2026 Proceedings - 2026
Abstract
Reliable electricity access for approximately 570 million unelectrified people in sub-Saharan Africa (SSA) depends on off-grid solar-battery-diesel hybrid microgrids. The energy management system (EMS) governing battery dispatch and diesel throttle strongly affects reliability and fuel cost, yet reinforcement learning (RL) research in this domain is fragmented: most studies evaluate one or two algorithms at a single site. We present a 150-run benchmark of six deep RL algorithms (SAC, DDPG, TQC, PPO, A2C, and Recurrent PPO) across five climatically distinct SSA locations, with five independent seeds per algorithm-site pair. Agents are trained on five years of real NASA POWER irradiance data (2019–2023) and evaluated on the held-out year 2024 in a high-fidelity Gymnasium simulation. Off-policy algorithms achieve near-zero unmet energy; DDPG attains the best aggregate performance (7.5 kWh/yr unserved, 20,007 L/yr diesel, a 23% fuel reduction over SAC/TQC). On-policy methods exhibit systematic failure modes.
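The experimental grid described in the abstract can be enumerated as a minimal sketch, which makes the 150-run count explicit (six algorithms, five sites, five seeds) and shows the split between training years and the held-out evaluation year. The site labels, seed values, and the `build_run_grid` helper are hypothetical placeholders chosen for illustration; the abstract does not name the sites or the seeds, and this is not the author's code.

```python
from itertools import product

# Algorithm names as listed in the abstract.
ALGORITHMS = ["SAC", "DDPG", "TQC", "PPO", "A2C", "RecurrentPPO"]

# Hypothetical placeholders: the abstract does not name the five sites.
SITES = [f"site_{i}" for i in range(1, 6)]

# Assumed seed values; the abstract only states that five seeds are used.
SEEDS = list(range(5))

# Training years 2019-2023 and the held-out evaluation year, per the abstract.
TRAIN_YEARS = list(range(2019, 2024))
EVAL_YEAR = 2024


def build_run_grid():
    """Return one config dict per (algorithm, site, seed) combination."""
    return [
        {
            "algorithm": algorithm,
            "site": site,
            "seed": seed,
            "train_years": TRAIN_YEARS,
            "eval_year": EVAL_YEAR,
        }
        for algorithm, site, seed in product(ALGORITHMS, SITES, SEEDS)
    ]


runs = build_run_grid()
print(len(runs))  # 6 algorithms x 5 sites x 5 seeds
```

Keeping the evaluation year outside `TRAIN_YEARS` in every run config is what makes 2024 a genuine held-out test of each trained agent.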