A Hierarchical Reinforcement Learning Framework for Risk-Aware Portfolio Management with Drawdown Minimization
Neelesh Nayak
CUCAI 2026 Proceedings - 2026
Abstract
Modern portfolio management requires dynamic strategies that adapt to shifting macroeconomic regimes while prioritizing capital preservation. This paper introduces MacroHRL, a two-level hierarchical reinforcement learning (HRL) framework engineered for robust risk mitigation and drawdown minimization. The high-level Meta-Controller, a Proximal Policy Optimization (PPO) agent, selects a market regime (Bull, Bear, Crisis, or Sideways) each quarter based on macroeconomic indicators (VIX, CPI, and the yield curve). The low level comprises four specialized PPO Sub-Controllers, each trained on regime-specific historical data to learn an optimal daily allocation policy. By explicitly penalizing tail risk through a Conditional Value-at-Risk (CVaR) reward function, MacroHRL achieves superior risk-adjusted performance. On out-of-sample data (2023–2025), MacroHRL delivers a significant reduction in maximum drawdown relative to a buy-and-hold SPY strategy, achieving an annualized return of 28.07% with a maximum drawdown of only -9.90%, establishing it as an effective framework for risk-sensitive institutional investment.
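The CVaR-penalized reward mentioned above can be illustrated with a minimal sketch. The paper's exact reward formulation is not given in the abstract, so the penalty weight `lam` and tail level `alpha` below are illustrative assumptions, not parameters from MacroHRL:

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Conditional Value-at-Risk: mean of the worst alpha-fraction of returns.

    For losses, this value is negative; a more negative CVaR means a
    heavier left tail.
    """
    sorted_r = np.sort(returns)                      # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(returns))))   # size of the tail sample
    return sorted_r[:k].mean()

def reward(portfolio_returns, lam=1.0, alpha=0.05):
    """Episode reward: mean return plus a CVaR term that penalizes tail risk.

    lam (penalty weight) and alpha (tail level) are illustrative choices.
    Since cvar() is negative in the loss tail, adding lam * cvar(...)
    lowers the reward when drawdown-producing returns appear.
    """
    r = np.asarray(portfolio_returns)
    return r.mean() + lam * cvar(r, alpha)
```

A policy trained against such a reward trades off average return against the severity of its worst outcomes, which is the mechanism the abstract credits for the reduced maximum drawdown.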