Harnessing Market Memory: Adaptive Reinforcement Learning with Fractional Brownian Motion for Portfolio Optimization
Shivam Sharma1, Shahram Latifi2, Pushkin Kachroo3
Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, USA
1shivam.sharma@unlv.edu, 2shahram.latifi@unlv.edu, 3pushkin.kachroo@unlv.edu
Abstract
This research introduces a novel reinforcement learning framework for portfolio optimization that leverages the complex statistical properties of financial markets through fractional Brownian motion (fBM). Unlike traditional methods that rely on memoryless or mean-reverting processes, our approach captures the long-range dependencies, persistence, and anti-persistence observed in empirical asset returns. Central to this framework is a meta-controller that dynamically calibrates the underlying Hurst parameter, enabling the trading agent to switch adaptively among specialized strategies trained for different market regimes. By integrating non-Markovian dynamics into the simulation environment and employing a hierarchical control structure, our method allows the agent to learn more robust and context-aware policies. Empirical evaluations demonstrate that agents operating under this adaptive, fBM-driven paradigm achieve near-optimal performance in fluctuating market conditions.
Introduction
Deep reinforcement learning (DRL) techniques have gained attention in portfolio optimization due to their ability to solve complex, nonlinear control problems under uncertainty. We extend the deep deterministic portfolio optimization approach [2] by replacing the Ornstein-Uhlenbeck (OU) process with fractional Brownian motion (fBM).

While the OU process is inherently mean-reverting, fBM can capture the long memory and self-similarity often characteristic of financial market data.

The Hurst parameter H allows fBM to exhibit any of three regimes:

  • Persistence (H > 0.5): trends tend to continue
  • Anti-persistence (H < 0.5): trends tend to reverse
  • Standard Brownian motion (H = 0.5): increments are uncorrelated
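These regimes follow from the autocorrelation of fBM's unit-spaced increments (fractional Gaussian noise), ρ(n) = (1/2)[(n+1)^(2H) − 2n^(2H) + (n−1)^(2H)], which is positive at every lag n ≥ 1 when H > 0.5, negative when H < 0.5, and identically zero when H = 0.5.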
[Figure: sample path of standard Brownian motion (H = 0.5), price vs. time.]
Standard Brownian motion showing random fluctuations with no memory or trend persistence.
Fractional Brownian Motion
Fractional Brownian motion B_H(t) is defined as a continuous-time Gaussian process with:
  • Zero mean: E[B_H(t)] = 0
  • Variance: Var[B_H(t)] = t^(2H)
  • Covariance: Cov(B_H(t), B_H(s)) = (1/2)(t^(2H) + s^(2H) − |t − s|^(2H))
  • Self-similarity: for any scaling factor c > 0, B_H(ct) has the same distribution as c^H · B_H(t)

These properties make fBM particularly suitable for modeling financial time series that exhibit momentum or mean-reversion effects, both of which are common in empirical market data.
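As a concrete illustration (a minimal sketch of our own, not the experimental code), exact fBM sample paths can be drawn by Cholesky-factorizing the covariance defined above; the function name and grid choices here are assumptions:

```python
import numpy as np

def fbm_path(n, H, T=1.0, seed=None):
    """Draw one fBM path at n equally spaced times on (0, T]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)                  # skip t = 0, where the variance is 0
    s, u = np.meshgrid(t, t, indexing="ij")
    # Covariance from the definition above: (1/2)(t^(2H) + s^(2H) - |t - s|^(2H))
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov)                   # cov = L @ L.T
    return L @ rng.standard_normal(n)             # zero-mean Gaussian with covariance cov

# Sample the three regimes discussed above.
paths = {H: fbm_path(256, H, seed=0) for H in (0.1, 0.5, 0.9)}
```

Cholesky sampling is exact but O(n^3); FFT-based schemes such as Davies-Harte scale better for long paths.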

Why Use Fractional Brownian Motion?

  • Long Memory: Financial markets often display long memory, with autocorrelations that decay more slowly than exponentially [3]. fBM, with its Hurst parameter, provides a flexible way to model both short and long memory.
  • Better Fit to Empirical Data: Market data analyses often reveal Hurst exponents significantly different from 0.5, indicating either persistence or anti-persistence.
  • Non-Markovian Dynamics: Financial time series are generally non-Markovian, meaning their future evolution depends on the entire history, not just the current state.
Reinforcement Learning Framework

Reinforcement learning provides a framework for learning optimal behaviors through trial-and-error interactions with an environment. In our portfolio optimization context:

  • Agent: The portfolio manager
  • Environment: The financial market with fBM dynamics
  • State: Market conditions and current portfolio
  • Action: Portfolio allocation decisions
  • Reward: Financial return adjusted for risk and costs
[Figure: agent-environment loop; an inset shows the Hurst parameter H(t) varying over time among H = 0.1, 0.5, and 0.9.]
Reinforcement learning framework for portfolio optimization with a dynamic Hurst parameter. The agent (portfolio manager) interacts with the environment (financial market with fBM dynamics) by taking actions (portfolio allocations) and receiving states and rewards (market data and returns). The market's Hurst parameter H changes over time, requiring adaptive strategies.
Comparison of fBM and Ornstein-Uhlenbeck Process
[Figure: four sample-path panels.]
Comparison of sample paths: (far left) fBM with H = 0.9 shows persistence with sustained trends; (middle left) fBM with H = 0.5 is standard Brownian motion with random-walk behavior; (middle right) fBM with H = 0.1 demonstrates anti-persistence with frequent reversals; (far right) the Ornstein-Uhlenbeck process exhibits mean reversion to a central value.
Property               | Fractional Brownian Motion                               | Ornstein-Uhlenbeck Process
-----------------------|----------------------------------------------------------|----------------------------------------------
Memory                 | Long memory (H > 0.5) or anti-persistent (H < 0.5)       | Memoryless (Markovian)
Mean reversion         | Flexible: can show persistence or anti-persistence       | Always mean-reverting
Fit to empirical data  | Better matches empirical Hurst exponents in market data  | Limited to specific mean-reverting securities
Mathematical structure | Non-Markovian, non-semimartingale                        | Markovian, semimartingale
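For reference, here is a minimal Euler-Maruyama sketch of the OU process dX_t = θ(μ − X_t) dt + σ dW_t used in the comparison; the parameter values are arbitrary illustrative choices, not those of our experiments:

```python
import numpy as np

def ou_path(n, theta=5.0, mu=0.0, sigma=1.0, T=1.0, x0=0.0, seed=None):
    """Euler-Maruyama discretization of dX = theta*(mu - X)*dt + sigma*dW."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = np.sqrt(dt) * rng.standard_normal()  # Brownian increment
        x[i + 1] = x[i] + theta * (mu - x[i]) * dt + sigma * dw
    return x
```

Unlike fBM, each step here depends only on the current value x[i], which is exactly the Markov property the table contrasts.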
Method: Adaptive Reinforcement Learning Framework

We frame dynamic portfolio optimization as a reinforcement learning problem:

  • State: current portfolio allocation and market data
  • Action: change in portfolio weights between time periods
  • Reward: portfolio return minus transaction costs and a risk penalty
  • Return process: fractional Brownian motion plus random noise

For this continuous-action, non-Markovian environment, we use the Deep Deterministic Policy Gradient (DDPG) algorithm [7] to learn effective portfolio management policies.
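As a rough, self-contained sketch of this setup (the class, its default parameters, and the quadratic form of the risk penalty are our own illustrative assumptions, not the exact experimental environment):

```python
import numpy as np

class FBMPortfolioEnv:
    """Toy portfolio environment: fBM-driven returns, costs, and a risk penalty."""

    def __init__(self, returns, cost=1e-3, risk_aversion=0.1):
        self.returns = returns                    # (T, n_assets) per-period returns
        self.cost = cost                          # proportional transaction cost
        self.risk_aversion = risk_aversion        # weight on the risk penalty
        self.t = 0
        self.w = np.full(returns.shape[1], 1.0 / returns.shape[1])

    def step(self, w_new):
        w_new = np.asarray(w_new, dtype=float)
        r = self.returns[self.t]
        pnl = float(w_new @ r)                    # portfolio return this period
        turnover = np.abs(w_new - self.w).sum()   # size of the rebalance
        # Reward: return minus transaction costs and a quadratic risk penalty.
        reward = pnl - self.cost * turnover - self.risk_aversion * pnl ** 2
        self.w, self.t = w_new, self.t + 1
        state = np.concatenate([self.w, r])       # allocation + latest returns
        return state, reward, self.t >= len(self.returns)
```

A DDPG agent [7] would then be trained against episodes of this environment, e.g., with fresh fBM return paths drawn each episode so the policy generalizes across sample paths.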

Meta-Controller Framework: To handle dynamically changing market regimes, we implement a hierarchical control structure:

[Figure: meta-controller pipeline. Market signal → Hurst estimator (R/S analysis) → meta-controller → one of three specialized controllers (H = 0.1 anti-persistent, H = 0.5 standard Brownian, H = 0.9 persistent) → portfolio action.]

Key components of the adaptive meta-controller framework:

  • We train specialized RL agents for different Hurst regimes (H = 0.1, 0.5, 0.9)
  • The meta-controller periodically performs R/S analysis to estimate the current Hurst parameter
  • Based on the estimated Hurst value, it switches to the matching specialized controller (see the sketch below)
  • This allows the system to adapt as market conditions evolve
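A compact sketch of the estimator and the switching rule (a minimal illustration; the window-doubling scheme and nearest-regime rule are our own assumptions, not the exact experimental procedure):

```python
import numpy as np

def hurst_rs(x, min_chunk=8):
    """Estimate the Hurst exponent of series x via rescaled-range (R/S) analysis."""
    x = np.asarray(x, dtype=float)
    sizes, rs = [], []
    n = min_chunk
    while n <= len(x) // 2:
        vals = []
        for i in range(0, len(x) - n + 1, n):
            c = x[i:i + n]
            z = np.cumsum(c - c.mean())           # cumulative deviations from the mean
            R, S = z.max() - z.min(), c.std()     # range and standard deviation
            if S > 0:
                vals.append(R / S)
        if vals:
            sizes.append(n)
            rs.append(np.mean(vals))
        n *= 2                                    # double the window each pass
    # The slope of log(R/S) against log(n) estimates H.
    return np.polyfit(np.log(sizes), np.log(rs), 1)[0]

def select_controller(recent_returns, regimes=(0.1, 0.5, 0.9)):
    """Route control to the specialized agent trained closest to the estimated H."""
    H_hat = hurst_rs(recent_returns)
    return min(regimes, key=lambda H: abs(H - H_hat)), H_hat
```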
Results

Performance evaluation shows our fBM-driven RL agents significantly outperform random baseline strategies across all Hurst regimes:

[Figure: bar chart comparing trained models and a random baseline.]
Performance comparison of trained models vs. a random agent. Blue bars show the RL agents; the red bar shows the random agent's performance.

Meta-controller performance in dynamic Hurst-switching environment:

  • Maintains consistent performance across 100 independent runs
  • Adapts effectively to regime shifts between persistent, anti-persistent, and memoryless market conditions
  • Demonstrates robustness to non-Markovian dynamics and changing market characteristics
Conclusion
  • Incorporating fractional Brownian motion into RL-based portfolio optimization allows for better modeling of real-world market memory effects
  • The adaptive meta-controller framework successfully navigates between different market regimes by dynamically calibrating the Hurst parameter
  • Empirical evaluations demonstrate that our approach outperforms random strategies and maintains robust performance in fluctuating market conditions
  • This research bridges the gap between theoretical RL models and real-world financial applications by accounting for long-range dependencies and complex market dynamics
  • Future work will explore additional market complexities such as time-varying volatility, multi-asset interactions, and risk-adjusted utility functions
Key References
[1] D. Silver, J. Schrittwieser, K. Simonyan, et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[2] A. Chaouki, S. Hardiman, C. Schmidt, E. Serié, and J. de Lataillade, "Deep deterministic portfolio optimization," arXiv preprint arXiv:2003.06497, 2020.
[3] R. Cont, "Empirical properties of asset returns: stylized facts and statistical issues," Quantitative Finance, vol. 1, no. 2, pp. 223–236, 2001.
[4] L. C. G. Rogers, "Arbitrage with fractional Brownian motion," Mathematical Finance, vol. 7, no. 1, pp. 95–105, 1997.
[5] M. Garcin, "Forecasting with fractional Brownian motion: a financial perspective," Quantitative Finance, vol. 22, no. 8, pp. 1495–1512, 2022.
[6] J.-P. Fouque and R. Hu, "Portfolio optimization under fast mean-reverting and rough fractional stochastic environment," Applied Mathematical Finance, vol. 25, no. 4, pp. 361–388, 2018.
[7] T. P. Lillicrap, J. J. Hunt, A. Pritzel, et al., "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.