Harnessing Market Memory: Adaptive Reinforcement Learning with Fractional Brownian Motion for Portfolio Optimization
Shivam Sharma1, Shahram Latifi2, Pushkin Kachroo3
Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, USA
1shivam.sharma@unlv.edu, 2shahram.latifi@unlv.edu, 3pushkin.kachroo@unlv.edu
Abstract
This research introduces a novel reinforcement learning framework for portfolio optimization that leverages the complex statistical properties of financial markets through fractional Brownian motion (fBM). Unlike traditional methods that rely on memoryless or mean-reverting processes, our approach captures the long-range dependencies, persistence, and anti-persistence observed in empirical asset returns. Central to this framework is a meta-controller that dynamically calibrates the underlying Hurst parameter, enabling the trading agent to switch adaptively among specialized strategies trained for different market regimes. By integrating non-Markovian dynamics into the simulation environment and employing a hierarchical control structure, our method allows the agent to learn more robust and context-aware policies. Empirical evaluations demonstrate that agents operating under this adaptive, fBM-driven paradigm achieve near-optimal performance in fluctuating market conditions.
Introduction
Deep reinforcement learning (DRL) techniques have gained attention in portfolio optimization due to their ability to solve complex, nonlinear control problems under uncertainty. We extend the deep deterministic portfolio optimization approach [2] by replacing the Ornstein-Uhlenbeck (OU) process with fractional Brownian motion (fBM).

While the OU process is inherently mean-reverting, fBM can capture the long memory and self-similarity often characteristic of financial market data.

The Hurst parameter H allows fBM to exhibit any of three regimes:

  • Persistence (H > 0.5): trends tend to continue
  • Anti-persistence (H < 0.5): trends tend to reverse
  • Standard Brownian motion (H = 0.5): increments are uncorrelated
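These regimes follow from the autocorrelation of fBM's unit-spaced increments (fractional Gaussian noise), ρ(n) = (1/2)[(n+1)^(2H) − 2n^(2H) + (n−1)^(2H)], which is positive at every lag n ≥ 1 when H > 0.5, negative when H < 0.5, and identically zero when H = 0.5.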
[Figure: sample path of standard Brownian motion (H = 0.5), price vs. time.]
Standard Brownian motion showing random fluctuations with no memory or trend persistence.
Fractional Brownian Motion
Fractional Brownian motion B_H(t) is defined as a continuous-time Gaussian process with:
  • Zero mean: E[B_H(t)] = 0
  • Variance: Var[B_H(t)] = t^(2H)
  • Covariance: Cov(B_H(t), B_H(s)) = (1/2)(t^(2H) + s^(2H) − |t − s|^(2H))
  • Self-similarity: for any scaling factor c > 0, B_H(ct) has the same distribution as c^H · B_H(t)

These properties make fBM particularly suitable for modeling financial time series that exhibit momentum or mean-reversion effects, both of which are common in empirical market data.
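As a concrete illustration (a minimal sketch of our own, not the experimental code), exact fBM sample paths can be drawn by Cholesky-factorizing the covariance defined above; the function name and grid choices here are assumptions:

```python
import numpy as np

def fbm_path(n, H, T=1.0, seed=None):
    """Draw one fBM path at n equally spaced times on (0, T]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)                  # skip t = 0, where the variance is 0
    s, u = np.meshgrid(t, t, indexing="ij")
    # Covariance from the definition above: (1/2)(t^(2H) + s^(2H) - |t - s|^(2H))
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov)                   # cov = L @ L.T
    return L @ rng.standard_normal(n)             # zero-mean Gaussian with covariance cov

# Sample the three regimes discussed above.
paths = {H: fbm_path(256, H, seed=0) for H in (0.1, 0.5, 0.9)}
```

Cholesky sampling is exact but O(n^3); FFT-based schemes such as Davies-Harte scale better for long paths.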

Why Use Fractional Brownian Motion?

  • Long Memory: Financial markets often display long memory, with autocorrelations that decay more slowly than exponentially [3]. fBM, with its Hurst parameter, provides a flexible way to model both short and long memory.
  • Better Fit to Empirical Data: Market data analyses often reveal Hurst exponents significantly different from 0.5, indicating either persistence or anti-persistence.
  • Non-Markovian Dynamics: Financial time series are generally non-Markovian, meaning their future evolution depends on the entire history, not just the current state.
Reinforcement Learning Framework

Reinforcement learning provides a framework for learning optimal behaviors through trial-and-error interactions with an environment. In our portfolio optimization context:

  • Agent: The portfolio manager
  • Environment: The financial market with fBM dynamics
  • State: Market conditions and current portfolio
  • Action: Portfolio allocation decisions
  • Reward: Financial return adjusted for risk and costs
[Figure: agent-environment loop; an inset shows the Hurst parameter H(t) varying over time among H = 0.1, 0.5, and 0.9.]
Reinforcement learning framework for portfolio optimization with a dynamic Hurst parameter. The agent (portfolio manager) interacts with the environment (financial market with fBM dynamics) by taking actions (portfolio allocations) and receiving states and rewards (market data and returns). The market's Hurst parameter H changes over time, requiring adaptive strategies.
Comparison of fBM and Ornstein-Uhlenbeck Process
[Figure: four sample-path panels.]
Comparison of sample paths: (far left) fBM with H = 0.9 shows persistence with sustained trends; (middle left) fBM with H = 0.5 is standard Brownian motion with random-walk behavior; (middle right) fBM with H = 0.1 demonstrates anti-persistence with frequent reversals; (far right) the Ornstein-Uhlenbeck process exhibits mean reversion to a central value.
Property               | Fractional Brownian Motion                               | Ornstein-Uhlenbeck Process
-----------------------|----------------------------------------------------------|----------------------------------------------
Memory                 | Long memory (H > 0.5) or anti-persistent (H < 0.5)       | Memoryless (Markovian)
Mean reversion         | Flexible: can show persistence or anti-persistence       | Always mean-reverting
Fit to empirical data  | Better matches empirical Hurst exponents in market data  | Limited to specific mean-reverting securities
Mathematical structure | Non-Markovian, non-semimartingale                        | Markovian, semimartingale
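For reference, here is a minimal Euler-Maruyama sketch of the OU process dX_t = θ(μ − X_t) dt + σ dW_t used in the comparison; the parameter values are arbitrary illustrative choices, not those of our experiments:

```python
import numpy as np

def ou_path(n, theta=5.0, mu=0.0, sigma=1.0, T=1.0, x0=0.0, seed=None):
    """Euler-Maruyama discretization of dX = theta*(mu - X)*dt + sigma*dW."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = np.sqrt(dt) * rng.standard_normal()  # Brownian increment
        x[i + 1] = x[i] + theta * (mu - x[i]) * dt + sigma * dw
    return x
```

Unlike fBM, each step here depends only on the current value x[i], which is exactly the Markov property the table contrasts.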
Method: Adaptive Reinforcement Learning Framework

We frame dynamic portfolio optimization as a reinforcement learning problem:

  • State: current portfolio allocation and market data
  • Action: change in portfolio weights between time periods
  • Reward: portfolio return minus transaction costs and a risk penalty
  • Return process: fractional Brownian motion plus random noise

For this continuous-action, non-Markovian environment, we use the Deep Deterministic Policy Gradient (DDPG) algorithm [7] to learn effective portfolio management policies.
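As a rough, self-contained sketch of this setup (the class, its default parameters, and the quadratic form of the risk penalty are our own illustrative assumptions, not the exact experimental environment):

```python
import numpy as np

class FBMPortfolioEnv:
    """Toy portfolio environment: fBM-driven returns, costs, and a risk penalty."""

    def __init__(self, returns, cost=1e-3, risk_aversion=0.1):
        self.returns = returns                    # (T, n_assets) per-period returns
        self.cost = cost                          # proportional transaction cost
        self.risk_aversion = risk_aversion        # weight on the risk penalty
        self.t = 0
        self.w = np.full(returns.shape[1], 1.0 / returns.shape[1])

    def step(self, w_new):
        w_new = np.asarray(w_new, dtype=float)
        r = self.returns[self.t]
        pnl = float(w_new @ r)                    # portfolio return this period
        turnover = np.abs(w_new - self.w).sum()   # size of the rebalance
        # Reward: return minus transaction costs and a quadratic risk penalty.
        reward = pnl - self.cost * turnover - self.risk_aversion * pnl ** 2
        self.w, self.t = w_new, self.t + 1
        state = np.concatenate([self.w, r])       # allocation + latest returns
        return state, reward, self.t >= len(self.returns)
```

A DDPG agent [7] would then be trained against episodes of this environment, e.g., with fresh fBM return paths drawn each episode so the policy generalizes across sample paths.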

Meta-Controller Framework: To handle dynamically changing market regimes, we implement a hierarchical control structure:

[Figure: meta-controller pipeline. Market signal → Hurst estimator (R/S analysis) → meta-controller → one of three specialized controllers (H = 0.1 anti-persistent, H = 0.5 standard Brownian, H = 0.9 persistent) → portfolio action.]

Key components of the adaptive meta-controller framework:

  • We train specialized RL agents for different Hurst regimes (H = 0.1, 0.5, 0.9)
  • The meta-controller periodically performs R/S analysis to estimate the current Hurst parameter
  • Based on the estimated Hurst value, it switches to the matching specialized controller (see the sketch below)
  • This allows the system to adapt as market conditions evolve
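A compact sketch of the estimator and the switching rule (a minimal illustration; the window-doubling scheme and nearest-regime rule are our own assumptions, not the exact experimental procedure):

```python
import numpy as np

def hurst_rs(x, min_chunk=8):
    """Estimate the Hurst exponent of series x via rescaled-range (R/S) analysis."""
    x = np.asarray(x, dtype=float)
    sizes, rs = [], []
    n = min_chunk
    while n <= len(x) // 2:
        vals = []
        for i in range(0, len(x) - n + 1, n):
            c = x[i:i + n]
            z = np.cumsum(c - c.mean())           # cumulative deviations from the mean
            R, S = z.max() - z.min(), c.std()     # range and standard deviation
            if S > 0:
                vals.append(R / S)
        if vals:
            sizes.append(n)
            rs.append(np.mean(vals))
        n *= 2                                    # double the window each pass
    # The slope of log(R/S) against log(n) estimates H.
    return np.polyfit(np.log(sizes), np.log(rs), 1)[0]

def select_controller(recent_returns, regimes=(0.1, 0.5, 0.9)):
    """Route control to the specialized agent trained closest to the estimated H."""
    H_hat = hurst_rs(recent_returns)
    return min(regimes, key=lambda H: abs(H - H_hat)), H_hat
```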
Results

Performance evaluation shows our fBM-driven RL agents significantly outperform random baseline strategies across all Hurst regimes:

[Figure: bar chart comparing trained models and a random baseline.]
Performance comparison of trained models vs. a random agent. Blue bars show the RL agents; the red bar shows the random agent's performance.

Meta-controller performance in dynamic Hurst-switching environment:

  • Maintains consistent performance across 100 independent runs
  • Adapts effectively to regime shifts between persistent, anti-persistent, and memoryless market conditions
  • Demonstrates robustness to non-Markovian dynamics and changing market characteristics
Conclusion
  • Incorporating fractional Brownian motion into RL-based portfolio optimization allows for better modeling of real-world market memory effects
  • The adaptive meta-controller framework successfully navigates between different market regimes by dynamically calibrating the Hurst parameter
  • Empirical evaluations demonstrate that our approach outperforms random strategies and maintains robust performance in fluctuating market conditions
  • This research bridges the gap between theoretical RL models and real-world financial applications by accounting for long-range dependencies and complex market dynamics
  • Future work will explore additional market complexities such as time-varying volatility, multi-asset interactions, and risk-adjusted utility functions
Key References
[1] D. Silver, J. Schrittwieser, K. Simonyan, et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[2] A. Chaouki, S. Hardiman, C. Schmidt, E. Serié, and J. de Lataillade, "Deep deterministic portfolio optimization," arXiv preprint arXiv:2003.06497, 2020.
[3] R. Cont, "Empirical properties of asset returns: stylized facts and statistical issues," Quantitative Finance, vol. 1, no. 2, pp. 223–236, 2001.
[4] L. C. G. Rogers, "Arbitrage with fractional Brownian motion," Mathematical Finance, vol. 7, no. 1, pp. 95–105, 1997.
[5] M. Garcin, "Forecasting with fractional Brownian motion: a financial perspective," Quantitative Finance, vol. 22, no. 8, pp. 1495–1512, 2022.
[6] J.-P. Fouque and R. Hu, "Portfolio optimization under fast mean-reverting and rough fractional stochastic environment," Applied Mathematical Finance, vol. 25, no. 4, pp. 361–388, 2018.
[7] T. P. Lillicrap, J. J. Hunt, A. Pritzel, et al., "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.