Podcast: Halperin on reinforcement learning and option pricing
Fidelity quants working on machine learning techniques to optimise investment strategies
To most people, chess and Go are complex strategy games. Mathematically speaking, they are optimisation problems involving sequences of multi-period decisions, which are hard to solve because of their non-linearity and high dimensionality. But in recent years, artificial intelligence researchers have shown that machines can be trained to master such games using reinforcement learning (RL), setting the stage for wider applications of the technique to complex mathematical problems.
Igor Halperin, senior quant analyst at the AI centre of excellence for asset management at Fidelity Investments, has long been convinced that RL could be applied to portfolio management.
He received Risk.net’s Buy-side Quant of the Year award in 2021 for his research with Matthew Dixon, a professor at the Illinois Institute of Technology, on optimising retirement plans and target-date funds using RL and inverse RL (IRL).
In this edition of Quantcast, Halperin discusses his most recent work with Fidelity colleagues Jiayu Liu and Xiao Zhang on applying a similar approach to the problem of asset allocation among equity sectors.
“This is something I have been envisioning since 2018 […] and part of a general plan I had,” says Halperin, adding that the progress to date has been encouraging.
Reinforcement learning links decision-making to a reward function, which the algorithm must maximise to obtain the optimal outcome. Commonly, the reward function is pre-determined by the user – it may be, for example, a risk-adjusted measure of return that the algorithm aims to maximise by exploring many possible sequences of actions.
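That loop can be made concrete with a minimal tabular Q-learning sketch on a hypothetical two-action allocation problem. The states, rewards and parameters below are purely illustrative, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2          # discretised market regimes x {defensive, aggressive}
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Hypothetical environment: reward is a noisy risk-adjusted return."""
    reward = rng.normal(0.05 if action == 1 else 0.02, 0.1)
    next_state = int(rng.integers(n_states))
    return next_state, reward

state = 0
for _ in range(10_000):
    # epsilon-greedy: mostly exploit the current estimate, sometimes explore
    action = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = step(state, action)
    # standard Q-update: nudge the estimate towards reward + discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(Q)  # the argmax of each row gives the learned policy per state
```

In realistic portfolio settings, the lookup table is replaced by a function approximator, but the reward-maximising loop is the same.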
Inverse reinforcement learning does the opposite, taking the strategies of human experts and working backwards to identify the reward function that explains their decisions. Halperin and his co-authors use IRL to essentially crowdsource a robust reward function from the strategies of multiple portfolio managers. “Once you have a reward function you know what you should do,” he explains. The RL algorithm is then used to develop an asset allocation strategy that maximises this general reward function.
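The core of the inversion can be sketched with the classic feature-matching idea: assume the reward is linear in a set of state features and estimate its weights so that expert trajectories score higher than a baseline. This is a toy apprenticeship-learning sketch with made-up features and data, not Halperin's actual method:

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, gamma = 3, 0.95

def feature_expectations(trajectories):
    """Average discounted sum of feature vectors over a set of trajectories."""
    fe = np.zeros(n_features)
    for traj in trajectories:
        fe += sum(gamma**t * phi for t, phi in enumerate(traj))
    return fe / len(trajectories)

# Made-up data: each trajectory is a list of feature vectors phi(s_t);
# the "experts" systematically load on the first two features.
expert = [[rng.normal([1.0, 0.5, 0.0], 0.1) for _ in range(20)] for _ in range(50)]
baseline = [[rng.normal(0.0, 1.0, n_features) for _ in range(20)] for _ in range(50)]

# Max-margin-style estimate: point the reward weights towards whatever
# the experts' behaviour produces more of than the baseline does.
w = feature_expectations(expert) - feature_expectations(baseline)
w /= np.linalg.norm(w)
print("estimated reward weights:", w)
```

Averaging over many managers' trajectories, rather than one, is what gives the crowdsourced reward function its robustness to any individual manager's quirks.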
According to Halperin, this approach can potentially improve the performance of a homogeneous group of fund managers by giving them investment recommendations that can help remove biases and quirks from their investment process.
In this podcast, Halperin also discusses his long-standing criticism of standard option pricing models, which he maintains are fundamentally flawed. “Should I say they are all wrong, or should I say they’re not even wrong?” he muses, channelling the words of theoretical physicist Wolfgang Pauli.
His point is that standard models based on geometric Brownian motion can capture volatility but fail to account for the drift in asset prices. In 2021, he proposed an alternative approach that resembles geometric Brownian motion but adjusts the drift term. In his setting, the drift is a non-linear function that accounts for market inflows and outflows, as well as frictions.
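In stylised form, the contrast can be written as follows, where the non-linear term $f$ stands in for the flow and friction effects (this is a schematic sketch, not Halperin's published specification):

$$\frac{dS_t}{S_t} = \mu(S_t)\,dt + \sigma\,dW_t, \qquad \mu(S) = r + \kappa\, f(S),$$

with a constant drift $\mu$ recovering standard geometric Brownian motion, while a non-constant, non-linear $f$ lets the drift respond to the price level and, through it, to inflows, outflows and frictions.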
Discussing the influence of physics on quant finance, Halperin notes the differences between linear models, represented by classic parametric models, and non-linear models, exemplified by neural networks. The former offer clear interpretability of the phenomena they describe but cannot capture complex systems, while the latter can handle complex systems but are hard to control. Halperin sees tensor networks, a functional toolset borrowed from physics, as a good middle ground, “because they do non-linearities but in a controlled way”.
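A minimal sense of what "controlled" means here: a toy tensor-train decomposition factors a high-dimensional array into a chain of small cores, with the truncation rank acting as the control knob on model complexity. The code below is illustrative and not tied to any specific paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def tt_decompose(tensor, max_rank):
    """Factor an n-d array into tensor-train cores via successive truncated SVDs."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                  # capped rank: the "control"
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        mat = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

X = rng.normal(size=(4, 5, 6, 7))
cores = tt_decompose(X, max_rank=3)

# Contract the chain back together to check the approximation quality
approx = cores[0]
for core in cores[1:]:
    approx = np.tensordot(approx, core, axes=([-1], [0]))
approx = approx.squeeze()
print([c.shape for c in cores], float(np.linalg.norm(X - approx) / np.linalg.norm(X)))
```

Raising the rank cap makes the approximation more expressive; lowering it keeps the non-linearity tame, which is the middle ground Halperin describes.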
For those curious to learn more, the most recent episode of Quantcast with Vladimir Piterbarg and Alexandre Antonov was entirely dedicated to tensor train approaches.
Halperin is now working on research projects that blend concepts from different branches of finance and statistics. One, for example, deals with multi-agent reinforcement learning, where a reinforcement learning algorithm drives the behaviour of agents in a model, allowing them to adapt based on their interactions with each other.
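A bare-bones flavour of that multi-agent setting: two independent Q-learners in a made-up repeated trading game, where each agent's payoff depends on the other's action, so the learned behaviour co-adapts. The payoffs are arbitrary and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# payoff[a0, a1] -> (reward to agent 0, reward to agent 1)
# actions: 0 = trade passively, 1 = trade aggressively
payoff = np.array([[[1.0, 1.0], [-1.0, 2.0]],
                   [[2.0, -1.0], [0.0, 0.0]]])

Q = np.zeros((2, 2))   # Q[agent, action] in a single-state repeated game
alpha, eps = 0.05, 0.1

for _ in range(20_000):
    acts = [int(rng.integers(2)) if rng.random() < eps else int(Q[i].argmax())
            for i in range(2)]
    rewards = payoff[acts[0], acts[1]]
    for i in range(2):
        # each agent updates only its own value estimate,
        # so the other's evolving behaviour is part of its environment
        Q[i, acts[i]] += alpha * (rewards[i] - Q[i, acts[i]])

print(Q)  # per-agent action values after co-adaptation
```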
Index
00:00 RL and IRL in fund management
06:27 Application of RL and IRL to portfolio management
10:27 Previous application of RL to wealth management
13:05 Why RL is not a black box
16:20 Further applications of RL in finance
20:25 Option pricing models – not even wrong?
29:45 Physics and finance
36:30 Future research projects
To hear the full interview, listen in the player above, or download. Future podcasts in our Quantcast series will be uploaded to Risk.net. You can also visit the main page here to access all tracks, or go to the iTunes store, Spotify or Google Podcasts to listen and subscribe.