Synthetic data enters its Cubist phase
Quants are using the theory of rough paths to distil the essence of financial datasets
Pablo Picasso was untroubled by the mixed reaction to his portrait of the art patron Gertrude Stein. “Everybody says that she does not look like it,” Picasso said, “but that does not make any difference – she will.”
Years later, his subject agreed: “For me it is I, and it is the only reproduction of me which is always I, for me.” Thoughtful representation, Stein came to realise, is sometimes better than exact replication.
Quants working on synthetic datasets seem to be having their own Picasso moment. These representations of financial history can be used to overcome the paucity of real-world data needed to train deep-learning algorithms in finance. But so-called generator models – which produce synthetic datasets that closely reproduce the statistical properties of real market data – seem to miss something, especially when sampling under certain conditions.
The theory of rough paths, developed by University of Oxford professor Terry Lyons, may offer a solution to this problem. Rough paths describe the interaction between non-linear systems. A new paper based on the theory proposes using ‘signatures’ – mathematical objects that can encode financial data in a parsimonious and efficient way – to create synthetic datasets that capture the essence of financial markets. The paper is the fruit of a collaboration between Lyons, Hans Buehler and Ben Wood of JP Morgan, Blanka Horvath of King’s College London and Imanol Perez, who contributed while at the University of Oxford.
Signatures capture multiple characteristics of a time-series distribution without losing the implicit narrative thread in the data. In practice, they look like a series of coefficients that enclose information about a stream of data, such as market prices.
Lyons draws an analogy with films to illustrate how this works. Imagine each frame in a film is a sample value. Periodically sampling a single frame every few minutes would make little sense. An approach using signatures would instead sample a few minutes at a time and attempt to summarise what happens within each interval.
The idea is that describing a stream of data in terms of a succession of effects or trends provides more meaningful information than random sampling at fixed intervals.
Rough paths don’t make stochastic assumptions about the systems they describe. Rather than sampling prices at precise times, whether hourly or daily, they capture the effects of the data on non-linear systems – for instance, the profit or loss that would arise from applying a hedging strategy – over intervals of time.
“Instead of saying where everything is at a given time, they look at the effects of the stream of data on simple systems,” says Lyons. “A signature is a sort of universal version of what a stream does when it interacts with nonlinear systems. It’s a different way to describe the data.”
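As a rough illustration of what "the effects of the stream of data on simple systems" can mean, the sketch below computes the profit or loss of a toy trading rule on two price paths that start and end at the same levels. The rule, the function name `strategy_pnl` and the price series are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def strategy_pnl(prices):
    """P&L of a toy rule: over each step, hold a position equal to the
    price move observed so far (current price minus the first price)."""
    start = prices[0]
    pnl = 0.0
    for prev, nxt in zip(prices, prices[1:]):
        position = prev - start          # position set before the step
        pnl += position * (nxt - prev)   # gain or loss over the step
    return pnl


# Two hypothetical paths with identical start and end prices.
smooth = [100.0, 101.0, 102.0]
volatile = [100.0, 104.0, 102.0]

print(strategy_pnl(smooth))    # 1.0
print(strategy_pnl(volatile))  # -8.0
```

Sampling only the first and last prices would make the two paths look identical, whereas the outcome of even this simple rule tells them apart.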
The so-called first order of the signature describes the drift up or down in prices from start to finish. The second order measures the volatility of the path over certain time steps. “It’s clear that if prices reach from one point to another with smooth movements, that’s a different story from if you have a volatile journey,” Horvath says. “Higher orders stretch beyond what can be intuitively described.”
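A minimal sketch of the first two orders, assuming the data is a piecewise-linear path of (time, price) points. It builds the terms segment by segment using Chen's identity, the standard rule for combining the signatures of consecutive path pieces. The function names and the sample paths are hypothetical, and the signed area shown here is just one second-order quantity that separates a smooth journey from a volatile one; this is not the authors' implementation.

```python
import numpy as np


def signature_level2(path):
    """First- and second-order signature terms of a piecewise-linear path.

    `path` is an (n_points, d) array. Returns (s1, s2), where s1 has
    shape (d,) and s2 has shape (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one straight segment:
        # cross term with the path so far, plus the segment's own term.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2


def signed_area(s2):
    """Antisymmetric part of the second-order terms (the Levy area)."""
    return 0.5 * (s2[0, 1] - s2[1, 0])


# Hypothetical (time, price) paths with the same start and end points.
smooth = [(0, 100), (1, 101), (2, 102)]
volatile = [(0, 100), (1, 104), (2, 102)]

s1_smooth, s2_smooth = signature_level2(smooth)
s1_volatile, s2_volatile = signature_level2(volatile)

# First order: the net change (2 in time, 2 in price) is the same for both.
# Second order: the signed area is 0 for the straight path, -3 for the other.
print(s1_smooth, signed_area(s2_smooth))
print(s1_volatile, signed_area(s2_volatile))
```

The first-order terms cannot distinguish the two paths; the second-order terms can, which is the distinction Horvath draws between smooth movements and a volatile journey.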
The authors show that signature-based models can be trained faster than traditional data generators. They also show that models using signatures retain information more efficiently than generators that learn directly from raw data, which can lose information in the sampling phase.
Crucially, according to Lyons, the low-order terms of a signature can yield useful information even from small sets of features. This can make all the difference when there is limited data to train a neural network. “You can generally do better with deep learning combined with signatures,” he says. “But if you have small data, then sometimes signatures work really very well, are quick to train and economical.”
The theory of rough paths is being applied in a myriad of fields. “The recognition of hand-written Chinese characters was the first large-scale application of signatures,” says Lyons. Signatures have also been used to recognise actions performed by matchstick people in videos – for example, kicking a ball or swinging a golf club.
In 2019, a team led by James Morrill, a student of Lyons, used signature-based algorithms to detect early signs of sepsis in medical data.
The underlying theme of actionable pattern recognition also has tremendous uses in finance. Data generated using signatures could be used to train the deep hedging algorithms pioneered by Buehler and Wood at JP Morgan, among others. Other financial applications include simulating market data to price derivatives and test new trading strategies.
Lyons and his co-authors plan to continue their work on perfecting market generators. Future developments in this field may well bear their signatures.
Copyright Infopro Digital Limited. All rights reserved.