This article was paid for by a contributing third party.

Ultra-low latency trading: how low can you go?
Allison Saeng/Unsplash

In the world of high-frequency trading, nanoseconds gained in trade execution can mean the difference between success and failure. Hamid Salehi of AMD discusses the technological advances that are taking high-frequency trading to new lows – known as ultra-low latency trading (ULLT).

Hamid Salehi, AMD

ULLT is a trading strategy that emphasises the importance of executing trades at lightning speeds, where nanoseconds – not microseconds – can determine the success of a trade.

All else being equal, a firm’s ability to exploit market opportunities in this competitive, high-stakes landscape depends on the speed and efficiency of its supporting technology, and on how data quality and jitter (variation in the level of delay) affect trading performance.

When nanoseconds can make a difference in winning trades, employing the latest trading devices and technology becomes paramount.

AMD recently launched the latest version of its purpose-built, field-programmable gate array (FPGA)-based trading card for ultra-low latency electronic trading. The new AMD Alveo™ UL3422 FinTech Accelerator card offers record-breaking trade execution performance,1 at nanosecond speed,2 in a slim form for flexible server deployment. AMD’s goal is to continue innovating and serving the trading markets with new products in the coming years.

The Alveo UL3422 FinTech Accelerator 

Earlier this summer, AMD announced a world-record STAC-T0 benchmark result with its first ultra-low latency card – the Alveo UL3524 Accelerator. The new Alveo UL3422 card is powered by the same FPGA device as its predecessor, but with the port density, on-board memory and connectivity options streamlined – matching the performance in half the size. The new card provides trade execution performance at nanosecond speed in a slim form for flexible server deployment options. 

What is ULLT? How should we define it in today’s market?

Hamid Salehi: In simple terms, ULLT involves seeing some signal in the market and executing a trade as fast as possible, at which point it becomes a race. The trader with the lowest-latency solution is most likely to win the trade.

But data signals can be noisy and not entirely deterministic. As the influx of data continues, electronic exchanges – including electronic communications networks, alternative trading systems and segments of traditional stock exchanges – strive to enhance the quality of the data feeds they offer participants. If the data jitter decreases from a microsecond (one-millionth of a second) to a nanosecond (one thousand-millionth of a second), this change becomes noticeable.

Even with the introduction of equipment capable of executing trades within a fraction of a microsecond, the impact diminishes because of the prevailing noise. However, if the same exchange manages to reduce its latency or jitter to a nanosecond, significant shifts occur.

Traders with microsecond capabilities become more vulnerable, while those equipped for nanosecond execution gain a substantial advantage in securing trade deals. This scenario illustrates a sort of probability distribution, with jitter representing the standard deviation. As the jitter decreases and speed increases, the likelihood of winning trades improves significantly.
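The relationship between jitter and the odds of winning can be made concrete with a small model. The sketch below is illustrative only and is not AMD's or any exchange's method: it assumes two traders react to the same signal, that each order's arrival time is its fixed latency plus independent Gaussian noise, and that the jitter is the standard deviation of that noise. The function name and the example latencies are hypothetical.

```python
import math


def win_probability(fast_ns: float, slow_ns: float, jitter_ns: float) -> float:
    """Chance that the lower-latency order reaches the exchange first.

    Modelling assumption: arrival time = latency + N(0, jitter_ns^2),
    with the two traders' noise terms independent of each other.
    """
    if jitter_ns <= 0:
        raise ValueError("jitter_ns must be positive")
    edge_ns = slow_ns - fast_ns
    # The difference of two independent N(0, s^2) draws is N(0, 2 * s^2).
    z = edge_ns / (jitter_ns * math.sqrt(2.0))
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical race: a 20 ns trader against a 1,000 ns (1 microsecond) trader.
for jitter in (1_000.0, 100.0, 1.0):
    p = win_probability(fast_ns=20.0, slow_ns=1_000.0, jitter_ns=jitter)
    print(f"jitter {jitter:>7.1f} ns -> faster trader wins with p = {p:.3f}")
```

Under these assumptions, microsecond-level jitter leaves the slower trader a meaningful chance of winning the race, while nanosecond-level jitter makes the faster trader's win close to certain.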

Who is using ULLT, where and why?

Hamid Salehi: Over the past decade, exchanges have consistently improved their performance and minimised jitter. This progress triggered a competitive rush for products enabling lower-latency trades, leading to an ongoing arms race among traders. Having devices capable of executing trades with minimal latency alters the distribution of outcomes. Consequently, parameters such as a lower standard deviation gain importance.

Firms that typically rent space at the exchange and aim to execute trades swiftly will benefit from these devices. The mathematical aspect is relatively straightforward, despite the extensive preparatory work involving Greeks, computing systems, historical data analysis and machine learning. The trade execution itself is simple, especially for those situated at the exchange seeking rapid execution. Many proprietary trading shops use AMD FPGA hardware, designed to deliver trade execution in the tens of nanoseconds.

These devices enable traders to exploit arbitrage opportunities swiftly, ultimately enhancing market efficiency and benefiting all participants. For example, ensuring parity between exchange-traded fund prices and their underlying assets becomes more seamless, instilling confidence in investors and contributing to overall market efficiency.

For exchanges with deterministic behaviours, low-latency devices can significantly impact trade outcomes. Market-makers benefit greatly, as their role necessitates swift participation to manage risk effectively. Thus, they tend to be early adopters of such products.

Beyond market-makers, a whole host of other traders can benefit from extreme low-latency electronic trading.

What have been the key milestones in the evolution of ULLT?

Hamid Salehi: For many years, the ultra-low latency electronic trading community has relied on cutting-edge hardware to execute trades swiftly. AMD’s traditionally more general purpose FPGA products have been widely used across various segments. However, in recent years, AMD made a strategic decision to develop purpose-built silicon tailored specifically for the electronic trading market.

The key innovation in the silicon lies in the transceiver architecture, specifically designed for 10 gigabit and 10/25 gigabit ethernet, aligning with the operational standards of most exchanges. The focus on optimising latency in the physical medium attachment (PMA), physical coding sub-layer (PCS) and media access control (MAC) has yielded remarkable results, surpassing our previous FPGA performance by up to sevenfold.3 This represents a significant advancement in the field.

In terms of available hardware, the PMA response time for transmit plus receive now stands at just under 2.5 nanoseconds,2 placing AMD at the forefront of latency optimisation. Combined with the flexibility of FPGA fabric, which allows for diverse design implementations, AMD silicon stands out as a unique offering.

Major players in the electronic trading industry, whether based in Chicago, New York or outside the US, have already adopted AMD products for production use. 

What are the key benefits of the Alveo UL3422 Accelerator Card?

Hamid Salehi: Compared with previous generations, the latest iteration of the Alveo UL3422 Accelerator boasts several notable advantages. The PMA transceiver architecture serves as the analogue core, alongside the MAC component, and most AMD FPGAs are built on a 16-nanometre process node and AMD UltraScale+™ architecture.

In the ULLT domain, the AMD advance that stands out is the transceiver architecture, which now operates up to seven times faster than in the previous generation.3 We’ve enhanced the PMA, and introduced a hardened ethernet MAC component and PCS for data transmission and control management, connecting directly to the new transceiver architecture to help reduce latency. Consequently, users in electronic trading no longer need to develop their own MAC and connect it to the new transceivers. This feature has garnered significant praise from users, marking a substantial improvement.

Additionally, this silicon piece introduces a new transceiver architecture, densely packing the chip with 72 transceivers. Each of these transceivers can operate at up to 10 or 25 gigabits per second, facilitating high-speed data transmission. This large number of transceivers extends the applicability of the card beyond trading to include layer-one switching, which is the default mechanism for onward distribution of an exchange data feed to other data feed handlers.

Moreover, the AMD design optimises performance by consolidating the entire logic onto a single monolithic die, removing delays associated with inter-partition communication.

When discussing ULLT on FPGAs, it is crucial to recognise the significant disparity from central processing unit (CPU)-based approaches. For instance, data entering and exiting the fabric of an AMD FPGA incurs a latency of a mere three nanoseconds,2 with an additional 10–20 nanoseconds spent within the fabric, leading to trade execution times that can be less than 20 nanoseconds.1 A software-based approach running on a CPU could take microseconds. The FPGA’s hardware programmability enables strategies to be devised at the hardware level.
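The arithmetic behind those figures can be laid out as a simple latency budget. The sketch below uses only the numbers quoted above for the FPGA path; the CPU figure of 1,000 nanoseconds is an assumption standing in for the lower end of "microseconds", and the constant and function names are hypothetical.

```python
TRANSCEIVER_NS = 3.0             # data entering and exiting the fabric (quoted above)
FABRIC_NS_RANGE = (10.0, 20.0)   # time spent within the fabric (quoted above)
CPU_PATH_NS = 1_000.0            # assumption: one microsecond for a software path


def fpga_latency_range(io_ns: float, fabric_range: tuple[float, float]) -> tuple[float, float]:
    """Best-case and worst-case totals: transceiver time plus fabric time."""
    fabric_low, fabric_high = fabric_range
    return io_ns + fabric_low, io_ns + fabric_high


low_ns, high_ns = fpga_latency_range(TRANSCEIVER_NS, FABRIC_NS_RANGE)
print(f"FPGA path: {low_ns:.0f} to {high_ns:.0f} ns")
print(f"Assumed CPU path is {CPU_PATH_NS / high_ns:.0f}x to {CPU_PATH_NS / low_ns:.0f}x slower")
```

With these inputs the FPGA path totals 13 to 23 nanoseconds, which is consistent with execution times that can fall below 20 nanoseconds when the fabric logic sits at the shorter end of its range.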

In essence, this product represents a significant leap forward in enabling extreme speed trading strategies, making it an invaluable asset for those equipped to leverage its capabilities. During customer engagements, the primary consideration often revolves around the feasibility of integrating this technology, underscoring its specialised nature.

To what extent is the opportunity around low-latency trading dependent on technological factors outside a firm’s control? How are firms managing those limitations?

Hamid Salehi: The lower the jitter goes, the greater the benefit of low-latency technology. For a good number of exchanges across Europe and North America, low latency makes a material difference. For some exchanges – in the Asia-Pacific region, for example – it might not be as pronounced. Over time, the benefit can increase. We have customers that operate in multiple exchanges worldwide and, at certain exchanges, it makes a material difference today.

AMD customers also have satellite offices overseas using the same devices to ensure consistency of design and, in case the jitter goes down suddenly, they can be ready to reap the benefits of a more deterministic type of data feed on day one.

Looking to the future, how do you expect the market to develop? Are you already working on the next version of the card? How low can you go?

Hamid Salehi: AMD recently launched its second-generation card, the AMD Alveo UL3422 Accelerator, but customers want to know what we are doing beyond that, which is a testament to the success of the product line. We are always considering new solutions and technologies, and there is already a wealth of feedback from satisfied customers.

We view the electronic trading market as a continuum in which we want to keep innovating. We have been supporting this community for more than a decade. From a management perspective, there is executive support to make sure we keep investing in the electronic trading market. So you can expect to see more AMD products supporting ULLT in the coming years.
 

1 The 2024 AMD world record for latency is based on third-party testing commissioned by AMD and Exegy, by Strategic Technology Analysis Center (STAC®) in April 2024, using the STAC-T0 benchmark to test the AMD Alveo UL3524 Accelerator card powered by the AMD Virtex Ultrascale+ VU2P FPGA, running on the Exegy nxFramework and Exegy nxTCP-UDP-10g-ULL IP Core, in a Dell PowerEdge R7525 server with AMD EPYC 7313 processors. See the full STAC report. AMD previously held the world record for latency (2020). Stated results for the Alveo UL3524 Accelerator have been extrapolated to the AMD Alveo UL3422 card, based on identical silicon and product features (ALV-20).
2 Testing conducted by AMD in August 2023 using Vivado™ Design Suite 2023.1 running on Vivado Lab (Hardware Manager) 2023.1 and the GTF Latency Benchmark Design on the Alveo UL3524 Accelerator card, configured to enable GTF transceivers in internal near-end loopback mode. The benchmark was run 1,000 times with 250 frames per test, where PCS logic (PMA) of the transceiver passes data ‘as-is’ to FPGA fabric to measure latency. Stated latency result based on GTF transceiver ‘RAW mode’ and extrapolated to the AMD Alveo UL3422 card, based on identical silicon and product features. Configuration differences or variance in testing protocol will yield different results. Actual results will vary (ALV-10).
3 Based on a comparison simulated by AMD in February 2022 using Synopsys VCS 2019.06-SP2, between ultra-low latency Virtex UltraScale+ VU2P GTF transceivers and GTY transceivers (ALV-15).