In the world of high-frequency trading (HFT), where every microsecond counts, deploying machine learning models for real-time decision-making presents a significant challenge. Traditional CPU-based inference engines struggle to meet the stringent latency requirements (typically below a few microseconds, with minimal variance) necessary for effective trading strategies.
The Latency Problem in ML Inference
Traditional CPU-based inference solutions fail to meet the stringent latency demands of HFT. While CPU-optimized frameworks (e.g., Intel oneDAL and TL2cgen) can reduce inference time, they introduce variability, with occasional latency spikes exceeding acceptable thresholds. This inconsistency makes them unsuitable for mission-critical trading applications.
LightGBM, a widely used gradient boosting framework, provides excellent predictive accuracy for trading models. However, running inference on CPUs introduces considerable latency spikes, making it unsuitable for the real-time execution required in HFT.
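Latency variability of this kind is usually quantified by timing every inference call individually and reporting the median alongside a tail percentile, since an average hides occasional spikes. The following is a minimal sketch of such a measurement, not the methodology of the benchmark report: `predict` is a hypothetical stand-in for a real model call, and the iteration counts and nearest-rank percentile are assumptions chosen for illustration.

```python
import statistics
import time


def predict(features):
    # Hypothetical stand-in for a real model call (e.g. a trained booster's predict).
    return sum(f * 0.5 for f in features)


def measure_latency_ns(fn, arg, iterations=10_000, warmup=1_000):
    """Time each call individually; return per-call latencies in nanoseconds."""
    for _ in range(warmup):  # warm caches before recording anything
        fn(arg)
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(arg)
        samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples):
    """Return the median, nearest-rank 99th percentile, and maximum."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return {
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    stats = summarize(measure_latency_ns(predict, [0.1] * 32))
    for name, value in stats.items():
        print(f"{name}: {value / 1000:.3f} us")
```

A large gap between the median and the 99th percentile (or the maximum) is the jitter that matters here: a strategy sized for the median will miss its deadline on the slow calls.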
The Xelera Silva + ICC Solution
Xelera Silva, in conjunction with an ICC server, solves this problem by offloading ML inference to an FPGA-based accelerator. The latest benchmark results on ICC VEGA 116I and ICC VEGA R-118i servers demonstrate Silva’s ability to deliver single-digit microsecond median latency while eliminating the high-latency spikes seen in CPU-based solutions.
Benchmark Insights
· On ICC VEGA 116I (Intel Core i9-14900KS), Silva achieves a median latency of 1.128µs for small models and 1.174µs for large models, with 99th percentile latencies of just 1.328µs and 1.385µs, respectively.
· On ICC VEGA R-118i (Intel Xeon w7-2495X), similar results were recorded: 1.131µs median latency for small models and 1.237µs for large models, with a 99th percentile under 1.3µs.
· Compared to CPU-based frameworks, which exhibit latencies between 26µs and 163µs, Silva offers a more than 10x improvement in speed and consistency.
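The "more than 10x" figure can be checked against the numbers quoted in the bullets above. The sketch below divides the CPU latency range by each Silva median. It assumes the CPU figures and the Silva medians are comparable measurements, which the text does not state explicitly; the dictionary layout and the name `speedup_range` are illustrative only.

```python
# Median latencies quoted above for Silva, in microseconds.
SILVA_MEDIAN_US = {
    ("VEGA 116I", "small"): 1.128,
    ("VEGA 116I", "large"): 1.174,
    ("VEGA R-118i", "small"): 1.131,
    ("VEGA R-118i", "large"): 1.237,
}

# Latency range quoted above for CPU-based frameworks, in microseconds.
# Assumption: these are the same kind of statistic as the Silva medians.
CPU_RANGE_US = (26.0, 163.0)


def speedup_range(silva_us, cpu_range_us=CPU_RANGE_US):
    """Return (lowest, highest) ratio of CPU latency to a Silva latency."""
    cpu_low, cpu_high = cpu_range_us
    return cpu_low / silva_us, cpu_high / silva_us


if __name__ == "__main__":
    for (server, model), median_us in SILVA_MEDIAN_US.items():
        low, high = speedup_range(median_us)
        print(f"{server} / {model} model: {low:.1f}x to {high:.1f}x")
```

Under that assumption, even the least favorable pairing (the fastest CPU figure against the slowest Silva median) gives a ratio above 20, so "more than 10x" is a conservative summary of the quoted numbers.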
Why This Matters for HFT
By leveraging FPGA acceleration with Xelera Silva, traders can execute AI-driven strategies without compromising the ultra-low latency advantages of traditional HFT systems. The integration with ICC servers ensures high performance, stability, and minimal jitter, enabling firms to make split-second trading decisions with confidence and precision.
For more details, check our benchmark report:
For more information about ICC, click here.