In the previous article on data center networks, we discussed the evolution of the data center networking technology that weaves together hyperscale clouds, enterprise and edge data centers. Ethernet data rates are rising rapidly from 25 Gbps to 100 Gbps to 200/400 Gbps and 800 Gbps, with the high end of this spectrum driven by hyperscale data centers managing petabytes of information across thousands of servers. This exponential growth in Ethernet bandwidth has prompted a rethinking of data center hardware architecture and a growing interest in SmartNIC technology. In this article, we take a closer look at SmartNICs.
It is important to note that a SmartNIC is also referred to as a Data Processing Unit (DPU), Infrastructure Processing Unit (IPU) or Distributed Services Card (DSC), depending on the vendor and product specialization. We will use "SmartNIC" to cover all four terms throughout the rest of this article. A SmartNIC is an evolution of the network interface card (NIC), which connects servers to the network. NICs are part of the traditional data center infrastructure resources: CPU, memory, storage, and networking. What turns a NIC into a SmartNIC is, in the most general definition, the addition of programmable computing resources (such as a CPU). Today's SmartNICs differ widely in their level of programmability and in the types of computing resources they carry, some of which we discuss below. Before we do, let us look at why SmartNICs have become so popular recently.
There are several specific reasons why network developers and operators use SmartNICs. From our perspective, they can be grouped into three main categories of use cases:
In addition to the hardware infrastructure and the actual user application workloads, a data center runs a large stack of infrastructure functions: data storage services, hypervisors, routing, load balancing, and security functions such as packet encryption and inspection. These functions handle all the network packet processing that a traditional NIC does not perform. When implemented in software, they consume CPU cycles that are then unavailable to user applications. A term you often hear is "infrastructure tax": roughly 30% of data center CPU cycles are spent on infrastructure tasks. That equates to buying almost one additional server for every two servers in production, driving up the total cost of ownership (TCO). Because of this overhead, the view has formed that server CPUs are no longer the right platform for such infrastructure functions. A SmartNIC takes over the infrastructure tasks, offloading the server and freeing revenue-generating compute cycles for user applications.
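The back-of-envelope arithmetic behind the "infrastructure tax" can be made explicit. The sketch below uses the 30% figure quoted above; the fleet size is purely illustrative.

```python
# Back-of-envelope: how a 30% "infrastructure tax" inflates the server fleet.
# The 30% tax is the rough industry figure quoted in the article; the fleet
# size is an illustrative assumption.

def extra_servers_needed(fleet_size: int, infra_tax: float) -> float:
    """Servers that must be added so application capacity matches a fleet
    whose CPUs ran application workloads only (no infrastructure tax)."""
    usable_fraction = 1.0 - infra_tax        # CPU share left for applications
    return fleet_size / usable_fraction - fleet_size

extra = extra_servers_needed(fleet_size=1000, infra_tax=0.30)
print(f"{extra:.0f} extra servers")          # ~429, i.e. roughly 1 per 2.3 in production
```

In other words, a 30% tax does not cost 30% more servers but about 43% more, which is where "one new server for nearly every two in production" comes from.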
Software frameworks such as DPDK enable server CPUs to perform network functions efficiently. However, as Ethernet speeds continue to increase, it becomes increasingly difficult for general-purpose CPUs to keep up with network data rates. Take IPsec, a widely used network security function, as an example: the compute required for IPsec on a 100 Gbps full-duplex connection already consumes almost all of the CPU resources of a typical data center server (which brings us back to the topic of offload), and IPsec at 400 or 800 Gbps exceeds its capabilities outright. Some SmartNICs can sustain the processing throughput to run such workloads at line rate.
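A rough cycle-budget estimate illustrates why IPsec overwhelms a server CPU at these rates. The cycles-per-byte cost and the server configuration below are assumed ballpark figures, not measurements; real numbers depend heavily on packet size, cipher, and CPU generation.

```python
# Rough cycle-budget estimate for IPsec at line rate. The cycles-per-byte
# figure is an assumed ballpark for AES-GCM plus per-packet ESP overhead on
# a modern x86 core; the 32-core, 3 GHz server is likewise an assumption.

def cpu_share(rate_gbps: float, cycles_per_byte: float,
              cores: int, ghz: float) -> float:
    """Fraction of a server's total CPU cycles needed to encrypt and
    decrypt traffic at the given full-duplex line rate."""
    bytes_per_s = rate_gbps * 2 * 1e9 / 8    # full duplex: both directions
    needed = bytes_per_s * cycles_per_byte   # cycles/s required for crypto
    available = cores * ghz * 1e9            # cycles/s the server offers
    return needed / available

# 100 Gbps already eats ~90% of the assumed server; 400/800 Gbps exceed it.
for rate in (100, 400, 800):
    print(f"{rate} Gbps -> {cpu_share(rate, 3.5, 32, 3.0):.0%} of the server CPU")
```

Even under these generous assumptions (crypto only, no packet I/O overhead), 400 and 800 Gbps require several times the cycles the whole server can supply, which is the case for dedicated SmartNIC silicon.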
In addition to throughput, several SmartNIC solution providers specialize in ultra-low-latency network functions, which are important in financial trading, for example. The specialized processors on a SmartNIC can meet latency requirements that general-purpose server CPUs cannot.
SmartNICs can separate the execution of networking, storage and security functions from the server's execution environment. This has a dual benefit: on the one hand, network functions run efficiently on SmartNIC processors because they are isolated from other compute-intensive workloads on the server that would otherwise compete for resources. On the other hand, the separation provides additional security, since an attacker who compromises the server cannot easily reach, let alone tamper with, the network functions running on the SmartNIC.
The programmable computing resources required on the SmartNIC to serve these use cases fall into several categories. Different vendors offer products with different categories of computing resources:
* Collections of general-purpose CPU cores: Many SmartNICs carry a cluster of ARM cores; other products host different CPUs, such as Xeon processors, on the card. The advantage: porting software to a SmartNIC with general-purpose CPUs is often easier than porting it to more specialized processors. However, many network functions demand high processing power and need more domain-specific processors to run at line rate.
* Application-specific integrated circuits (ASICs) or hard-wired chiplets to accelerate fixed functions: Many SmartNICs have on-board accelerators for fixed functions, often in addition to more general-purpose computing resources. A typical example is silicon for standard cryptographic functions.
* Flow-processing cores (custom-designed network processors): These sit in the spectrum between general-purpose CPUs and ASICs. They are programmable, to a degree, but more specialized than general-purpose CPUs and can therefore deliver higher processing power. Flow-processing cores are often programmed in the domain-specific language P4.
* Field-programmable gate arrays (FPGAs): FPGAs also occupy the space between fixed-function accelerators and general-purpose CPUs. They are programmable and can be tailored much more closely to the network function at hand, but their development environment is typically more complex than that of the other three resource types. FPGAs now scale to millions of logic elements and have matured to the point where they provide the computing power to become a foundational technology for SmartNICs.
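To make the flow-processing model concrete: P4 describes packet handling as match-action tables, where a lookup key (e.g. a destination IP) selects an action and its parameters. The toy sketch below mimics that abstraction in plain Python for illustration only; a real P4 program compiles to tables and actions executed per packet in hardware, and all names here are hypothetical.

```python
# Illustrative sketch of the P4 match-action abstraction that
# flow-processing cores implement in hardware. Plain Python, not real P4.

from dataclasses import dataclass

@dataclass
class Packet:
    dst_ip: str
    ttl: int
    egress_port: int = -1          # -1 means "drop" in this toy model

def forward(pkt: Packet, port: int) -> None:
    """Action: set the egress port and decrement the TTL."""
    pkt.egress_port = port
    pkt.ttl -= 1

# Match-action table: exact match on destination IP -> (action, parameter).
ipv4_table = {
    "10.0.0.1": (forward, 1),
    "10.0.0.2": (forward, 2),
}

def apply(pkt: Packet) -> Packet:
    entry = ipv4_table.get(pkt.dst_ip)
    if entry:                      # table hit: run the bound action
        action, port = entry
        action(pkt, port)
    return pkt                     # table miss: default action (drop)

pkt = apply(Packet(dst_ip="10.0.0.1", ttl=64))
print(pkt.egress_port, pkt.ttl)    # prints "1 63"
```

The appeal of this model is that the control plane only installs table entries at runtime, while the hardwired match-action pipeline processes every packet at line rate.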
Many SmartNIC products combine several of these computing resources. With them, data processing tasks can be offloaded to the NIC in addition to basic network functions. We have discussed the overall value proposition of this technology along three use case categories. In the next article, we will look at typical use cases and the specific value proposition of SmartNICs in each of them in more detail. Stay tuned.