The Data Processing Unit (DPU) has gained significant attention and traction in recent months, having been identified as the coveted third socket in data centers alongside the CPU and GPU. As the innovators who coined the term in 2016, created its specifications, and defined its value, we find it gratifying to see our vision reverberating across the industry.
To filter out the noise around the DPU, we are kicking off a series of blog posts offering a straightforward, no-frills description of the Fungible Data Processing Unit and its essential attributes. We’ll take a deep dive into each attribute and show how organizations of any size can benefit from the Fungible DPU.
The Fungible Data Processing Unit: The Motivations Behind the Innovation
Cliché as it sounds, we live in a world of data. Modern applications that process, move and store voluminous amounts of data are no longer an exception, but the norm. Organizations are constantly evaluating data center technologies that can cope with the demands of application performance, infrastructure agility, security and reliability – all while keeping total cost of ownership in check.
To address these tough demands, organizations need to be able to extract the highest performance from every resource in the data center. Since applications and services increasingly span multiple servers, these distributed resources also need to work together – at scale – to deliver optimal characteristics to applications.
Simply put – the holy grail is a data center that can offer any resource to any service, at any time, with every resource operating at its highest possible performance and utilization.
At Fungible, we have identified the root causes of the most nagging barriers in the data center and have worked to address them at the most fundamental levels.
1. Inefficient Data-centric Computational Performance
For years, the industry has discovered opportunities to improve overall performance by adding specialized silicon. Cryptographic accelerators were used to offload compute intensive operations from the CPU to improve overall performance. Without this class of technology, the industry would not have seen the pervasive proliferation of secure communication. More recently, organizations turned to GPUs to accelerate high performance computing and machine learning workloads, as GPUs perform these specialized tasks much more efficiently than CPUs.
At Fungible, we have identified another class of computations that needs to be accelerated: data-centric computations. Data-centric computations are essentially the computations needed to process, move and store data as efficiently, securely and reliably as possible in the data center. They appear in storage, network, security and virtualization datapaths, and include compute-intensive functions such as data reduction, data integrity, data filtering and analytics, data durability, and securing data in motion.
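To make “data-centric computation” concrete, here is a minimal, illustrative sketch in Python (purely for exposition; it is not Fungible’s implementation, and the function names are hypothetical) of two such stages – data reduction and data integrity – applied to a block of data before it is stored or moved:

```python
import hashlib
import zlib

# Illustrative sketch only: two data-centric stages applied to a block of
# data before it is stored or moved.

def store_prepare(block: bytes) -> dict:
    reduced = zlib.compress(block)               # data reduction (compression)
    digest = hashlib.sha256(block).hexdigest()   # data integrity (checksum)
    return {"payload": reduced, "sha256": digest, "orig_len": len(block)}

def load_verify(record: dict) -> bytes:
    block = zlib.decompress(record["payload"])   # undo the reduction
    if hashlib.sha256(block).hexdigest() != record["sha256"]:
        raise ValueError("integrity check failed")  # corrupted in flight/at rest
    return block

data = b"example payload " * 1024
record = store_prepare(data)
assert load_verify(record) == data               # round-trips intact
assert len(record["payload"]) < len(data)        # repetitive data compresses well
```

Every byte written or read passes through stages like these, which is why they dominate the datapath when done on a general purpose CPU.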
Today, these computations are either handled by general purpose CPUs or offloaded to devices such as ASICs or FPGAs. General purpose CPUs, while flexible, are not efficient at handling data-centric computations. This is because CPUs are designed to maximize instructions per cycle (IPC) for each core, but are ill-suited for executing network-intensive (or IO-heavy) workloads that involve frequent context switching across fine-grained multiplexed stateful workloads. The figure below shows how CPU efficiency decreases as the I/O intensity of workloads increases.
Thus, when CPUs sit in the critical path of servers, they inadvertently trap expensive resources such as storage and GPUs behind them. When resources are stranded, performance, scale and utilization are severely limited.
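To illustrate the access pattern described above, this hypothetical Python sketch (for exposition only) drives the same stateful per-flow computation two ways. Round-robin interleaving forces a flow-state switch on every packet – the fine-grained multiplexing that erodes CPU efficiency – while batching touches each flow’s state in one run. The results match; only the access pattern differs:

```python
# Hypothetical sketch: a stateful per-flow computation (a running checksum)
# driven by two arrival orders over identical packets.

def process(packets):
    state = {}                                   # per-flow running state
    for flow_id, payload in packets:
        s = state.get(flow_id, 0)                # load this flow's context
        state[flow_id] = (s + sum(payload)) & 0xFFFFFFFF
    return state

NUM_FLOWS, PKTS_PER_FLOW = 32, 100
flows = {f: [bytes([f]) * 64 for _ in range(PKTS_PER_FLOW)]
         for f in range(NUM_FLOWS)}

# Round-robin: one packet per flow per round (fine-grained multiplexing).
interleaved = [(f, flows[f][i]) for i in range(PKTS_PER_FLOW)
               for f in range(NUM_FLOWS)]
# Batched: all of a flow's packets back to back.
batched = [(f, p) for f in range(NUM_FLOWS) for p in flows[f]]

assert process(interleaved) == process(batched)  # same answer either way
```

Real network traffic arrives interleaved, so a CPU cannot choose the friendly order; a DPU architecture is built for exactly this pattern.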
ASICs and FPGAs offer better efficiency in running data-centric computations than general purpose CPUs, but they are less flexible and harder to program. We’ll explore these limited alternatives in another blog post.
2. Network Limitations Prevent High Utilization of Server Resources
A scale-out architecture – a large set of server resources working together to offer services to applications – has so far taken the form of hyperconverged infrastructure (HCI). Composable disaggregated infrastructure (CDI) is an emerging alternative (read this blog) that shows great promise architecturally, but has remained elusive because it is challenging to realize with performance at scale.
For HCI, and most importantly for CDI, network issues impede scale-out efficiencies. Existing solutions have been mired in congestion challenges that result in network over-provisioning and limited scalability.
What is needed is a data hub at every server node that provides a massively scalable, any-to-any, high bandwidth, deterministic low latency fabric – one that presents remote resources on the network as if they were local to the application. Such an ideal fabric would enable true disaggregation and pooling at scale, ultimately delivering higher utilization at a lower footprint and cost.
Previous industry attempts have failed to come up with a unified solution that provides all the necessary properties in a single network, as most are bogged down by scalability issues, congestion and deadlocks.
3. Performance-Programmability Tradeoff
The tech industry is energized by innovations that happen on a continuous basis. Organizations are constantly looking to do better – in this context, to make continuous improvements to their datapaths, be it for performance, security, reliability, features and so on. Thus, flexibility is key.
However, there is a well-established tug-of-war between flexibility and performance. General purpose CPUs are flexible and programmable, but cannot offer the highest levels of performance for data-centric computations. At the other end of the spectrum, ASICs perform quite well for the computations they are tailored for, but are not flexible. There have been attempts to amalgamate datapath ASICs and embedded CPUs into SoCs, but because the two architectures are so distinctly disparate, these designs suffer performance cliffs whenever computations are punted from the hardwired datapath to the CPUs – which, in a fast-evolving cloud environment, they often are.
An ideal solution, therefore, is one that breaks the performance-programmability conundrum and offers no-compromise, high performance while remaining fully flexible and easily programmable.
The Fungible Data Processing Unit: What’s In The Name
Specifically designed to address these three challenges, the Fungible DPU:
- Is capable of running data-centric computations an order of magnitude more efficiently than general-purpose CPUs.
- Enables data center-scale composable disaggregated infrastructure (supporting tens of thousands of server nodes), liberating expensive resources from being trapped behind CPUs. The Fungible DPU implements the end point of a high performance TrueFabric™ that provides deterministic low latency, full cross-section bandwidth, congestion and error control, and high security across a wide scale of deployments.
- Offers true data-path programmability through an architecture that allows fine-grained feature and performance optimization for different data-centric computations. Programmability is offered at the infrastructure software level using high-level programming languages and no changes in application code are necessary.
The Fungible Data Processing Unit: The Third Socket In Data Centers
The coveted position of the third socket in data centers is one that is not taken lightly. Sitting at the intersection of compute, storage, and networking, the Fungible DPU is designed to be the data-path hub in all data center server types, accelerating data-centric computations and enabling efficient, secure and reliable data interchange. The Fungible DPU also serves as the root-of-trust for each server, ensuring that only authenticated code (signed binaries) is executed and providing added protection from malware and other threats.
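As a simplified illustration of the root-of-trust idea (not Fungible’s actual mechanism; real platforms verify asymmetric signatures anchored in hardware, and the key and firmware bytes below are hypothetical), the sketch gates execution on signature verification, with a keyed HMAC standing in for a hardware-backed signature:

```python
import hashlib
import hmac

# Simplified illustration of a root of trust: code runs only if its
# signature verifies. A keyed HMAC stands in for an asymmetric signature.
DEVICE_KEY = b"provisioned-device-secret"        # hypothetical, held by the DPU

def sign(binary: bytes) -> bytes:
    return hmac.new(DEVICE_KEY, binary, hashlib.sha256).digest()

def secure_boot(binary: bytes, signature: bytes) -> bool:
    # Execute only code whose signature verifies (constant-time compare).
    return hmac.compare_digest(sign(binary), signature)

firmware = b"\x7fELF...bootloader image"
good_sig = sign(firmware)
assert secure_boot(firmware, good_sig)                     # authentic code runs
assert not secure_boot(firmware + b"\x90patch", good_sig)  # tampered code refused
```

Because the check happens below the host OS, even a compromised server cannot talk the DPU into running unsigned code.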
The Fungible DPU can be deployed in a multitude of use cases including storage (initiator as well as target), AI/analytics servers, Layer 4 to Layer 7 systems, virtualization, security appliances, NFV applications, and more.