GPUs are powerful processors increasingly used for Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning workloads. They excel at deriving near real-time answers from massive pools of data by breaking complex problems into thousands or millions of separate tasks and processing them simultaneously, in parallel. GPUs are becoming an essential element of many organizations’ digital transformation: the global GPU market is projected to grow at a CAGR of roughly 32% from 2021 to 2027, reaching approximately 200 billion US dollars, making it a high-growth market. GPUs will be a priority tool for organizations that want increased performance from their AI and ML workloads.
As the number of GPUs in the market grows, it’s important to understand the challenges data centers will face as they implement GPU solutions at scale.
The traditional GPU allocation model is rigid: GPU resources are locked into an individual physical server in a 1:1 relationship. In this model, GPU processing power is bound to a single processor type and restricted by the physical limitations of individual server systems and their designs. Server compute resources are cannibalized to manage the GPU and its software, so data centers must over-provision expensive CPU resources to accommodate GPU management and functions. At the level of an individual server this overhead may not be noticeable, but at scale the inefficiencies become profound. The result is higher acquisition costs, greater management complexity, and an imbalance between compute and GPU resources, leaving stranded, over-provisioned, and underutilized GPUs that data centers struggle to manage and use in a timely manner.
This rigid 1:1 approach does not allow CPU and GPU resources to be matched efficiently or effectively when workloads change, or when new CPU or GPU processors need to be introduced. If a user needs a different type of GPU, or wishes to allocate additional compute resources to GPU workloads, they must physically rebuild the system, or acquire new servers and move GPUs from host to host, incurring increased cost, complexity, and downtime. Reconnecting GPU and CPU resources dynamically is a cumbersome physical process that most organizations are loath to take on.
Because GPUs carry a high acquisition cost, they tend to be among the most expensive resources residing in a data center’s physical servers. Locking their capabilities into a rigid, manual deployment model results in significant inefficiency.
For example, a Service Provider that wishes to deliver a GPU-as-a-Service offering is forced to acquire expensive GPU processors in anticipation of matching them to the right customers or workloads. Yet if customer requirements change, the Service Provider cannot seamlessly shift GPU resources from one tenant to another without incurring significant downtime and expense. This costly exercise constrains the provider’s ability to address customers’ need for flexibility and, ultimately, limits the provider’s profitability.
At Fungible, we believe there is a better way to solve the dilemma presented by current methods of GPU deployment and management. Fungible GPU-Connect is a platform that allows organizations to radically change how they manage, implement, and support expensive GPU assets in their data centers.
Starting with a 4U chassis that supports eight GPU devices, Fungible GPU-Connect creates a GPU resource pool that allows compute and GPU resources to be matched precisely, without physically altering existing infrastructure. Using commodity Ethernet and Fungible Accelerator Cards installed in the data center’s servers, GPUs are connected over the network and presented to each server as local devices. This creates a 1:many relationship between GPUs and servers rather than the traditional rigid 1:1 approach. By taking a composable infrastructure approach, data centers can build GPU-enabled infrastructure unlike any they have seen before.
The implementation process is simple: remove GPUs from existing server infrastructure or acquire new ones; install Fungible Accelerator Cards in any compute asset that needs GPU resources; then move the GPUs into a pool inside a GPU-Connect platform and present them to any server across the network.
Using Fungible Composer software, any CPU can be matched with any GPU in the data center, across standard Ethernet networks, without physical intervention. Even better, with the Fungible DPU technology that comes standard with Fungible GPU-Connect, data centers can achieve local GPU performance across standard Ethernet connections, without degradation.
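Fungible has not published a public programming interface for Composer, but the pool-based allocation model described above can be sketched conceptually. In the sketch below, the `GpuPool` class and its `attach`/`detach` methods are hypothetical illustrations of the 1:many model, not the actual Composer API: one shared chassis of GPUs serves many servers, and a GPU can be reassigned from one server to another without touching hardware.

```python
# Conceptual sketch of pool-based GPU allocation (hypothetical names;
# not the actual Fungible Composer API).
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """A shared pool of GPUs that any server on the network can claim."""
    gpus: set = field(default_factory=set)        # free GPU IDs
    attached: dict = field(default_factory=dict)  # gpu_id -> server_id

    def attach(self, server_id: str) -> str:
        """Assign a free GPU to a server over the fabric -- no re-cabling."""
        if not self.gpus:
            raise RuntimeError("pool exhausted")
        gpu = self.gpus.pop()
        self.attached[gpu] = server_id
        return gpu

    def detach(self, gpu: str) -> None:
        """Return a GPU to the pool so another server can use it."""
        self.attached.pop(gpu)
        self.gpus.add(gpu)


# One 8-GPU chassis serving many servers (a 1:many relationship):
pool = GpuPool(gpus={f"gpu{i}" for i in range(8)})
g = pool.attach("server-a")   # server-a sees this GPU as a local device
pool.detach(g)                # release it without moving any hardware
pool.attach("server-b")       # the same GPU can now serve server-b
```

The point of the sketch is the contrast with the 1:1 model: reassignment is a bookkeeping operation on the pool, not a physical move of a card between chassis.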
Fungible GPU-Connect provides the following benefits to a data center:
- Improves utilization of available GPU resources, driving incremental costs down and revenues up
- Simplifies GPU refresh and allocates GPUs dynamically based on workload requirements
- Delivers GPU processing power independent of the application’s CPU requirements
- Speeds up onboarding of new customers and workloads
- Enables allocation and billing of GPU resources at more granular time increments
- Provides more GPUs to a workload than can be physically installed within a server
- Consolidates GPU allocation and management onto a single platform
As The Composable Infrastructure Company, we at Fungible continue to solve the problems that legacy data center constructs face. Fungible GPU-Connect is just one of many solutions that will revolutionize how organizations address the ongoing challenges created by the growing demand for AI/ML and Edge computing.