Prerequisites
Cluster, software, and hardware requirements for the DRA Driver for NVIDIA GPUs.
Tip: Most of these prerequisites can be installed and managed for you by the NVIDIA GPU Operator.
| Requirement | Version / Notes |
|---|---|
| Kubernetes | v1.34.2 or later, with at least one node that has one or more NVIDIA GPUs. |
DynamicResourceAllocation feature gate |
Enabled by default in Kubernetes v1.34+. On v1.32 and v1.33, enable it manually. |
| Helm | v3.8 or later. |
| NVIDIA Driver | v565 or later for GPU allocation. v570.158.01 or later if using ComputeDomains. |
| CDI | Enabled in your container runtime. This is enabled by default in containerd 2.0+ and CRIO v1.27+. The DRA Driver uses CDI to expose GPUs to containers. |
| Node Feature Discovery (NFD) | Labels GPU nodes in the cluster. The DRA Driver uses these labels to target the GPU kubelet plugin to the correct nodes. |
ComputeDomains additional prerequisites
If you plan to use ComputeDomains, you also need:
- NVIDIA Driver v570.158.01 or later. The
IMEXDaemonsWithDNSNamesfeature gate is enabled by default and requires this driver version. The ComputeDomain plugin will fail to start on older drivers unlessIMEXDaemonsWithDNSNamesis explicitly disabled. - Multi-Node NVLink (MNNVL) hardware. Nodes must be connected via NVLink fabric, such as GB200 NVL72 and similar systems.
- GPU Feature Discovery (GFD) deployed via the GPU Operator. GFD generates the
nvidia.com/gpu.cliquenode labels required by ComputeDomains. - On all GPU nodes where the
nvidia-imex-*packages are installed, thenvidia-imex.servicesystemd unit must be disabled:
systemctl disable --now nvidia-imex.service && systemctl mask nvidia-imex.service
Install prerequisites with NVIDIA GPU Operator
The NVIDIA GPU Operator is a Kubernetes operator that automates the deployment and lifecycle management of all NVIDIA software components needed to provision and monitor GPUs in a cluster.
It can manage the following DRA Driver for NVIDIA GPUs prerequisites for you:
- NVIDIA Driver (v565+ for GPU allocation, v570.158.01+ for ComputeDomains). The GPU Operator installs a default driver that meets the DRA Driver's prerequisites. To use a specific version, see Common chart customization options in the GPU Operator documentation.
- CDI enabled through the NVIDIA Container Toolkit.
- Node Feature Discovery (NFD).
- GPU Feature Discovery (GFD), required for ComputeDomains.
If you choose to install the GPU Operator, follow the DRA Driver for NVIDIA GPUs install guide in the GPU Operator documentation. It covers installing the GPU Operator with the NVIDIA Kubernetes Device Plugin disabled and installing the DRA Driver for NVIDIA GPUs.
Enable DRA on Kubernetes v1.32 and v1.33
On Kubernetes v1.34 and later, DynamicResourceAllocation is enabled by default and no additional configuration is required.
On Kubernetes v1.32 and v1.33, enable the following on each component:
| Component | Requirement |
|---|---|
| kube-apiserver | Enable the DynamicResourceAllocation feature gate and the resource.k8s.io/v1beta1 API group (available on v1.32 and v1.33). On v1.33, also enable resource.k8s.io/v1beta2. |
| kube-controller-manager | Enable the DynamicResourceAllocation feature gate |
| kube-scheduler | Enable the DynamicResourceAllocation feature gate |
| kubelet | Enable the DynamicResourceAllocation feature gate |
How you apply these depends on your cluster setup. For managed Kubernetes distributions (EKS, GKE, AKS, and others), refer to your provider's documentation. Not all providers support enabling DynamicResourceAllocation on v1.32 or v1.33 clusters.
Example: kubeadm
The following kubeadm-init.yaml enables DRA for a new cluster using kubeadm:
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
extraArgs:
- name: "feature-gates"
value: "DynamicResourceAllocation=true"
- name: "runtime-config"
# On v1.32, omit "resource.k8s.io/v1beta2=true"
value: "resource.k8s.io/v1beta1=true,resource.k8s.io/v1beta2=true"
controllerManager:
extraArgs:
- name: "feature-gates"
value: "DynamicResourceAllocation=true"
scheduler:
extraArgs:
- name: "feature-gates"
value: "DynamicResourceAllocation=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
DynamicResourceAllocation: true