Advancing Open Source AI: NVIDIA DRA GPU driver for Kubernetes

[Diagram: NVIDIA DRA GPU driver for Kubernetes enabling fractional GPU allocation and sharing (MIG, MPS, time-slicing) for higher utilization]


By Agustin Giovagnoli / March 24, 2026

Kubernetes has become a primary platform for AI and ML, but its early GPU model treated accelerators as indivisible units. A new generation of device-aware scheduling led by Dynamic Resource Allocation is changing that. The NVIDIA DRA GPU driver for Kubernetes aligns with this shift by enabling more flexible allocation and sharing for higher utilization and lower cost per workload [1][2][3].

TL;DR: NVIDIA donates a DRA-based GPU driver to Kubernetes

GPU workloads on Kubernetes have typically relied on device plugins and vendor operators that expose GPUs as scalar resources, which can leave expensive accelerators underutilized. A DRA-based, open approach lets drivers participate directly in allocation, unlocking sharing strategies like time-slicing, MIG, and MPS to improve efficiency while staying consistent with upstream APIs [1][2][3]. For background on the API, see the Kubernetes DRA documentation.

What is Dynamic Resource Allocation (DRA) in Kubernetes?

DRA is an upstream framework that introduces pluggable, device-aware resource drivers to the scheduling loop. Unlike the traditional Device Plugin API that advertises whole GPUs as scalar resources, DRA allows drivers to make allocation decisions with richer semantics, which can include fine-grained sharing and topology-aware placement [2]. This enables cluster operators to align allocation policies with how AI workloads actually consume GPU resources [2].
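To make the contrast concrete, here is a sketch of what a DRA-style GPU request can look like. It assumes the NVIDIA DRA driver is installed and exposes a DeviceClass named `gpu.nvidia.com` (the name used in the driver's examples); the `resource.k8s.io` API version depends on your Kubernetes release, so treat the manifest as illustrative rather than copy-paste ready.

```yaml
# Illustrative DRA request: a pod asks for "a GPU matching this class"
# via a ResourceClaimTemplate instead of a scalar resource count.
# Assumes the NVIDIA DRA driver provides the "gpu.nvidia.com" DeviceClass.
apiVersion: resource.k8s.io/v1beta1   # version varies by Kubernetes release
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu          # references the claim below, not a scalar limit
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

The key difference from the device-plugin model is that the claim is an API object the driver can reason about at allocation time, rather than an opaque integer the scheduler merely counts.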

Why whole-GPU allocation fails for AI workloads

When a GPU is exposed as an indivisible resource, pods often reserve more capacity than they use. The default scheduler sees only the requested GPU count, not real-time utilization, which leads to partial idling of high-end accelerators and higher costs across clusters [2][3]. Organizations running training or inference on A100-class GPUs feel this most, since static reservation leaves capacity stranded while costs accrue [2].
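For comparison, this is the traditional device-plugin request the paragraph above describes. The image name is a placeholder; the point is that `nvidia.com/gpu` is an integer, so there is no way to express a fractional need:

```yaml
# Traditional device-plugin request: the GPU is a scalar, all-or-nothing.
# Even if the job uses a fraction of the GPU, the whole device is reserved.
apiVersion: v1
kind: Pod
metadata:
  name: whole-gpu-pod
spec:
  containers:
  - name: trainer
    image: my-training-image:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1   # no way to express "half a GPU" here
```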

What the NVIDIA DRA GPU driver for Kubernetes brings

A DRA-based driver lets allocation become policy-driven and device-aware. Within Kubernetes, this approach supports fractional GPU allocation, scheduling with advanced topology constraints, and integration with GPU sharing models such as time-slicing, NVIDIA MPS, and MIG where appropriate [2]. By operating within the DRA framework, the model remains aligned with upstream APIs while enabling smarter placement and higher effective utilization [2].
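One way DRA expresses such policies is through CEL selectors on device attributes. The sketch below constrains a claim to a particular GPU product; the attribute names are illustrative assumptions, since the actual attributes a driver publishes appear in the cluster's ResourceSlice objects and should be checked there.

```yaml
# Sketch: constraining a DRA request with a CEL selector.
# Attribute names here are illustrative; the real attributes published by
# the NVIDIA driver are visible in the cluster's ResourceSlice objects.
apiVersion: resource.k8s.io/v1beta1   # version varies by Kubernetes release
kind: ResourceClaimTemplate
metadata:
  name: a100-claim
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
        selectors:
        - cel:
            # Match only devices whose product name contains "A100"
            expression: device.attributes["gpu.nvidia.com"].productName.matches("A100")
```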

GPU sharing mechanisms: MIG, MPS, and time-slicing

  • Time-slicing: Multiple pods can share a GPU in slices, improving throughput for bursty or latency-tolerant jobs [2].
  • NVIDIA MPS: Suitable for concurrent CUDA workloads, allowing safer, more efficient multi-tenant sharing under compatible conditions [2].
  • MIG: Hardware-partitioned instances on supported GPUs, useful when stronger isolation is required for predictable performance [2].

A DRA-aware driver can expose these capabilities to the scheduler so that pods land on nodes with the right sharing configuration and capacity profile [2].
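As a sketch of how a sharing strategy could be requested through a claim, the manifest below attaches driver-specific configuration to a DRA request. The opaque parameter schema (`apiVersion`, `kind`, and field names) is an assumption for illustration; the exact schema is defined by the NVIDIA DRA driver and should be taken from its documentation.

```yaml
# Sketch: requesting a time-sliced share of a GPU through a DRA claim.
# The opaque parameter block is illustrative; consult the NVIDIA DRA
# driver's docs for the actual configuration schema it accepts.
apiVersion: resource.k8s.io/v1beta1   # version varies by Kubernetes release
kind: ResourceClaim
metadata:
  name: shared-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
    config:
    - requests: ["gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: gpu.nvidia.com/v1alpha1   # illustrative
          kind: GpuConfig
          sharing:
            strategy: TimeSlicing
```

Pods referencing the same ResourceClaim (rather than a per-pod template) can then share the allocated, time-sliced device.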

How schedulers and plugins fit in (reclaiming underutilized GPUs)

Scheduling logic can incorporate utilization signals to reclaim or pack workloads on underused devices. CNCF community work shows scheduler plugins that identify idle GPU capacity and make placement decisions to reduce waste in multi-tenant clusters [3]. This complements DRA by pairing device-aware allocation with runtime feedback, and it can extend to batch and job schedulers such as Kueue or Volcano for larger distributed training jobs [2][3].
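For the batch-scheduler side, a minimal Kueue sketch looks like the following. Names and quota values are examples only; the intent is to show how a GPU quota can gate large training jobs so they queue as whole units instead of fragmenting capacity.

```yaml
# Sketch: gating GPU jobs behind a Kueue quota. Names and the quota
# value are examples; adapt flavors to your actual node pools.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100-nodes
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: training-queue
spec:
  namespaceSelector: {}   # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: a100-nodes
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 8   # example: at most 8 GPUs admitted at once
```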

Integration with GPU Feature Discovery and node labeling

Detailed node labeling improves placement accuracy. GPU Feature Discovery adds fine-grained labels for GPU capabilities and topology, which helps the scheduler and drivers match workloads to nodes that can meet their requirements under DRA policies [1][2]. This creates a clearer contract between infrastructure and workload owners when combining sharing models with performance or isolation needs [1][2].
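As an example of the contract this creates, a workload can target nodes by GPU Feature Discovery labels. The label keys below follow GFD's published conventions, but verify the exact keys and values on your own nodes (for instance with `kubectl get node <name> --show-labels`); the image name is a placeholder.

```yaml
# Sketch: steering a pod to MIG-capable A100 nodes via GFD labels.
# Label keys/values are examples; check the labels on your nodes.
apiVersion: v1
kind: Pod
metadata:
  name: mig-workload
spec:
  nodeSelector:
    nvidia.com/mig.capable: "true"
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB   # example product label
  containers:
  - name: app
    image: my-inference-image:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1
```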

Business and operational impact: cost, utilization, portability

By shifting from whole-device reservation to driver-led allocation and sharing, clusters can raise GPU utilization and reduce idle capacity. For AI teams, that can translate into lower cost per training or inference job, while preserving portability across Kubernetes distributions that support DRA and compatible operators or schedulers [2][3]. For platform engineers, richer policies and utilization-aware placement reduce noisy-neighbor risks and increase predictability in multi-tenant environments [2][3].

Practical considerations for adoption

  • Kubernetes versioning and feature status: DRA was introduced as an alpha feature and has been maturing through subsequent releases, so operators should validate the API version, feature status, and feature gates of their Kubernetes release as they plan rollouts [2].
  • Compatibility with existing operators: Many environments already use the NVIDIA GPU Operator and device plugins; evaluate how DRA-based workflows interoperate and where migration is warranted [1][2].
  • Validate sharing models: Test time-slicing, MPS, and MIG configurations against real workloads to balance throughput, latency, and isolation [2].
  • Scheduling strategy: Combine DRA with scheduler plugins that can reclaim underutilized GPUs and coordinate with batch schedulers for large training jobs [2][3].
  • Documentation and playbooks: Build internal runbooks for safe multi-tenant policies and quota controls, and iterate via small pilots.

Sources

[1] Kubernetes and GPU: The Complete Guide to Running AI/ML Workloads at Scale
https://www.ajeetraina.com/kubernetes-and-gpu-the-complete-guide-to-running-ai-ml-workloads-at-scale/

[2] Kubecon 2024 Paris: Enhancing AI/ML Workloads in Kubernetes
https://blog.windkube.com/kubecon-gpu/

[3] Reclaiming underutilized GPUs in Kubernetes using scheduler plugins
https://www.cncf.io/blog/2026/01/20/reclaiming-underutilized-gpus-in-kubernetes-using-scheduler-plugins/
