Time-slicing
Time-slicing lets multiple containers share one physical GPU by scheduling their CUDA contexts in turns. CUDA preempts each context at a configurable interval and switches to the next one, similar to CPU time-sharing on a single-core machine.
Use time-slicing when you want to increase GPU utilization across workloads that do not run continuously — for example, multiple batch jobs or development containers that would otherwise sit idle waiting for their own dedicated GPU.
Feature status
TimeSlicingSettings is an Alpha feature gate, disabled by default.
| Feature gate | Default | Stage |
|---|---|---|
| TimeSlicingSettings | false | Alpha |
See the feature gates reference for all available gates.
Prerequisites
- The DRA Driver for NVIDIA GPUs must be installed. See Installation.
- The TimeSlicingSettings feature gate must be enabled. See Enabling the feature.
Enabling the feature
Enable the TimeSlicingSettings feature gate with helm upgrade:

    helm upgrade dra-driver-nvidia-gpu oci://registry.k8s.io/dra-driver-nvidia/charts/dra-driver-nvidia-gpu \
      --namespace dra-driver-nvidia-gpu \
      --set featureGates.TimeSlicingSettings=true

The GPU kubelet plugin and webhook must both restart for the change to take effect. The rolling update happens automatically when you upgrade the Helm release.
Configure time-slicing
Configuring time-slicing requires two steps:
- Create a ResourceClaimTemplate that specifies the time-slice interval.
- Create pods that reference the template.
A ResourceClaimTemplate is namespace-scoped. Create one in each namespace
where you want to use time-slicing. Within a namespace, create a separate
template for each interval configuration you need: for example, one template
with interval: Long for batch workloads and another with interval: Short for
interactive workloads. Pods that need the same interval share the same template.
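As a minimal sketch of that layout, the two templates below differ only in name and interval. The names shared-gpu-batch and shared-gpu-interactive and the namespace my-namespace are hypothetical; the rest follows the structure used in the time-slicing example on this page.

```yaml
# Hypothetical template for batch workloads: each context runs longer per turn.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: my-namespace        # hypothetical namespace
  name: shared-gpu-batch         # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com
      config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: TimeSlicing
              timeSlicingConfig:
                interval: Long
---
# Hypothetical template for interactive workloads: contexts switch more often.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: my-namespace        # hypothetical namespace
  name: shared-gpu-interactive   # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.nvidia.com
      config:
      - requests: ["gpu"]
        opaque:
          driver: gpu.nvidia.com
          parameters:
            apiVersion: resource.nvidia.com/v1beta1
            kind: GpuConfig
            sharing:
              strategy: TimeSlicing
              timeSlicingConfig:
                interval: Short
```

A pod selects one of the two by setting resourceClaimTemplateName to the matching template name.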
Time-slice intervals
The interval field controls the CUDA time-slice duration:
| Value | Description |
|---|---|
| Default | Uses the NVIDIA GPU driver’s built-in default interval |
| Short | Shorter interval; contexts switch more frequently |
| Medium | Intermediate interval |
| Long | Longer interval; each context runs longer per turn before being preempted |
If omitted, Default is used.
CEL selectors
You can add a selectors block under exactly to target a specific GPU by
model, UUID, or other attribute. CEL-based device selection is a standard
Kubernetes DRA feature; see the
Kubernetes DRA documentation
for details on the selector syntax and available attributes.
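A minimal sketch of a request with a selector is shown below. It assumes that the driver publishes a productName attribute under the gpu.nvidia.com domain and that the target GPU reports the product string used here. Both are illustrative, so check the attributes your devices actually advertise before relying on them.

```yaml
# Fragment of a ResourceClaimTemplate, located at spec.spec.devices.requests.
requests:
- name: gpu
  exactly:
    deviceClassName: gpu.nvidia.com
    selectors:
    - cel:
        # Assumption: the attribute name (productName) and the product string
        # are illustrative; substitute values advertised in your cluster.
        expression: "device.attributes['gpu.nvidia.com'].productName == 'NVIDIA A100-PCIE-40GB'"
```

Running kubectl get resourceslices -o yaml lists the devices and attributes that drivers have published, which is one way to find the names and values to match on.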
Time-slicing example
Create a ResourceClaimTemplate
A ResourceClaimTemplate defines the GPU request and its configuration.
Multiple pods can reuse the same template: Kubernetes automatically creates one
ResourceClaim per pod from it, and deletes that claim when the pod terminates.
Create a file called shared-gpu.yaml. The manifest assumes that the time-slicing-example namespace already exists:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      namespace: time-slicing-example
      name: shared-gpu
    spec:
      spec:
        devices:
          requests:
          - name: gpu
            exactly:
              deviceClassName: gpu.nvidia.com
          config:
          - requests: ["gpu"]
            opaque:
              driver: gpu.nvidia.com
              parameters:
                apiVersion: resource.nvidia.com/v1beta1
                kind: GpuConfig
                sharing:
                  strategy: TimeSlicing
                  timeSlicingConfig:
                    interval: Long

The deviceClassName: gpu.nvidia.com field is required; it selects a full GPU. Set interval to the time-slice duration you need. See Time-slice intervals for the available values.

Apply the manifest:

    kubectl apply -f shared-gpu.yaml

Example output:

    resourceclaimtemplate.resource.k8s.io/shared-gpu created

Create a Pod that references the ResourceClaimTemplate
Reference the ResourceClaimTemplate by name in pod.spec.resourceClaims.
Kubernetes creates one ResourceClaim for the pod from the template and allocates a GPU to that claim when the pod is scheduled.
To share a single GPU across containers in the same pod, each container
references the same request name (request: gpu) within the claim. If
containers referenced different request names, each would receive a separate
GPU.
Create a file called time-slicing-pod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      namespace: time-slicing-example
      name: time-slicing-pod
    spec:
      containers:
      - name: workload-0
        image: <your-image>
        resources:
          claims:
          - name: shared-gpu
            request: gpu
      - name: workload-1
        image: <your-image>
        resources:
          claims:
          - name: shared-gpu
            request: gpu
      resourceClaims:
      - name: shared-gpu
        resourceClaimTemplateName: shared-gpu
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"

Key fields:

- image: replace <your-image> with your workload container image.
- resourceClaimTemplateName: must match the name of the ResourceClaimTemplate you created in the previous step.
- request: the value gpu must match the request name defined in the template. Because both containers use the same value, they share one GPU.
- Toleration: allows the pod to schedule on nodes that have the nvidia.com/gpu: NoSchedule taint, which is common on GPU nodes. Remove it if your cluster does not use this taint.

Apply the manifest:

    kubectl apply -f time-slicing-pod.yaml

Example output:

    pod/time-slicing-pod created

Verify the pod is running and both containers are ready:

    kubectl get pod -n time-slicing-example time-slicing-pod

Example output:

    NAME               READY   STATUS    RESTARTS   AGE
    time-slicing-pod   2/2     Running   0          30s

Confirm both containers see the same GPU:

    kubectl exec -n time-slicing-example time-slicing-pod -c workload-0 -- nvidia-smi -L
    kubectl exec -n time-slicing-example time-slicing-pod -c workload-1 -- nvidia-smi -L

Example output:

    GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-2fa81118-5a5f-aa66-7660-471eed407181)
    GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-2fa81118-5a5f-aa66-7660-471eed407181)

Both commands return the same GPU UUID, confirming the containers share one device.

For additional examples, including time-slicing with CEL selectors, see the
demo/specs/
directory in the repository.
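For contrast with the shared-GPU pod above, the sketch below shows the case where containers reference different request names and therefore each receive a separate GPU. The request names gpu-0 and gpu-1, the template name two-gpus, and the pod name two-gpus-pod are hypothetical, and the time-slicing config is omitted to keep the fragment short.

```yaml
# Hypothetical template with two distinct requests, one GPU each.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: time-slicing-example
  name: two-gpus                 # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu-0              # hypothetical request name
        exactly:
          deviceClassName: gpu.nvidia.com
      - name: gpu-1              # hypothetical request name
        exactly:
          deviceClassName: gpu.nvidia.com
---
# Each container references a different request, so the containers do not
# share a device.
apiVersion: v1
kind: Pod
metadata:
  namespace: time-slicing-example
  name: two-gpus-pod             # hypothetical name
spec:
  containers:
  - name: workload-0
    image: <your-image>
    resources:
      claims:
      - name: gpus
        request: gpu-0
  - name: workload-1
    image: <your-image>
    resources:
      claims:
      - name: gpus
        request: gpu-1
  resourceClaims:
  - name: gpus
    resourceClaimTemplateName: two-gpus
```

A pod like this one can only be scheduled on a node that has two GPUs available for allocation, whereas the shared-GPU pod needs only one.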
Limitations and considerations
- No memory isolation. All containers sharing a GPU access the same GPU memory. A container that allocates more memory than expected can affect other containers on the same device.
- No throughput guarantees. The GPU is shared on a best-effort basis. Workloads can observe variable performance depending on what else is running on the same GPU.
- Not supported on MIG slices. Setting strategy: TimeSlicing in a MigDeviceConfig is accepted without error but has no effect on hardware. To share a MIG slice across containers, use MPS instead.
- Mutually exclusive with MPS on the same GPU. Time-slicing sets the GPU compute mode to DEFAULT; MPS requires EXCLUSIVE_PROCESS. Both strategies can be active in a cluster, but not on the same physical GPU at the same time.