
Manifest: GPU time-slicing

Scope: the NVIDIA k8s device-plugin ConfigMap (replicas under sharing.timeSlicing.resources), wiring it through the GPU Operator via devicePlugin.config, the manual restart ceremony the Operator does not do for you, and the noisy-neighbour / no-memory-isolation caveat versus MIG. Verify node allocatable shows the multiplied nvidia.com/gpu count.

Reference template from upstream NVIDIA GPU Operator and k8s-device-plugin docs. Not hardware-tested here. Pin the GPU Operator chart and apply via GitOps; choose this sharing model deliberately. It is oversubscription, not isolation. Builds on Kubernetes & Helm: GPU Platform §3.

flowchart LR
  CM["ConfigMap<br/>sharing.timeSlicing.resources[].replicas=N"] --> WIRE["devicePlugin.config.name/.default"]
  WIRE --> RESTART["rollout restart device-plugin DS"]
  RESTART --> ADV["node advertises N x nvidia.com/gpu"]
  ADV --> WARN["shared context: no memory isolation"]

What it is

Time-slicing advertises a single physical GPU as replicas logical units of nvidia.com/gpu. That many pods can land on one GPU and take turns on it through the GPU's time-sliced context switching. There is no memory partition and no fault isolation. Every replica shares one frame buffer, so one pod's allocation can OOM the others, and a hung kernel stalls the lot. Replicas oversubscribe the resource; they do not grant proportional compute. This is the cheap sharing model for dev, bursty, and batched-inference workloads. When you need a hardware memory and engine partition, use MIG instead. For software-managed concurrent contexts with per-client memory limits, MPS sits between the two.

The mechanism lives entirely in the NVIDIA k8s device plugin; the GPU Operator just delivers the plugin's config from a ConfigMap. The Operator does not watch that ConfigMap. Editing it in place is silent until you restart the plugin.

Prerequisites

  • A working GPU Operator install with devicePlugin.enabled=true and nodes already advertising nvidia.com/gpu (the smoke pod in the hub passes).
  • mig.strategy left at none/single on these nodes. Time-slicing on top of MIG devices is possible (flags.migStrategy: mixed, resource name nvidia.com/mig-<profile>) but out of scope here.
  • Decide failRequestsGreaterThanOne up front. NVIDIA recommends true so a request for >1 replica fails fast (the slices are not interchangeable with whole GPUs); it defaults false for backward compatibility.
  • kubectl access to the gpu-operator namespace and the node labels.
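Before you turn failRequestsGreaterThanOne on, find workloads that already ask for more than one nvidia.com/gpu in a single container. New pods with such requests stop being admitted on shared nodes. This is a minimal sketch. It assumes kubectl's jsonpath prints one space-separated value per container that sets the limit and leaves the field empty when no container does. list_gpu_limits and multi_gpu_pods are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical audit helpers (not an NVIDIA tool).

# Emit "namespace/pod<TAB>limit [limit ...]" for every pod in the cluster.
list_gpu_limits() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.nvidia\.com/gpu}{"\n"}{end}'
}

# Read those lines on stdin and print each pod in which any single
# container asks for more than one nvidia.com/gpu.
multi_gpu_pods() {
  local pod limits n
  while IFS="$(printf '\t')" read -r pod limits; do
    for n in $limits; do
      # Non-numeric values make [ fail; stderr is discarded on purpose.
      if [ "$n" -gt 1 ] 2>/dev/null; then
        echo "$pod"
        break
      fi
    done
  done
}
```

Usage: list_gpu_limits | multi_gpu_pods. Empty output means enabling the flag will not reject any currently deployed container spec.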

The manifest

The ConfigMap key is an arbitrary profile name (any is the conventional cluster-wide default). version: v1 and the sharing.timeSlicing block are the device-plugin config schema.

# time-slicing-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false          # true -> advertise nvidia.com/gpu.shared instead
        failRequestsGreaterThanOne: true # requests for >1 replica are rejected (recommended)
        resources:
          - name: nvidia.com/gpu
            replicas: 4                  # 1 physical GPU -> 4 schedulable units, SHARED memory
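To template the replica count in CI or GitOps instead of hand-editing YAML, you can render the single-profile ConfigMap above from a function. This is a sketch only. render_time_slicing_cm is a hypothetical helper. Its refusal of replicas below 2 is the sketch's own guard (1 would mean no sharing), not a documented plugin rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a single-profile time-slicing ConfigMap.
# Args: ConfigMap name, profile key (e.g. "any"), replica count.
render_time_slicing_cm() {
  local name="$1" profile="$2" replicas="$3"
  # Guard chosen by this sketch: integer only, and at least 2.
  case "$replicas" in
    ''|*[!0-9]*) echo "replicas must be an integer" >&2; return 1 ;;
  esac
  if [ "$replicas" -lt 2 ]; then
    echo "replicas must be >= 2 to share a GPU" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${name}
  namespace: gpu-operator
data:
  ${profile}: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: ${replicas}
EOF
}
```

Usage: render_time_slicing_cm time-slicing-config any 4 | kubectl apply -f -. The device plugin still needs the manual restart afterwards, because the Operator does not watch the ConfigMap.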

Multi-profile ConfigMap (different replica counts per node class), selected per node by label:

# time-slicing-config-fine.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-fine
  namespace: gpu-operator
data:
  a100-40gb: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: 8
  tesla-t4: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: 4

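Wire it into the GPU Operator. On a fresh release, create the namespace and the ConfigMap first so the device plugin can read it when it first starts, then pass it at install (pin the chart):

kubectl create namespace gpu-operator
kubectl apply -f time-slicing-config.yaml
helm upgrade --install gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --version <pinned> \
  --set devicePlugin.config.name=time-slicing-config \
  --set devicePlugin.config.default=any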
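If the release is driven from Git rather than --set flags, the same two Helm values can live in a committed values file. This is a minimal sketch. write_ts_values and the file name are hypothetical. Omitting the third argument leaves out default, which gives the multi-profile form where each node picks its key by label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the devicePlugin.config Helm values to a file.
# Args: output path, ConfigMap name, optional default profile key.
write_ts_values() {
  local out="$1" cm="${2:-time-slicing-config}" def="${3:-}"
  {
    echo "devicePlugin:"
    echo "  config:"
    echo "    name: $cm"
    # Only emit default when given; multi-profile setups rely on node labels.
    if [ -n "$def" ]; then echo "    default: $def"; fi
  } > "$out"
}
```

Usage: write_ts_values values-time-slicing.yaml time-slicing-config any, commit the file, then run helm upgrade --install gpu-operator nvidia/gpu-operator -n gpu-operator --version <pinned> -f values-time-slicing.yaml.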

On a running cluster, create the ConfigMap then patch the ClusterPolicy (this is what the --set flags render to):

kubectl apply -f time-slicing-config.yaml
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config","default":"any"}}}}'
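For scripts that switch profiles, you can build the merge-patch body from variables rather than hand-quoting JSON. This is a sketch with a hypothetical devplugin_patch helper. It does no JSON escaping, which is fine for ConfigMap names and keys that are plain DNS-style strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ClusterPolicy merge patch for devicePlugin.config.
# Args: ConfigMap name, optional default profile key.
devplugin_patch() {
  if [ -n "${2:-}" ]; then
    printf '{"spec":{"devicePlugin":{"config":{"name":"%s","default":"%s"}}}}\n' "$1" "$2"
  else
    printf '{"spec":{"devicePlugin":{"config":{"name":"%s"}}}}\n' "$1"
  fi
}
```

Usage: kubectl patch clusterpolicies.nvidia.com/cluster-policy -n gpu-operator --type merge -p "$(devplugin_patch time-slicing-config any)". With --type merge, omitting default leaves any previously set value in place. A JSON merge patch clears a field only when you set it to null explicitly.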

For the multi-profile ConfigMap, omit default and pick the key per node:

kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config-fine"}}}}'
kubectl label node <gpu-node> nvidia.com/device-plugin.config=tesla-t4 --overwrite
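With several node classes, labelling nodes by hand drifts. The sketch below maps each node's nvidia.com/gpu.product label to a profile key of time-slicing-config-fine and applies nvidia.com/device-plugin.config. profile_for_product, label_gpu_nodes and the product-name patterns are all hypothetical. Adjust the patterns to the product strings your nodes actually report. The trailing wildcard means a product already suffixed -SHARED still maps to the same key.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from a GPU product label to a ConfigMap profile key.
profile_for_product() {
  case "$1" in
    *A100*40GB*) echo a100-40gb ;;
    Tesla-T4*)   echo tesla-t4 ;;
    *)           return 1 ;;   # unknown product: leave the node unlabelled
  esac
}

# Label every GPU node whose product maps to a known profile key.
label_gpu_nodes() {
  kubectl get nodes -l nvidia.com/gpu.present=true \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.nvidia\.com/gpu\.product}{"\n"}{end}' |
  while IFS="$(printf '\t')" read -r node product; do
    if key="$(profile_for_product "$product")"; then
      kubectl label node "$node" "nvidia.com/device-plugin.config=$key" --overwrite
    fi
  done
}
```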

Configuration

| Field / flag | Where | Meaning |
| --- | --- | --- |
| sharing.timeSlicing.resources[].name | ConfigMap data | Resource to oversubscribe: nvidia.com/gpu (or nvidia.com/mig-<profile> under MIG). |
| sharing.timeSlicing.resources[].replicas | ConfigMap data | Oversubscription factor. Each physical GPU is advertised replicas times. No memory split. |
| renameByDefault | ConfigMap data | true advertises nvidia.com/gpu.shared instead of nvidia.com/gpu, so pods must opt in to a shared slice by name and whole-GPU requests cannot land on shared nodes by accident. Default false. |
| failRequestsGreaterThanOne | ConfigMap data | true rejects container requests for >1 replica. Recommended true; default false. |
| flags.migStrategy | ConfigMap data | none for whole-GPU time-slicing; mixed to time-slice MIG devices. |
| devicePlugin.config.name | Helm / ClusterPolicy | Name of the ConfigMap the plugin reads. |
| devicePlugin.config.default | Helm / ClusterPolicy | ConfigMap key applied to every node that has no per-node override. |
| nvidia.com/device-plugin.config | Node label | Per-node override selecting a ConfigMap key; wins over default. |
| nvidia.com/gpu.replicas | Node label (set by GPU Feature Discovery) | Read-only signal: the advertised replica factor. |
| nvidia.com/gpu.product | Node label (set by GPU Feature Discovery) | Suffixed -SHARED when sharing is active (with renameByDefault: false). |

Apply & verify

The Operator does not monitor the ConfigMap. After any create/edit, restart the device-plugin DaemonSet by hand:

kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-daemonset
kubectl rollout status  -n gpu-operator daemonset/nvidia-device-plugin-daemonset
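rollout status returns once the DaemonSet pods are ready, but the node's allocatable may lag briefly while the plugin re-registers with the kubelet. The polling sketch below uses a hypothetical wait_for_alloc helper with a 5-second interval and a timeout in seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a node until allocatable nvidia.com/gpu equals
# the expected count. Args: node, expected count, optional timeout (s).
wait_for_alloc() {
  local node="$1" want="$2" timeout="${3:-120}" waited=0 have
  while :; do
    have="$(kubectl get node "$node" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')"
    if [ "$have" = "$want" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timeout: $node allocatable nvidia.com/gpu=${have:-<none>}, want $want" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```

Usage: wait_for_alloc <gpu-node> 4 120 right after the rollout status line.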

Expected signal: allocatable shows the multiplied count (4 replicas on a 1-GPU node -> 4):

kubectl get node <gpu-node> \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}{"\n"}'
# 4

Cluster-wide sweep, then the sharing labels on one node:

kubectl get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns='NODE:.metadata.name,GPU_ALLOC:.status.allocatable.nvidia\.com/gpu'
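The expected allocatable per node is physical GPUs times replicas, so 8 GPUs at replicas: 4 gives 32. The sketch below checks every GPU node against that product. It assumes nvidia.com/gpu.count reports physical GPUs (it and nvidia.com/gpu.replicas come from GPU Feature Discovery) and treats a missing replicas label as 1. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Expected allocatable: physical GPU count times replica factor (default 1).
expected_alloc() {
  echo $(( $1 * ${2:-1} ))
}

# Hypothetical: emit "node<TAB>allocatable<TAB>gpu.count<TAB>gpu.replicas".
alloc_lines() {
  kubectl get nodes -l nvidia.com/gpu.present=true \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\t"}{.metadata.labels.nvidia\.com/gpu\.count}{"\t"}{.metadata.labels.nvidia\.com/gpu\.replicas}{"\n"}{end}'
}

# Read alloc_lines output on stdin; print a MISMATCH line per disagreeing
# node and return 1 if any node disagrees.
check_alloc_lines() {
  local rc=0 node alloc count replicas want
  while IFS="$(printf '\t')" read -r node alloc count replicas; do
    want="$(expected_alloc "${count:-0}" "${replicas:-1}")"
    if [ "${alloc:-0}" != "$want" ]; then
      echo "MISMATCH $node allocatable=${alloc:-0} expected=$want"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: alloc_lines | check_alloc_lines. A non-zero exit usually means the ConfigMap edit never reached the plugin, so it still needs the rollout restart.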

kubectl get node <gpu-node> \
  -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.replicas}{"\n"}'   # 4
kubectl get node <gpu-node> \
  -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.product}{"\n"}'    # ...-SHARED

Functional check: on a single-GPU node, schedule more pods than there are physical GPUs and confirm co-residency. Run the checks while the pods' 60 s sleep is still in progress:

# ts-a.yaml (ts-b is the same manifest renamed)
apiVersion: v1
kind: Pod
metadata: { name: ts-a, namespace: gpu-operator }
spec:
  restartPolicy: Never
  containers:
    - name: smi
      image: nvidia/cuda:13.0.0-base-ubuntu24.04
      command: ["sh", "-c", "nvidia-smi -L && sleep 60"]
      resources: { limits: { nvidia.com/gpu: 1 } }

kubectl apply -f ts-a.yaml
sed 's/ts-a/ts-b/' ts-a.yaml | kubectl apply -f -
kubectl get pods -n gpu-operator ts-a ts-b -o wide   # both Running on the same node
kubectl logs ts-a -n gpu-operator                    # nvidia-smi -L prints the physical GPU UUID...
kubectl logs ts-b -n gpu-operator                    # ...and ts-b prints the same one
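Comparing the two logs by eye works for two pods. The sketch below extracts the GPU UUIDs from nvidia-smi -L output and compares them, using hypothetical gpu_uuids and same_physical_gpu helpers. It keeps only whole-GPU UUIDs (GPU-...) and ignores MIG device lines.

```shell
#!/usr/bin/env bash
# Extract whole-GPU UUIDs from `nvidia-smi -L` output on stdin.
gpu_uuids() {
  sed -n 's/.*(UUID: \(GPU-[^)]*\)).*/\1/p' | sort -u
}

# Hypothetical: succeed if both pods' logs report the same non-empty UUID set.
same_physical_gpu() {
  local a b
  a="$(kubectl logs "$1" -n gpu-operator | gpu_uuids)"
  b="$(kubectl logs "$2" -n gpu-operator | gpu_uuids)"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: same_physical_gpu ts-a ts-b && echo "co-resident on one GPU".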

If renameByDefault: true, both pods must instead request nvidia.com/gpu.shared: 1, and allocatable for nvidia.com/gpu drops to 0 on shared nodes.

Failure modes

  • Edited the ConfigMap, nothing changed. The Operator does not watch it. Allocatable stays at 1 until you rollout restart the nvidia-device-plugin-daemonset. This is the single most common miss.
  • Allocatable still 1 after restart. devicePlugin.config.default is unset (or the node's nvidia.com/device-plugin.config label points at a missing key), so the plugin fell back to no sharing. Patch default or fix the node label.
  • Treating replicas as isolated GPUs. Pods OOM each other: one process allocates the whole frame buffer, the rest fail CUDA allocations. There is no per-replica memory cap; size workloads to share one GPU's memory, or move to MIG. See security & multi-tenancy: time-slicing is not a tenant boundary.
  • Noisy neighbour / latency spikes. Compute is round-robin time-sliced, not partitioned; a heavy kernel starves co-resident pods. Acceptable for dev and batched inference, not for latency-SLO serving.
  • renameByDefault: true but pods still request nvidia.com/gpu. They go Pending (that resource is now 0); switch requests to nvidia.com/gpu.shared.
  • Requested nvidia.com/gpu: 2 with failRequestsGreaterThanOne: true. The kubelet rejects it at admission by design (the pod fails with UnexpectedAdmissionError). Slices are not whole GPUs. Request 1 per container.
  • MIG node with migStrategy: none. The plugin advertises nothing useful; under MIG use migStrategy: mixed and the nvidia.com/mig-<profile> resource name. Out of scope here; see MIG partitioning.
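Nothing caps memory per replica, so a planning number helps size workloads. Divide the GPU's frame buffer by replicas after reserving some headroom, since CUDA contexts and fragmentation also consume memory. This sketch assumes a hypothetical per_pod_budget helper with a 10% default headroom chosen only for illustration. The result is advisory; nothing in time-slicing enforces it.

```shell
#!/usr/bin/env bash
# Hypothetical planning helper: per-pod memory budget in MiB.
# Args: total frame buffer (MiB), replicas, optional headroom percent (default 10).
per_pod_budget() {
  local total="$1" replicas="$2" headroom="${3:-10}"
  # Integer arithmetic: reserve headroom first, then split evenly.
  echo $(( total * (100 - headroom) / 100 / replicas ))
}
```

Usage: read the total from a GPU pod with nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits, then run per_pod_budget <total> 4 and size each workload's batch or cache to stay under the result.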

References

  • Time-Slicing GPUs in Kubernetes (GPU Operator): https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html
  • GPU Operator getting started / Helm values: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
  • NVIDIA k8s device plugin (config schema, sharing.timeSlicing): https://github.com/NVIDIA/k8s-device-plugin
  • Time-slicing on OpenShift (same ConfigMap, restart ceremony): https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/time-slicing-gpus-in-openshift.html
  • MIG vs time-slicing trade-offs (this KB): MIG

Related: GPU Platform hub · Kubernetes for GPUs · MIG partitioning · Dynamic & fractional sharing · Security & multi-tenancy · Telemetry · Glossary