Manifest: GPU time-slicing

Scope: the NVIDIA k8s device-plugin ConfigMap (replicas under sharing.timeSlicing.resources), wiring it through the GPU Operator via devicePlugin.config, the manual restart ceremony the Operator does not do for you, and the noisy-neighbour / no-memory-isolation caveat versus MIG. Verify node allocatable shows the multiplied nvidia.com/gpu count.

Reference template from upstream NVIDIA GPU Operator and k8s-device-plugin docs. Not hardware-tested here. Pin the GPU Operator chart and apply via GitOps; choose this sharing model deliberately. It is oversubscription, not isolation. Builds on Kubernetes & Helm: GPU Platform §3.

flowchart LR
  CM["ConfigMap<br/>sharing.timeSlicing.replicas=N"] --> WIRE["devicePlugin.config.name/.default"]
  WIRE --> RESTART["rollout restart device-plugin DS"]
  RESTART --> ADV["node advertises N x nvidia.com/gpu"]
  ADV --> WARN["shared context: no memory isolation"]

What it is

Time-slicing advertises a single physical GPU as N logical units of nvidia.com/gpu, where N is the replicas value, so up to N pods can land on one GPU and share it through the GPU's time-sliced context switching. There is no memory partition and no fault isolation: every replica shares one frame buffer, so one pod's allocation can OOM the others, and a hung kernel stalls them all. Replicas oversubscribe the resource; they do not grant proportional compute. This is the cheap sharing model for dev, bursty, and batched-inference workloads. When you need a hardware memory and engine partition, use MIG instead; for software-managed concurrent contexts with per-client memory limits, MPS sits between the two.
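
The arithmetic behind that warning can be sketched in a few lines of Python. The function names and the 16384 MiB, 4-replica figures are illustrative assumptions, not part of the device plugin; nothing here is enforced at runtime, which is exactly the caveat.

```python
def advertised_units(physical_gpus: int, replicas: int) -> int:
    """Schedulable nvidia.com/gpu units a node advertises under time-slicing."""
    if physical_gpus < 0 or replicas < 1:
        raise ValueError("physical_gpus must be >= 0 and replicas >= 1")
    return physical_gpus * replicas


def fits_in_shared_memory(total_mib: int, pod_peaks_mib: list[int]) -> bool:
    """True if the co-resident pods' peak allocations fit in one frame buffer.

    This is a sizing check you run yourself; time-slicing has no
    per-replica memory cap to enforce it.
    """
    return sum(pod_peaks_mib) <= total_mib


# Hypothetical node: 1 GPU with 16384 MiB of memory, replicas: 4.
print(advertised_units(1, 4))                                   # 4
print(fits_in_shared_memory(16384, [4000, 4000, 4000, 4000]))   # True
print(fits_in_shared_memory(16384, [12000, 4000, 4000, 4000]))  # False
```

The third call is the failure case: four pods are schedulable, but one greedy pod leaves too little memory for the rest.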

The mechanism lives entirely in the NVIDIA k8s device plugin; the GPU Operator just delivers the plugin's config from a ConfigMap. The Operator does not watch that ConfigMap, so an in-place edit has no effect until you restart the plugin.

Prerequisites

A working GPU Operator install with devicePlugin.enabled=true and nodes already advertising nvidia.com/gpu (the smoke pod in the hub passes).
mig.strategy left at none/single on these nodes. Time-slicing on top of MIG devices is possible (flags.migStrategy: mixed, resource name nvidia.com/mig-<profile>) but out of scope here.
Decide failRequestsGreaterThanOne up front. NVIDIA recommends true so that a request for more than one replica fails fast (slices are not interchangeable with whole GPUs); it defaults to false for backward compatibility.
kubectl access to the gpu-operator namespace and the node labels.

The manifest

Each ConfigMap data key is an arbitrary profile name; the key any is the conventional cluster-wide default. Inside each profile, version: v1 and the sharing.timeSlicing block follow the device-plugin config schema.

# time-slicing-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false          # true -> advertise nvidia.com/gpu.shared instead
        failRequestsGreaterThanOne: true # requests for >1 replica are rejected (recommended)
        resources:
          - name: nvidia.com/gpu
            replicas: 4                  # 1 physical GPU -> 4 schedulable units, SHARED memory

Multi-profile ConfigMap (different replica counts per node class), selected per node by label:

# time-slicing-config-fine.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-fine
  namespace: gpu-operator
data:
  a100-40gb: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: 8
  tesla-t4: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: 4
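
When the number of node classes grows, hand-copying profiles invites drift. The following is a minimal sketch, assuming you are happy to generate the manifest from a mapping of profile name to replica count; `render_configmap` and `PROFILE_TEMPLATE` are hypothetical helpers, and the template simply mirrors the fields used in the two manifests above.

```python
import textwrap

# Per-profile body; mirrors the fields used in the manifests above.
PROFILE_TEMPLATE = textwrap.dedent("""\
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: {replicas}
""")


def render_configmap(name: str, namespace: str, profiles: dict[str, int]) -> str:
    """Render a ConfigMap manifest with one data key per profile."""
    lines = [
        "apiVersion: v1",
        "kind: ConfigMap",
        "metadata:",
        f"  name: {name}",
        f"  namespace: {namespace}",
        "data:",
    ]
    for key, replicas in profiles.items():
        if replicas < 1:
            raise ValueError(f"{key}: replicas must be >= 1")
        lines.append(f"  {key}: |-")
        body = PROFILE_TEMPLATE.format(replicas=replicas).rstrip("\n")
        # Block-scalar content sits four spaces under "data:".
        lines.append(textwrap.indent(body, "    "))
    return "\n".join(lines) + "\n"


print(render_configmap("time-slicing-config-fine", "gpu-operator",
                       {"a100-40gb": 8, "tesla-t4": 4}), end="")
```

Redirect the output to a file and commit it, so the GitOps flow still applies a plain manifest rather than running a generator in the cluster.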

Wire it into the GPU Operator. On a fresh release, pass it at install (pin the chart):

helm upgrade --install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace \
  --version <pinned> \
  --set devicePlugin.config.name=time-slicing-config \
  --set devicePlugin.config.default=any

On a running cluster, create the ConfigMap then patch the ClusterPolicy (this is what the --set flags render to):

kubectl apply -f time-slicing-config.yaml
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config","default":"any"}}}}'

For the multi-profile ConfigMap, omit default and pick the key per node:

kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config-fine"}}}}'
kubectl label node <gpu-node> nvidia.com/device-plugin.config=tesla-t4 --overwrite

Configuration

Field / flag	Where	Meaning
`sharing.timeSlicing.resources[].name`	ConfigMap data	Resource to oversubscribe — `nvidia.com/gpu` (or `nvidia.com/mig-<profile>` under MIG).
`sharing.timeSlicing.resources[].replicas`	ConfigMap data	Oversubscription factor. 1 physical GPU is advertised `replicas` times. No memory split.
`renameByDefault`	ConfigMap data	`true` advertises `nvidia.com/gpu.shared` (pods must request that name) so whole-GPU and shared workloads coexist. Default `false`.
`failRequestsGreaterThanOne`	ConfigMap data	`true` rejects pod requests for `>1` replica. Recommended `true`; default `false`.
`flags.migStrategy`	ConfigMap data	`none` for whole-GPU time-slicing; `mixed` to time-slice MIG devices.
`devicePlugin.config.name`	Helm / ClusterPolicy	Name of the ConfigMap the plugin reads.
`devicePlugin.config.default`	Helm / ClusterPolicy	Which ConfigMap key applies cluster-wide when a node has no override.
`nvidia.com/device-plugin.config`	Node label	Per-node override selecting a ConfigMap key; wins over `default`.
`nvidia.com/gpu.replicas`	Node label (set by plugin)	Read-only signal: the advertised replica factor.
`nvidia.com/gpu.product`	Node label (set by plugin)	Suffixed `-SHARED` when sharing is active (with `renameByDefault: false`).
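
The precedence between the node label and `devicePlugin.config.default` can be modelled as a small lookup. This is a sketch of the behaviour as this page describes it (label wins, a missing key means no sharing), not code taken from the plugin; `resolve_profile` is a hypothetical name.

```python
from typing import Optional

NODE_LABEL = "nvidia.com/device-plugin.config"


def resolve_profile(node_labels: dict[str, str],
                    configmap_keys: set[str],
                    default: Optional[str]) -> Optional[str]:
    """Return the ConfigMap key a node would use, or None for no sharing.

    The per-node label overrides the cluster-wide default. If the chosen
    key is absent from the ConfigMap, treat the node as unshared.
    """
    wanted = node_labels.get(NODE_LABEL, default)
    if wanted is None or wanted not in configmap_keys:
        return None
    return wanted


keys = {"a100-40gb", "tesla-t4"}
print(resolve_profile({NODE_LABEL: "tesla-t4"}, keys, None))     # tesla-t4
print(resolve_profile({}, keys, "a100-40gb"))                    # a100-40gb
print(resolve_profile({}, keys, None))                           # None
print(resolve_profile({NODE_LABEL: "h100"}, keys, "a100-40gb"))  # None
```

The last two cases are the ones that leave allocatable at the physical count after a restart.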

Apply & verify

The Operator does not monitor the ConfigMap. After any create/edit, restart the device-plugin DaemonSet by hand:

kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-daemonset
kubectl rollout status  -n gpu-operator daemonset/nvidia-device-plugin-daemonset

Expected signal: allocatable shows the multiplied count (4 replicas on a 1-GPU node -> 4):

kubectl get node <gpu-node> \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}{"\n"}'
# 4

Cluster-wide sweep and the plugin-set labels:

kubectl get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns='NODE:.metadata.name,GPU_ALLOC:.status.allocatable.nvidia\.com/gpu'

kubectl get node <gpu-node> \
  -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.replicas}{"\n"}'   # 4
kubectl get node <gpu-node> \
  -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.product}{"\n"}'    # ...-SHARED
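
To automate the same check, one option is to compare allocatable against the labels in the node object. This is a minimal sketch: `check_allocatable` is a hypothetical helper, and it assumes the node also carries the `nvidia.com/gpu.count` label as the physical GPU count, which this page does not otherwise rely on.

```python
def check_allocatable(node: dict) -> tuple[bool, str]:
    """Compare allocatable nvidia.com/gpu with gpu.count x gpu.replicas.

    `node` is the parsed output of `kubectl get node <gpu-node> -o json`.
    """
    labels = node.get("metadata", {}).get("labels", {})
    allocatable = node.get("status", {}).get("allocatable", {})
    actual = int(allocatable.get("nvidia.com/gpu", "0"))
    physical = int(labels.get("nvidia.com/gpu.count", "0"))   # assumed label
    replicas = int(labels.get("nvidia.com/gpu.replicas", "1"))
    expected = physical * replicas
    ok = expected > 0 and actual == expected
    detail = (f"allocatable={actual} expected={expected} "
              f"({physical} GPU x {replicas} replicas)")
    return ok, detail


# Trimmed, hand-written stand-in for a real node object.
sample = {
    "metadata": {"labels": {"nvidia.com/gpu.count": "1",
                            "nvidia.com/gpu.replicas": "4"}},
    "status": {"allocatable": {"nvidia.com/gpu": "4"}},
}
print(check_allocatable(sample))
```

Against a live cluster, pipe `kubectl get node <gpu-node> -o json` into a script that passes `json.load(sys.stdin)` to the function.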

Functional check: schedule more pods than there are physical GPUs and confirm co-residency:

# ts-a.yaml (ts-b is the same manifest with the name changed); expect both pods Running on one physical GPU
apiVersion: v1
kind: Pod
metadata: { name: ts-a, namespace: gpu-operator }
spec:
  restartPolicy: Never
  containers:
    - name: smi
      image: nvidia/cuda:13.0.0-base-ubuntu24.04
      command: ["sh", "-c", "nvidia-smi -L && sleep 60"]
      resources: { limits: { nvidia.com/gpu: 1 } }

kubectl apply -f ts-a.yaml
sed 's/ts-a/ts-b/' ts-a.yaml | kubectl apply -f -
kubectl get pods -n gpu-operator ts-a ts-b -o wide      # both Running on the same node
kubectl logs ts-a -n gpu-operator                       # nvidia-smi -L output for ts-a
kubectl logs ts-b -n gpu-operator                       # expect the same physical GPU UUID as ts-a
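
Comparing UUIDs by eye is error-prone on multi-GPU nodes. A small sketch of the comparison, assuming the usual `nvidia-smi -L` line shape of `GPU <n>: <name> (UUID: GPU-...)`; the helper names and the UUID in the sample are made up for illustration.

```python
import re

# Matches the UUID field in `nvidia-smi -L` output.
UUID_RE = re.compile(r"\(UUID:\s*(GPU-[0-9a-fA-F-]+)\)")


def gpu_uuids(smi_list_output: str) -> set[str]:
    """Extract the GPU UUIDs listed in one pod's `nvidia-smi -L` log."""
    return set(UUID_RE.findall(smi_list_output))


def share_a_gpu(log_a: str, log_b: str) -> bool:
    """True if both pods report at least one common physical GPU."""
    return bool(gpu_uuids(log_a) & gpu_uuids(log_b))


# Made-up log lines standing in for `kubectl logs ts-a` / `ts-b`.
log_a = "GPU 0: Tesla T4 (UUID: GPU-11111111-2222-3333-4444-555555555555)\n"
log_b = "GPU 0: Tesla T4 (UUID: GPU-11111111-2222-3333-4444-555555555555)\n"
print(share_a_gpu(log_a, log_b))  # True
```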

If renameByDefault: true, both pods must instead request nvidia.com/gpu.shared: 1, and allocatable for nvidia.com/gpu drops to 0 on shared nodes.

Failure modes

Edited the ConfigMap, nothing changed. The Operator does not watch it. Allocatable stays at 1 until you rollout restart the nvidia-device-plugin-daemonset. This is the single most common miss.
Allocatable still 1 after restart. devicePlugin.config.default is unset (or the node's nvidia.com/device-plugin.config label points at a missing key), so the plugin fell back to no sharing. Patch default or fix the node label.
Treating replicas as isolated GPUs. Pods OOM each other: one process allocates the whole frame buffer, the rest fail CUDA allocations. There is no per-replica memory cap; size workloads to share one GPU's memory, or move to MIG. See security & multi-tenancy: time-slicing is not a tenant boundary.
Noisy neighbour / latency spikes. Compute is round-robin time-sliced, not partitioned; a heavy kernel starves co-resident pods. Acceptable for dev and batched inference, not for latency-SLO serving.
renameByDefault: true but pods still request nvidia.com/gpu. They go Pending (that resource is now 0); switch requests to nvidia.com/gpu.shared.
Requested nvidia.com/gpu: 2 with failRequestsGreaterThanOne: true. The pod fails with UnexpectedAdmissionError, by design: slices are not whole GPUs. Delete the pod, request 1 per pod, and redeploy.
MIG node with migStrategy: none. The plugin advertises nothing useful; under MIG use migStrategy: mixed and the nvidia.com/mig-<profile> resource name. Out of scope here; see MIG partitioning.
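
The request-related failure modes above reduce to a short decision table. The sketch below encodes it as a pre-flight check you could run in CI against pod specs; `predict` and its return strings are hypothetical, and the logic follows this page's description rather than the plugin's source.

```python
def predict(resource: str, count: int, *,
            rename_by_default: bool,
            fail_requests_greater_than_one: bool) -> str:
    """Predict what happens to a pod's GPU request on a time-sliced node."""
    advertised = "nvidia.com/gpu.shared" if rename_by_default else "nvidia.com/gpu"
    if resource != advertised:
        # The node advertises zero of the requested resource name.
        return f"Pending: node advertises {advertised}, not {resource}"
    if count > 1 and fail_requests_greater_than_one:
        return "Rejected: request 1 replica per pod"
    if count > 1:
        return "Scheduled: extra replicas add no guaranteed compute or memory"
    return "Scheduled"


print(predict("nvidia.com/gpu", 1,
              rename_by_default=False, fail_requests_greater_than_one=True))
print(predict("nvidia.com/gpu", 2,
              rename_by_default=False, fail_requests_greater_than_one=True))
print(predict("nvidia.com/gpu", 1,
              rename_by_default=True, fail_requests_greater_than_one=True))
```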

References

Time-Slicing GPUs in Kubernetes (GPU Operator): https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html
GPU Operator getting started / Helm values: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
NVIDIA k8s device plugin (config schema, sharing.timeSlicing): https://github.com/NVIDIA/k8s-device-plugin
Time-slicing on OpenShift (same ConfigMap, restart ceremony): https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/time-slicing-gpus-in-openshift.html
MIG vs time-slicing trade-offs (this KB): MIG