Manifest: GPU time-slicing
Scope: the NVIDIA k8s device-plugin ConfigMap (replicas under sharing.timeSlicing.resources), wiring it through the GPU Operator via devicePlugin.config, the manual restart ceremony the Operator does not do for you, and the noisy-neighbour / no-memory-isolation caveat versus MIG. Verify node allocatable shows the multiplied nvidia.com/gpu count.
Reference template from upstream NVIDIA GPU Operator and k8s-device-plugin docs. Not hardware-tested here. Pin the GPU Operator chart and apply via GitOps; choose this sharing model deliberately. It is oversubscription, not isolation. Builds on Kubernetes & Helm: GPU Platform §3.
flowchart LR
CM["ConfigMap<br/>sharing.timeSlicing.resources[].replicas=N"] --> WIRE["devicePlugin.config.name/.default"]
WIRE --> RESTART["rollout restart device-plugin DS"]
RESTART --> ADV["node advertises N x nvidia.com/gpu"]
ADV --> WARN["shared context: no memory isolation"]
What it is
Time-slicing advertises each physical GPU as N logical nvidia.com/gpu units, where N is the configured replicas value, so up to N pods can land on one GPU and share it through the GPU's time-sliced context switching. There is no memory partition and no fault isolation: every replica shares one frame buffer, so one pod's allocation can OOM the others, and a hung kernel stalls the lot. Replicas oversubscribe the resource; they do not grant proportional compute. This is the cheap sharing model for dev, bursty, and batched-inference workloads. When you need a hardware memory and engine partition, use MIG instead; for software-managed concurrent contexts with per-client memory limits, MPS sits between the two.
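Because nothing enforces a per-replica memory cap, the budgeting has to happen on paper. A minimal sketch of that arithmetic follows; fits_in_frame_buffer is a hypothetical helper name, and the MiB figures are assumed planning numbers, not values read from a GPU.

```shell
#!/usr/bin/env bash
# Sketch only: time-slicing enforces none of this. It is planning arithmetic
# for pods that will share one frame buffer.

# $1 = frame buffer of one GPU in MiB (assumed value you supply)
# remaining args = planned peak memory in MiB for each co-resident pod
# Succeeds when the combined peaks fit in the single shared frame buffer.
fits_in_frame_buffer() {
  local capacity="$1" total=0 mib
  shift
  for mib in "$@"; do
    total=$(( total + mib ))
  done
  [ "$total" -le "$capacity" ]
}
```

For example, `fits_in_frame_buffer 16384 4096 4096 4096 4096` succeeds, while `fits_in_frame_buffer 16384 8192 8192 2048` fails, which is the case where one pod's allocation would push the others into CUDA out-of-memory errors.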
The mechanism lives entirely in the NVIDIA k8s device plugin; the GPU Operator just delivers the plugin's config from a ConfigMap. The Operator does not watch that ConfigMap. Editing it in place is silent until you restart the plugin.
Prerequisites
- A working GPU Operator install with devicePlugin.enabled=true and nodes already advertising nvidia.com/gpu (the smoke pod in the hub passes).
- mig.strategy left at none/single on these nodes. Time-slicing on top of MIG devices is possible (flags.migStrategy: mixed, resource name nvidia.com/mig-<profile>) but out of scope here.
- Decide failRequestsGreaterThanOne up front. NVIDIA recommends true so a request for >1 replica fails fast (the slices are not interchangeable with whole GPUs); it defaults false for backward compatibility.
- kubectl access to the gpu-operator namespace and the node labels.
The manifest
The ConfigMap key is an arbitrary profile name (any is the conventional cluster-wide default). version: v1 and the sharing.timeSlicing block are the device-plugin config schema.
# time-slicing-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false # true -> advertise nvidia.com/gpu.shared instead
        failRequestsGreaterThanOne: true # requests for >1 replica are rejected (recommended)
        resources:
          - name: nvidia.com/gpu
            replicas: 4 # 1 physical GPU -> 4 schedulable units, SHARED memory
Multi-profile ConfigMap (different replica counts per node class), selected per node by label:
# time-slicing-config-fine.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-fine
  namespace: gpu-operator
data:
  a100-40gb: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: 8
  tesla-t4: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: true
        resources:
          - name: nvidia.com/gpu
            replicas: 4
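Before applying either file, a quick local check on the replicas values can catch typos. The sketch below is one possible pre-flight lint, not an upstream tool: list_replicas and replicas_sane are hypothetical helper names, and the check only looks at lines whose first token is replicas:, so it assumes the layout shown above.

```shell
#!/usr/bin/env bash
# Print every replicas value found in a manifest file, one per line.
# Leading indentation and trailing comments are ignored by awk's field splitting.
list_replicas() {
  awk '$1 == "replicas:" { print $2 }' "$1"
}

# Succeed only if the file has at least one replicas value and every value
# is a positive integer. This checks text shape only, not the config schema.
replicas_sane() {
  local values v
  values="$(list_replicas "$1")"
  [ -n "$values" ] || return 1
  for v in $values; do
    case "$v" in
      ''|*[!0-9]*) return 1 ;;   # reject anything that is not all digits
    esac
    [ "$v" -ge 1 ] || return 1
  done
}
```

Usage would look like `replicas_sane time-slicing-config-fine.yaml && echo "replicas look sane"`. It complements, rather than replaces, a server-side validation of the manifest.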
Wire it into the GPU Operator. On a fresh release, pass it at install (pin the chart):
helm upgrade --install gpu-operator nvidia/gpu-operator \
-n gpu-operator --create-namespace \
--version <pinned> \
--set devicePlugin.config.name=time-slicing-config \
--set devicePlugin.config.default=any
On a running cluster, create the ConfigMap then patch the ClusterPolicy (this is what the --set flags render to):
kubectl apply -f time-slicing-config.yaml
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config","default":"any"}}}}'
For the multi-profile ConfigMap, omit default and pick the key per node:
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config-fine"}}}}'
kubectl label node <gpu-node> nvidia.com/device-plugin.config=tesla-t4 --overwrite
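With more than a handful of nodes, the key can be derived from each node's nvidia.com/gpu.product label instead of typed by hand. A minimal sketch, under stated assumptions: profile_for_product and label_node_profile are hypothetical helpers, the ConfigMap keys are the two from time-slicing-config-fine above, and the product strings matched here (Tesla-T4, anything containing A100 and 40GB) are assumed label values that should be checked against what your nodes actually report.

```shell
#!/usr/bin/env bash
# Map a nvidia.com/gpu.product label value to a ConfigMap key.
# Product strings are assumptions; verify them on your own nodes.
profile_for_product() {
  local product="${1%-SHARED}"   # drop the suffix present once sharing is active
  case "$product" in
    Tesla-T4)     echo "tesla-t4" ;;
    *A100*40GB*)  echo "a100-40gb" ;;
    *)            return 1 ;;      # unknown product: caller decides what to do
  esac
}

# Label one node from its current product label (needs cluster access).
label_node_profile() {
  local node="$1" product key
  product="$(kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.product}')" || return 1
  key="$(profile_for_product "$product")" || {
    echo "no profile for product '$product'" >&2
    return 1
  }
  kubectl label node "$node" "nvidia.com/device-plugin.config=$key" --overwrite
}
```

Stripping -SHARED first keeps the mapping stable on nodes where sharing is already active; unknown products fail loudly rather than silently picking a profile.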
Configuration
| Field / flag | Where | Meaning |
|---|---|---|
| sharing.timeSlicing.resources[].name | ConfigMap data | Resource to oversubscribe: nvidia.com/gpu (or nvidia.com/mig-<profile> under MIG). |
| sharing.timeSlicing.resources[].replicas | ConfigMap data | Oversubscription factor. 1 physical GPU is advertised replicas times. No memory split. |
| renameByDefault | ConfigMap data | true advertises nvidia.com/gpu.shared (pods must request that name) so whole-GPU and shared workloads coexist. Default false. |
| failRequestsGreaterThanOne | ConfigMap data | true rejects pod requests for >1 replica. Recommended true; default false. |
| flags.migStrategy | ConfigMap data | none for whole-GPU time-slicing; mixed to time-slice MIG devices. |
| devicePlugin.config.name | Helm / ClusterPolicy | Name of the ConfigMap the plugin reads. |
| devicePlugin.config.default | Helm / ClusterPolicy | Which ConfigMap key applies cluster-wide when a node has no override. |
| nvidia.com/device-plugin.config | Node label | Per-node override selecting a ConfigMap key; wins over default. |
| nvidia.com/gpu.replicas | Node label (set by plugin) | Read-only signal: the advertised replica factor. |
| nvidia.com/gpu.product | Node label (set by plugin) | Suffixed -SHARED when sharing is active (with renameByDefault: false). |
Apply & verify
The Operator does not monitor the ConfigMap. After any create/edit, restart the device-plugin DaemonSet by hand:
kubectl rollout restart -n gpu-operator daemonset/nvidia-device-plugin-daemonset
kubectl rollout status -n gpu-operator daemonset/nvidia-device-plugin-daemonset
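Since the restart is the step most easily forgotten, the apply and the restart can be bundled. The following is a minimal sketch, assuming the namespace and DaemonSet name used above; run and apply_time_slicing are hypothetical helper names, and DRY_RUN is an illustrative environment variable that only prints the commands.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (illustrative convention).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Apply a time-slicing ConfigMap file, then restart the device plugin and
# wait for the rollout. $1 = manifest file, $2 = namespace (default gpu-operator)
apply_time_slicing() {
  local file="$1" ns="${2:-gpu-operator}"
  local ds="daemonset/nvidia-device-plugin-daemonset"
  run kubectl apply -f "$file" || return 1
  run kubectl rollout restart -n "$ns" "$ds" || return 1
  run kubectl rollout status -n "$ns" "$ds"
}
```

Running `DRY_RUN=1 apply_time_slicing time-slicing-config.yaml` prints the three kubectl commands without touching the cluster, which is a cheap way to review the sequence before executing it for real.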
Expected signal: allocatable shows the multiplied count (4 replicas on a 1-GPU node -> 4).
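For a single node, that comparison can be sketched as follows. This is an illustration rather than an upstream procedure: allocatable_ok and node_gpu_allocatable are hypothetical helper names, and the physical GPU count and replicas value are passed in by hand.

```shell
#!/usr/bin/env bash
# Expected allocatable = physical GPUs x replicas.
# $1 = physical GPUs on the node, $2 = replicas, $3 = observed allocatable
allocatable_ok() {
  [ "$3" -eq $(( $1 * $2 )) ]
}

# Read allocatable nvidia.com/gpu for one node (needs cluster access).
node_gpu_allocatable() {
  kubectl get node "$1" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}{"\n"}'
}
```

A typical call would be `allocatable_ok 1 4 "$(node_gpu_allocatable <gpu-node>)" && echo "sharing active"`, which fails if the node still reports the unmultiplied count.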
Cluster-wide sweep and the plugin-set labels:
kubectl get nodes -l nvidia.com/gpu.present=true \
-o custom-columns='NODE:.metadata.name,GPU_ALLOC:.status.allocatable.nvidia\.com/gpu'
kubectl get node <gpu-node> \
-o jsonpath='{.metadata.labels.nvidia\.com/gpu\.replicas}{"\n"}' # 4
kubectl get node <gpu-node> \
-o jsonpath='{.metadata.labels.nvidia\.com/gpu\.product}{"\n"}' # ...-SHARED
Functional check: schedule more pods than there are physical GPUs and confirm co-residency:
# two pods, one physical GPU, both Running
apiVersion: v1
kind: Pod
metadata: { name: ts-a, namespace: gpu-operator }
spec:
  restartPolicy: Never
  containers:
    - name: smi
      image: nvidia/cuda:13.0.0-base-ubuntu24.04
      command: ["sh", "-c", "nvidia-smi -L && sleep 60"]
      resources: { limits: { nvidia.com/gpu: 1 } }
kubectl apply -f ts-a.yaml
sed 's/ts-a/ts-b/' ts-a.yaml | kubectl apply -f -
kubectl get pods -n gpu-operator -l '!app' -o wide # ts-a, ts-b both Running on the same node
kubectl logs ts-a -n gpu-operator # nvidia-smi -L prints the same physical UUID as ts-b
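Comparing the two logs by eye works for one GPU; a scripted comparison scales better. The sketch below assumes the usual `nvidia-smi -L` line shape, with the UUID in parentheses as `UUID: GPU-...`; gpu_uuids and same_gpu are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Extract GPU UUIDs from `nvidia-smi -L` output read on stdin, de-duplicated.
gpu_uuids() {
  grep -o 'UUID: GPU-[0-9a-fA-F-]*' | sed 's/^UUID: //' | sort -u
}

# Succeed when two log texts report the same, non-empty set of UUIDs.
# $1 = log text of the first pod, $2 = log text of the second pod
same_gpu() {
  local a b
  a="$(printf '%s\n' "$1" | gpu_uuids)"
  b="$(printf '%s\n' "$2" | gpu_uuids)"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

With the pods above this would be called as `same_gpu "$(kubectl logs ts-a -n gpu-operator)" "$(kubectl logs ts-b -n gpu-operator)"`. An empty log counts as a failure, so a pod that never reached nvidia-smi is not mistaken for co-residency.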
If renameByDefault: true, both pods must instead request nvidia.com/gpu.shared: 1, and allocatable for nvidia.com/gpu drops to 0 on shared nodes.
Failure modes
- Edited the ConfigMap, nothing changed. The Operator does not watch it. Allocatable stays at 1 until you rollout restart the nvidia-device-plugin-daemonset. This is the single most common miss.
- Allocatable still 1 after restart. devicePlugin.config.default is unset (or the node's nvidia.com/device-plugin.config label points at a missing key), so the plugin fell back to no sharing. Patch default or fix the node label.
- Treating replicas as isolated GPUs. Pods OOM each other: one process allocates the whole frame buffer, the rest fail CUDA allocations. There is no per-replica memory cap; size workloads to share one GPU's memory, or move to MIG. See security & multi-tenancy: time-slicing is not a tenant boundary.
- Noisy neighbour / latency spikes. Compute is round-robin time-sliced, not partitioned; a heavy kernel starves co-resident pods. Acceptable for dev and batched inference, not for latency-SLO serving.
- renameByDefault: true but pods still request nvidia.com/gpu. They go Pending (that resource is now 0); switch requests to nvidia.com/gpu.shared.
- Requested nvidia.com/gpu: 2 with failRequestsGreaterThanOne: true. The request fails at pod admission (UnexpectedAdmissionError) by design. Slices are not whole GPUs. Request 1 per pod.
- MIG node with migStrategy: none. The plugin advertises nothing useful; under MIG use migStrategy: mixed and the nvidia.com/mig-<profile> resource name. Out of scope here; see MIG partitioning.
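The first checks in that list can be folded into a rough triage helper. This is a sketch under stated assumptions, not a diagnostic the Operator ships: triage is a hypothetical function, its inputs are values you read from the node yourself, and treating an empty or 1-valued nvidia.com/gpu.replicas label as "sharing inactive" is an assumption about how the label behaves.

```shell
#!/usr/bin/env bash
# $1 = observed allocatable nvidia.com/gpu
# $2 = physical GPUs on the node
# $3 = nvidia.com/gpu.replicas label value (may be empty)
# Prints a one-line hint pointing at the matching failure mode.
triage() {
  local alloc="$1" physical="$2" replicas="$3"
  if [ "$alloc" -eq 0 ]; then
    echo "nvidia.com/gpu is 0: if renameByDefault is true, request nvidia.com/gpu.shared"
  elif [ -z "$replicas" ] || [ "$replicas" -le 1 ]; then
    # Assumption: no label, or a value of 1, means no sharing config took effect.
    echo "sharing inactive: restart the device plugin, then check config.default and the node label"
  elif [ "$alloc" -eq $(( physical * replicas )) ]; then
    echo "ok: sharing active"
  else
    echo "mismatch: expected $(( physical * replicas )), got $alloc"
  fi
}
```

For instance, `triage 1 1 ""` points at the restart and config-selection bullets, while `triage 4 1 4` reports a healthy node. It does not detect memory contention or noisy neighbours, which only show up under load.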
References
- Time-Slicing GPUs in Kubernetes (GPU Operator): https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html
- GPU Operator getting started / Helm values: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
- NVIDIA k8s device plugin (config schema, sharing.timeSlicing): https://github.com/NVIDIA/k8s-device-plugin
- Time-slicing on OpenShift (same ConfigMap, restart ceremony): https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/time-slicing-gpus-in-openshift.html
- MIG vs time-slicing trade-offs (this KB): MIG
Related: GPU Platform hub · Kubernetes for GPUs · MIG partitioning · Dynamic & fractional sharing · Security & multi-tenancy · Telemetry · Glossary