Manifest: Kueue ClusterQueue

Scope: the ResourceFlavor + ClusterQueue + LocalQueue triad that fences nvidia.com/gpu into team quota, plus a Job labelled to a LocalQueue and the kubectl get workloads checks that prove admission and quota accounting. This page is the CRD detail behind line 5 of Kubernetes & Helm: GPU Platform and pairs with Kueue, which covers installing the controller.

Reference templates from the upstream Kueue v1beta2 CRDs. Pin the chart/image in Kueue and apply these manifests via GitOps (SRE and MLOps practices). Never hardware-tested here. nominalQuota numbers are placeholders; set them to your fleet's real GPU count.

flowchart LR
  JOB["Job + label<br/>kueue.x-k8s.io/queue-name"] --> WL["Workload<br/>(unit of admission)"]
  WL --> LQ["LocalQueue<br/>(namespace-scoped)"]
  LQ --> CQ["ClusterQueue<br/>(quota + namespaceSelector)"]
  CQ --> RF["ResourceFlavor<br/>(nodeLabels -> GPU nodes)"]
  CQ -. "QuotaReserved -> Admitted" .-> WL

What it is

Kueue is a job-level quota and queueing controller: it suspends Jobs, then admits them only when quota for their ResourceFlavor is free. Three objects are involved, two cluster-scoped (ResourceFlavor, ClusterQueue) and one namespaced (LocalQueue):

ResourceFlavor names a class of nodes (here, GPU nodes) via nodeLabels/nodeTaints. Quota is counted per flavor.¹
ClusterQueue is the quota pool: which resources it covers (coveredResources), how much per flavor (nominalQuota), and which namespaces may draw on it (namespaceSelector).²
LocalQueue is the namespaced handle teams submit to; it points at one ClusterQueue via spec.clusterQueue.⁴

A Job carries kueue.x-k8s.io/queue-name: <localqueue>; Kueue creates a Workload for it (the unit of admission) and walks it through QuotaReserved -> Admitted -> Finished.⁵⁶ Kueue is not a scheduler; it gates admission. Gang placement is still Volcano/kube-scheduler (Volcano Job).

Prerequisites

Kueue controller installed and Ready; see Kueue. The CRDs clusterqueues, localqueues, resourceflavors, and workloads must exist in API group kueue.x-k8s.io and serve version v1beta2.
GPU nodes already advertising nvidia.com/gpu (GPU Operator device plugin up; see GPU Operator ClusterPolicy).
A node label to bind the flavor to GPU nodes. The GPU Operator labels GPU nodes with nvidia.com/gpu.present=true; whichever label you pick must be true on every GPU node (Containers and Kubernetes for GPUs).
A namespace per team (here team-a) so namespaceSelector can scope the pool.

kubectl get crd | grep kueue.x-k8s.io          # expect clusterqueues/localqueues/resourceflavors/workloads
kubectl -n kueue-system get deploy kueue-controller-manager   # expect READY 1/1
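Because nominalQuota should track the fleet's real GPU count, one way to get a starting number is to sum allocatable nvidia.com/gpu across the labelled nodes. This is a minimal sketch, assuming the nvidia.com/gpu.present=true label from the list above; the function names sum_gpus and list_gpu_allocatable are illustrative and not part of Kueue or kubectl.

```shell
# Sum one integer per line from stdin; empty input prints 0.
sum_gpus() {
  awk '{ s += $1 } END { print s + 0 }'
}

# Print allocatable nvidia.com/gpu for each labelled node, one value per line.
# Assumes GPU nodes carry nvidia.com/gpu.present=true.
list_gpu_allocatable() {
  kubectl get nodes -l nvidia.com/gpu.present=true \
    -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}
```

Running `list_gpu_allocatable | sum_gpus` prints the fleet total. Treat it as an upper bound for the combined nominalQuota of every ClusterQueue that draws on the same flavor.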

The manifest

One flavor pinned to GPU nodes, one ClusterQueue scoped to labelled namespaces, one LocalQueue in team-a. Apply in this order (flavor and ClusterQueue are cluster-scoped; the LocalQueue is namespaced).

# 1. ResourceFlavor: this quota counts only GPU nodes.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu-nodes
spec:
  nodeLabels:
    nvidia.com/gpu.present: "true"   # set by the GPU Operator; use a label true on every GPU node
---
# 2. ClusterQueue: the GPU quota pool, open only to namespaces labelled kueue.x-k8s.io/queue=research.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: research
spec:
  namespaceSelector:
    matchLabels:
      kueue.x-k8s.io/queue: research   # only matching namespaces may submit to this pool
  queueingStrategy: BestEffortFIFO     # default; head-of-line blocks only in StrictFIFO
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-nodes
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 64               # PLACEHOLDER: total GPUs this queue may admit at once
---
# 3. LocalQueue: the handle team-a submits to.
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: team-a-gpu
  namespace: team-a
spec:
  clusterQueue: research

Label the namespace so the namespaceSelector matches (skip if you used namespaceSelector: {} for all namespaces):

kubectl create namespace team-a --dry-run=client -o yaml | kubectl apply -f -
kubectl label namespace team-a kueue.x-k8s.io/queue=research --overwrite
kubectl apply -f kueue-quota.yaml

A Job that requests 2 GPUs and is routed to the LocalQueue by label. Do not pre-suspend it; Kueue suspends and resumes via its webhook.⁶

apiVersion: batch/v1
kind: Job
metadata:
  generateName: gpu-burn-
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-gpu   # routes the Workload to LocalQueue team-a-gpu
spec:
  parallelism: 1
  completions: 1
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: cuda
        image: nvidia/cuda:13.0.0-base-ubuntu24.04   # pin to your fleet's CUDA base
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 2          # counted against nominalQuota: 64

kubectl create -f gpu-job.yaml   # generateName needs create, not apply

Configuration

| Object.field | Type | Meaning / values |
|---|---|---|
| `ResourceFlavor.spec.nodeLabels` | map | Node labels the flavor binds to; injected into the pod at admission so it lands on those nodes.¹ |
| `ResourceFlavor.spec.nodeTaints` | list | Taints the flavor's quota requires the pod to tolerate; only tolerating workloads consume it.¹ |
| `ResourceFlavor.spec.tolerations` | list | Tolerations Kueue adds to admitted pods so they schedule onto tainted GPU nodes.¹ |
| `ClusterQueue.spec.namespaceSelector` | LabelSelector | Which namespaces may draw on the pool. `{}` = all namespaces; `matchLabels` to scope.² |
| `ClusterQueue.spec.resourceGroups[].coveredResources` | list | Resource names this group governs, e.g. `["nvidia.com/gpu"]`.² |
| `…resourceGroups[].flavors[].name` | string | Must reference an existing `ResourceFlavor` (here `gpu-nodes`).² |
| `…flavors[].resources[].nominalQuota` | quantity | Admittable amount of that resource in that flavor. GPUs are integers.² |
| `…resources[].borrowingLimit` | quantity | Max this CQ may borrow from its cohort above nominal; omit for no limit, set `0` to disallow borrowing.² |
| `…resources[].lendingLimit` | quantity | Max this CQ lends to the cohort; omit to lend all idle quota.² |
| `ClusterQueue.spec.cohortName` | string | Cohort this CQ shares/borrows with (renamed from `cohort` in v1beta2).³ |
| `ClusterQueue.spec.queueingStrategy` | enum | `BestEffortFIFO` (default) or `StrictFIFO` (head-of-line blocks).² |
| `ClusterQueue.spec.preemption.reclaimWithinCohort` | enum | `Never` \| `LowerPriority` \| `Any`: whether to reclaim borrowed quota from cohort peers.² |
| `ClusterQueue.spec.preemption.withinClusterQueue` | enum | `Never` \| `LowerPriority` \| `LowerOrNewerEqualPriority`.² |
| `ClusterQueue.spec.stopPolicy` | enum | `None` \| `Hold` \| `HoldAndDrain`: pause admission or drain the queue.² |
| `LocalQueue.spec.clusterQueue` | string | The `ClusterQueue` this LocalQueue feeds.⁴ |
| Job label `kueue.x-k8s.io/queue-name` | string | Routes the Job's Workload to a `LocalQueue` in the same namespace.⁷⁶ |
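The cohort fields in the table are easier to read together in one manifest. The following is a hedged sketch, not a tested configuration: the helper name render_cohort_cq, the cohort name gpu-shared, and the quota numbers are all assumptions made for illustration. Field names follow the v1beta2 table above; check them against your pinned CRDs before applying.

```shell
# Print a variant of the research ClusterQueue that joins a cohort.
# $1 = nominalQuota, $2 = borrowingLimit (placeholder GPU counts).
render_cohort_cq() {
  nominal="$1"
  borrow="$2"
  cat <<EOF
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: research
spec:
  cohortName: gpu-shared               # hypothetical cohort name
  namespaceSelector:
    matchLabels:
      kueue.x-k8s.io/queue: research
  preemption:
    reclaimWithinCohort: LowerPriority # take lent quota back from lower-priority peers
    withinClusterQueue: LowerPriority
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-nodes
      resources:
      - name: nvidia.com/gpu
        nominalQuota: ${nominal}
        borrowingLimit: ${borrow}
EOF
}
```

For example, `render_cohort_cq 64 16 | kubectl apply --dry-run=server -f -` validates the manifest against the API server without persisting it. With those numbers the queue could admit up to 80 GPUs while cohort peers are idle.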

Apply & verify

kubectl apply -f kueue-quota.yaml        # flavor + ClusterQueue + LocalQueue
kubectl get clusterqueue research -o wide
kubectl get localqueue team-a-gpu -n team-a

The ClusterQueue is usable only once it reports Active:

kubectl get clusterqueue research -o jsonpath='{range .status.conditions[?(@.type=="Active")]}{.status}{" "}{.reason}{"\n"}{end}'
# expected: True Ready

Active=False here almost always means the referenced ResourceFlavor does not exist. Check with kubectl get resourceflavor gpu-nodes.
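When a ClusterQueue references several flavors, comparing the names it asks for with the names that exist narrows this down quickly. This is a sketch under the assumption that both lists come back space-separated from jsonpath; missing_flavors, cq_flavors and existing_flavors are illustrative helper names.

```shell
# Print every name in list $1 that is absent from list $2 (space-separated).
missing_flavors() {
  for want in $1; do
    found=no
    for have in $2; do
      [ "$want" = "$have" ] && found=yes
    done
    [ "$found" = yes ] || echo "$want"
  done
}

# Flavor names a ClusterQueue references.
cq_flavors() {
  kubectl get clusterqueue "$1" \
    -o jsonpath='{.spec.resourceGroups[*].flavors[*].name}'
}

# Flavor names that exist in the cluster.
existing_flavors() {
  kubectl get resourceflavors -o jsonpath='{.items[*].metadata.name}'
}
```

`missing_flavors "$(cq_flavors research)" "$(existing_flavors)"` prints nothing when every referenced flavor exists, and one name per line otherwise.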

Submit the Job and watch the Workload move through admission:

kubectl create -f gpu-job.yaml
kubectl -n team-a get workloads.kueue.x-k8s.io

Expected once quota is free (ADMITTED=True):

NAME                       QUEUE        RESERVED IN   ADMITTED   AGE
job-gpu-burn-xxxxx-abcde   team-a-gpu   research      True       3s

RESERVED IN shows the ClusterQueue holding the quota; ADMITTED=True means Kueue unsuspended the Job so its pods can be created.⁶ Confirm the full condition chain and quota accounting:

WL=$(kubectl -n team-a get workloads.kueue.x-k8s.io -o name | head -n1)
kubectl -n team-a get "$WL" -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
# expected: QuotaReserved=True  Admitted=True   (Finished=True after the Job completes)

kubectl get clusterqueue research -o jsonpath='{.status.flavorsUsage}' | jq .
# expected: gpu-nodes / nvidia.com/gpu total == 2  (matches the Job's limit)
kubectl get clusterqueue research -o jsonpath='{.status.admittedWorkloads}{"\n"}'   # expected: 1
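In a script or CI check it can help to reduce the Type=Status lines to a single stage name. A minimal sketch, assuming the input is exactly the one-condition-per-line format printed by the jsonpath above; workload_stage is an illustrative name and the stage labels are made up for this example.

```shell
# Read "Type=Status" lines from stdin and print the furthest stage reached.
workload_stage() {
  awk -F= '
    $2 == "True" { seen[$1] = 1 }
    END {
      if (seen["Finished"])           print "finished"
      else if (seen["Admitted"])      print "admitted"
      else if (seen["QuotaReserved"]) print "quota-reserved"
      else                            print "pending"
    }'
}
```

Piping the conditions query into `workload_stage` prints one word, which is easier to branch on than the raw condition list.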

Quota exhaustion is the correct negative signal: with nominalQuota: 64, thirty-two 2-GPU Jobs fill the pool, so the 33rd shows an empty ADMITTED column and kubectl -n team-a describe workload <wl> shows couldn't assign flavors … insufficient quota for nvidia.com/gpu.⁶ The Job's pods do not exist until admission; that is Kueue working, not a stuck Job.
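The 33rd-Job figure is plain integer division, and the same arithmetic predicts where any equal-sized batch will start queueing. A sketch with illustrative helper names; it assumes a Job's quota footprint is roughly parallelism times per-pod GPUs, which holds for the single-pod Job on this page.

```shell
# GPUs one Job asks for: $1 = parallelism, $2 = GPUs per pod.
job_gpus() {
  echo $(( $1 * $2 ))
}

# How many equal-sized Jobs fit at once: $1 = nominalQuota, $2 = GPUs per Job.
max_admitted() {
  echo $(( $1 / $2 ))
}
```

`max_admitted 64 "$(job_gpus 1 2)"` prints 32, so the 33rd Job is the first to wait.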

Failure modes

ClusterQueue Active=False. flavors[].name points at a ResourceFlavor that does not exist (typo, or applied out of order). Create the flavor first; the CQ reconciles to Active on its own.³
Workload stuck, no QuotaReserved. The Job's namespace does not match namespaceSelector. Label it (kueue.x-k8s.io/queue=research) or widen the selector. An empty Workload list is a different symptom: it usually means the kueue.x-k8s.io/queue-name label is missing or misspelled, so Kueue never adopted the Job.⁶
QuotaReserved=True but pods never schedule. The ResourceFlavor.nodeLabels select nodes that lack free GPUs, or pods don't tolerate the flavor's nodeTaints. Add tolerations to the flavor or fix the label. Verify GPUs are actually free with kubectl describe node (GPU Diagnostics and Validation).
Quota never frees after Jobs finish. The Workloads are not transitioning to Finished, and until they do their GPUs stay counted in flavorsUsage.⁵ Check the controller logs in kueue-system.
Pre-suspended Job hangs. Manually setting spec.suspend: true and expecting Kueue to also manage it; let the webhook own suspension. Conversely a Job with no queue label runs ungated and bypasses quota entirely.⁶
GPUs requested under requests only. Kubernetes requires nvidia.com/gpu to appear under limits (a request, if present, must equal the limit); quota is counted from the effective request. Always set limits (Containers and Kubernetes for GPUs).
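For the tainted-node case above, the flavor can carry the toleration so that individual Jobs do not have to. A hedged sketch: render_tainted_flavor is an illustrative helper, and the taint key and effect are assumptions about your cluster that you should confirm with kubectl describe node before applying.

```shell
# Print a ResourceFlavor that adds one toleration to admitted pods.
# $1 = taint key on the GPU nodes, $2 = taint effect (both assumed).
render_tainted_flavor() {
  cat <<EOF
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu-nodes
spec:
  nodeLabels:
    nvidia.com/gpu.present: "true"
  tolerations:
  - key: "$1"
    operator: Exists
    effect: "$2"
EOF
}
```

For example, `render_tainted_flavor nvidia.com/gpu NoSchedule | kubectl apply -f -` updates the existing gpu-nodes flavor in place.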

References

Kueue v1beta2 API reference (ResourceFlavor, ClusterQueue, LocalQueue, Workload specs; cohortName): https://kueue.sigs.k8s.io/docs/reference/kueue.v1beta2/
ClusterQueue concept (resourceGroups, namespaceSelector, queueingStrategy, preemption, stopPolicy): https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/
ResourceFlavor concept (nodeLabels, nodeTaints, tolerations): https://kueue.sigs.k8s.io/docs/concepts/resource_flavor/
LocalQueue concept (spec.clusterQueue, status): https://kueue.sigs.k8s.io/docs/concepts/local_queue/
Workload concept (unit of admission, conditions): https://kueue.sigs.k8s.io/docs/concepts/workload/
Run a Kubernetes Job (queue-name label, suspend behaviour, kubectl get workloads output): https://kueue.sigs.k8s.io/docs/tasks/run/jobs/
Labels and annotations (kueue.x-k8s.io/queue-name): https://kueue.sigs.k8s.io/docs/reference/labels-and-annotations/
Administer cluster quotas: https://kueue.sigs.k8s.io/docs/tasks/manage/administer_cluster_quotas/

1. ResourceFlavor spec.nodeLabels/nodeTaints/tolerations: https://kueue.sigs.k8s.io/docs/concepts/resource_flavor/
2. ClusterQueue resourceGroups, coveredResources, flavors[].resources[].nominalQuota/borrowingLimit/lendingLimit, namespaceSelector, queueingStrategy, preemption, stopPolicy: https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/
3. ClusterQueueSpec cohortName (type CohortReference) in v1beta2: https://kueue.sigs.k8s.io/docs/reference/kueue.v1beta2/
4. LocalQueue spec.clusterQueue and status counters: https://kueue.sigs.k8s.io/docs/concepts/local_queue/
5. Workload is the unit of admission; conditions QuotaReserved/Admitted/Finished: https://kueue.sigs.k8s.io/docs/concepts/workload/
6. kueue.x-k8s.io/queue-name label, webhook-managed suspension, kubectl get workloads columns, insufficient-quota describe output: https://kueue.sigs.k8s.io/docs/tasks/run/jobs/
7. kueue.x-k8s.io/queue-name label reference: https://kueue.sigs.k8s.io/docs/reference/labels-and-annotations/