Provisioning tooling

Scope: the bare-metal provisioning systems that turn racked GPU nodes into a fleet of identical, schedulable machines: Canonical MAAS, Warewulf, xCAT, OpenStack Ironic, and NVIDIA Base Command Manager (BCM) / Mission Control, covering what each is best at, its GPU-cluster fit, and how to pick.

All command and config snippets below are reference templates, not hardware-tested. Pin exact versions and package names against the cited vendor/project docs before running anything in production.

What it is

A bare-metal provisioner is the control plane that takes a powered, network-booting node from "BMC reachable" to "OS installed, baselined, and ready for the scheduler". It owns four jobs: drive power and boot via the BMC (OOB / BMC, IPMI, Redfish); serve the network boot chain (DHCP/TFTP/HTTP, PXE/iPXE; see bare-metal PXE); lay down an OS image consistently across the fleet (image management); and hand healthy nodes to Slurm or Kubernetes (provisioning & scheduling). This page compares the tools; the metal-level boot mechanics live on bare-metal PXE, and the scheduler deep-dives on Slurm / Kubernetes / k3s.
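
As an illustration of the first job (driving power and boot through the BMC), the sketch below builds the two standard Redfish requests that force a one-time PXE boot. It only constructs the requests and sends nothing. The function name and the default system path are assumptions for illustration, and the `ResetType` values a given BMC accepts vary by vendor, so treat the payloads as a starting point to check against the BMC's own Redfish schema.

```python
import json


def redfish_pxe_boot_requests(system_path="/redfish/v1/Systems/1"):
    """Return (method, path, body) tuples for a one-time PXE boot.

    Hypothetical helper: it builds the requests but does not send them.
    """
    set_boot = (
        "PATCH",
        system_path,
        {"Boot": {"BootSourceOverrideTarget": "Pxe",
                  "BootSourceOverrideEnabled": "Once"}},
    )
    # Assumption: the BMC accepts ForceRestart; a powered-off node needs "On".
    reset = (
        "POST",
        f"{system_path}/Actions/ComputerSystem.Reset",
        {"ResetType": "ForceRestart"},
    )
    return [set_boot, reset]


for method, path, body in redfish_pxe_boot_requests():
    print(method, path, json.dumps(body))
```

A provisioner performs the equivalent of these two calls (or their IPMI counterparts) on every enrolment and re-image, which is why an unreachable BMC stops provisioning entirely.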

The five systems split into three philosophies:

- Stateful, cloud-instance model. MAAS and Ironic install an OS to each node's local disk and treat the machine like a cloud server with a lifecycle (commission, deploy, release).
- Stateless, image-from-RAM model. Warewulf (and xCAT's stateless mode) boot every node from one shared image that runs in memory; "installation" disappears.
- Turnkey AI-cluster model. NVIDIA BCM packages provisioning, the workload manager, and GPU/health software as one validated stack for DGX/HGX estates.

| Tool | Project | Model | Boot chain | Primary fit |
| --- | --- | --- | --- | --- |
| MAAS | Canonical¹ | Stateful, IPAM-centric | PXE/iPXE + DHCP/DNS | Mixed bare-metal estates, OpenStack/K8s substrate |
| Warewulf | warewulf.org⁵ | Stateless, in-RAM | iPXE/GRUB + TFTP/HTTP | Classic HPC / Slurm compute pools |
| xCAT | xcat.org¹⁰ | Stateful or stateless | PXE + DHCP/TFTP | Large legacy HPC, multi-arch (incl. IBM Power) |
| OpenStack Ironic | openstack.org¹¹ | Stateful, API-driven | PXE/iPXE + IPA ramdisk | Bare-metal-as-a-service, OpenStack/Metal3 |
| NVIDIA BCM | NVIDIA¹⁶ | Turnkey image + scheduler | PXE (head node) | DGX/HGX, BasePOD/SuperPOD |

Why it's needed (and when)

At one or two nodes you image by hand. Past a rack, hand-imaging is the source of the single worst failure class in a GPU cluster: image drift, where nodes disagree on driver branch, kernel, firmware, or CUDA and produce intermittent, non-reproducible collective failures (image management, runbook: image drift, reliability & RAS). A provisioner exists to make "every node is byte-identical and re-creatable from a definition" the default, so a suspect node is reprovisioned, not debugged.

You reach for one when you need: zero-touch enrolment of new hardware via the BMC; a single source of truth for the OS/driver baseline; reproducible re-imaging after an RMA or a drift incident (runbook: GPU fault / RMA); and a clean handoff into health gating and the scheduler (GPU health gating, Slurm topology placement). Which one depends on the surrounding stack; pick by what else is in the datacenter, not by feature checklist:

- Already on OpenStack, or want a bare-metal API behind Kubernetes (Metal3/Cluster API) -> Ironic.
- Classic Slurm HPC pool, want diskless nodes you never "install" -> Warewulf.
- Heterogeneous estate (Ubuntu + RHEL + Windows), want IPAM + lifecycle + an API -> MAAS.
- Large existing HPC site, multi-architecture, or IBM Power in the mix -> xCAT.
- DGX/HGX estate, want NVIDIA to own provisioning + Slurm/K8s + GPU health as one supported product -> BCM / Mission Control.
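
The selection rules above can be written down as a small first-match lookup. This is only a sketch: the `Estate` field names are hypothetical labels for the five questions, and checking them in the order listed is an assumption, since a real estate can satisfy several rules at once and needs a human decision.

```python
from dataclasses import dataclass


@dataclass
class Estate:
    # Hypothetical flags, one per selection rule in the list above.
    openstack_or_metal3: bool = False
    diskless_slurm_pool: bool = False
    mixed_os_needs_ipam: bool = False
    multi_arch_or_power: bool = False
    dgx_hgx_vendor_stack: bool = False


def pick_provisioner(estate: Estate) -> str:
    """Return the first matching tool, in the order the rules are listed."""
    rules = [
        (estate.openstack_or_metal3, "Ironic"),
        (estate.diskless_slurm_pool, "Warewulf"),
        (estate.mixed_os_needs_ipam, "MAAS"),
        (estate.multi_arch_or_power, "xCAT"),
        (estate.dgx_hgx_vendor_stack, "BCM / Mission Control"),
    ]
    for matches, tool in rules:
        if matches:
            return tool
    return "undecided"


print(pick_provisioner(Estate(diskless_slurm_pool=True)))
```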

flowchart LR
  RACK["Racked node, BMC reachable"] --> PROV["Provisioner drives PXE boot via BMC"]
  PROV --> IMG["OS image laid down (stateful) or booted in RAM (stateless)"]
  IMG --> BASE["Driver / firmware / CUDA baseline applied"]
  BASE --> GATE["Health gate: DCGM diagnostics"]
  GATE --> SCHED["Scheduler admits node: Slurm or Kubernetes"]

How it's set up & managed

The shape differs per tool. Below is the load-bearing command/config surface for each, grounded in the project docs; treat exact package names and versions as templates to pin.

Canonical MAAS

MAAS is "Canonical's private cloud infrastructure management system [that] enables physical servers to behave like cloud instances".¹ It is IPAM-centric: it runs highly-available DHCP and DNS, PXE-boots machines, and drives power through the BMC (IPMI, Redfish, iLO and others).² ³ Machines move through an explicit lifecycle: New -> Commissioning -> Ready -> Allocated -> Deploying -> Deployed -> Releasing, plus Failed, Broken, and Rescue mode.⁴ Commissioning boots an ephemeral Ubuntu image and probes CPU, RAM, storage, and network; deploy powers the node on, PXE-boots it, and installs the chosen OS with user data via cloud-init.⁴ It exposes everything via web UI, a REST API, and CLI/Python bindings, with RBAC and LDAP/AD/SAML.¹

# reference template, not hardware-tested -- MAAS CLI lifecycle
# Add a machine by its BMC (power type + creds); MAAS enlists and commissions it
maas admin machines create \
  architecture=amd64/generic \
  power_type=redfish \
  power_parameters_power_address=https://10.0.0.21 \
  power_parameters_power_user=admin \
  power_parameters_power_pass=REDACTED \
  mac_addresses=aa:bb:cc:dd:ee:ff

maas admin machine commission <system_id>          # New -> Ready (hardware probe)
maas admin machine deploy <system_id> distro_series=noble   # -> Deploying -> Deployed
maas admin machine release <system_id>             # return to Ready (disks are erased only if erase is requested or enabled)

GPU-cluster fit: a strong neutral substrate when the estate is mixed-OS or feeds OpenStack/Kubernetes, and when IPAM and a hardware inventory matter. MAAS itself knows nothing about CUDA; the GPU driver/CUDA/firmware baseline is applied after deploy via cloud-init or a config-management role (driver install & lifecycle, Ansible bring-up).
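
The lifecycle listed above can be modelled as a small state table, which is handy when automation has to decide whether a machine may be handed on to the health gate. This is a simplified sketch, not MAAS code: it covers only the happy path named in the text, closes the loop with Releasing -> Ready, and deliberately leaves out Failed, Broken, and Rescue mode. The function names are hypothetical.

```python
# Happy-path order as listed in the lifecycle description.
HAPPY_PATH = ["New", "Commissioning", "Ready", "Allocated",
              "Deploying", "Deployed", "Releasing"]

# Each state maps to its successor; a released machine returns to Ready.
NEXT_STATE = dict(zip(HAPPY_PATH, HAPPY_PATH[1:]))
NEXT_STATE["Releasing"] = "Ready"


def advance(state: str) -> str:
    """Return the next happy-path state, or raise for unknown states."""
    if state not in NEXT_STATE:
        raise ValueError(f"no happy-path transition from {state!r}")
    return NEXT_STATE[state]


def ready_for_health_gate(state: str) -> bool:
    """Only a Deployed machine should move on to GPU health gating."""
    return state == "Deployed"


state = "New"
while not ready_for_health_gate(state):
    state = advance(state)
print(state)
```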

Warewulf

Warewulf "is an operating system provisioning platform for Linux clusters" and "the most popular open source and vendor-agnostic provisioning system within the global HPC community" since its 2001 release.⁵ Its model is stateless: hundreds or thousands of nodes boot the same OS image, which "run[s] entirely in memory"; there is no per-node install to drift.⁵ Administration centralises on wwctl. The boot chain is in-firmware PXE -> DHCP with a next-server -> TFTP-loaded bootloader (iPXE or GRUB) -> kernel/image/overlays fetched over HTTP.⁶ The kernel ships inside the image.⁷ Per-node customisation is layered as overlays (static files + dynamic templates) applied at provision time and optionally re-applied at runtime.⁸

Node images are imported from OCI registries (docker://...), local OCI archives, chroot directories, or Apptainer sandboxes, so an image can be built with Docker/Podman/Apptainer in CI and pulled in directly.⁹

# reference template, not hardware-tested -- Warewulf 4 bring-up
wwctl image import docker://ghcr.io/warewulf/warewulf-rockylinux:8 rocky-8   # OCI -> node image
wwctl image build rocky-8                       # build the bootable image
wwctl node add gpu001 --ipaddr 10.0.1.1 --hwaddr aa:bb:cc:dd:ee:01 --image rocky-8   # long flag; verify flag names for your release
wwctl overlay build                             # render system + runtime overlays
wwctl configure --all                           # write dhcp, tftp, hosts, etc.

GPU-cluster fit: the natural choice for a classic Slurm compute pool where nodes are cattle and you want re-imaging to be a reboot. The NVIDIA stack (driver, Fabric Manager, CUDA) is baked into the image build, not installed per node (driver install & lifecycle, Fabric Manager); pair it with Slurm for scheduling.
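
To make the overlay idea concrete (one shared image plus small per-node files rendered from templates), here is a minimal sketch using Python's `string.Template`. It is an analogy only: Warewulf renders overlays with its own template engine, and the template text, the `render_overlay` helper, and the second node `gpu002` are hypothetical.

```python
from string import Template

# Hypothetical dynamic template: one hosts-style line per node.
HOSTS_LINE = Template("$ipaddr $hostname\n")


def render_overlay(template: Template, node: dict) -> str:
    """Render one template with one node's attributes.

    substitute() raises KeyError if the node lacks a referenced attribute,
    which surfaces an incomplete node definition before boot.
    """
    return template.substitute(node)


nodes = [
    {"hostname": "gpu001", "ipaddr": "10.0.1.1"},
    {"hostname": "gpu002", "ipaddr": "10.0.1.2"},  # hypothetical second node
]
for node in nodes:
    print(render_overlay(HOSTS_LINE, node), end="")
```

The point of the design is that the image stays identical across the fleet; everything node-specific lives in the rendered overlay.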

xCAT

xCAT (Extreme Cloud Administration Toolkit) is an open-source provisioning and cluster-management toolkit long used at very large HPC sites; it supports both stateful (disk) and stateless/diskless deployment, broad OS coverage, and multiple architectures including x86_64 and IBM Power, driving nodes via PXE/DHCP/TFTP and the BMC.¹⁰ Operations are CLI-driven (nodeset, rpower, rcons, rsetboot, node groups). It overlaps Warewulf's niche; in practice you encounter xCAT where it is already the incumbent on a large or multi-arch estate. For new GPU clusters most green-field choices land on Warewulf, MAAS, Ironic, or BCM rather than introducing xCAT. Verify current release status and platform support against the project before adopting.¹⁰

GPU-cluster fit: legacy/inherited large HPC and multi-architecture sites. Treat as "support and operate" rather than a default green-field pick; the GPU baseline story is the same as Warewulf (bake it into the image).

OpenStack Ironic

Ironic, OpenStack's bare-metal service, "is a collection of components that provides support to manage and provision physical machines". It is usable inside OpenStack, standalone via Bifrost (Ansible), or under Kubernetes via Metal3/Cluster API.¹¹ ¹³ Components: ironic-api (REST), ironic-conductor (the workhorse, distributing nodes across instances by a hash ring and polling power/sensor state), and the ironic-python-agent (IPA), a ramdisk that runs on the node to introspect hardware and write the image.¹¹ ¹² Power/management is driver-based: generic ipmi (via ipmitool) and redfish, plus vendor types ilo, irmc, idrac, and snmp, enabled in the conductor config.¹⁴ ¹⁵

# reference template, not hardware-tested -- /etc/ironic/ironic.conf
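[DEFAULT]
# Generic types only. Each vendor type (idrac, ilo, irmc) also needs its own
# compatible power/management interfaces enabled; without them the conductor
# is expected to fail at startup. Check the enabling-drivers doc before adding.
enabled_hardware_types = ipmi,redfish
enabled_power_interfaces = ipmitool,redfish
enabled_management_interfaces = ipmitool,redfish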

# reference template, not hardware-tested -- standalone Ironic (OpenStackClient)
baremetal node create --driver redfish \
  --driver-info redfish_address=https://10.0.0.21 \
  --driver-info redfish_system_id=/redfish/v1/Systems/1 \
  --driver-info redfish_username=admin --driver-info redfish_password=REDACTED
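baremetal node manage <uuid>      # enroll -> manageable (validate)
baremetal node provide <uuid>     # -> available
# deploy needs an image to write; set it first (placeholder URL; HTTP images also need a checksum)
baremetal node set <uuid> --instance-info image_source=<image_url>
baremetal node deploy <uuid>      # write image via IPA -> active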

GPU-cluster fit: pick Ironic when bare metal must sit behind an API: an internal bare-metal-as-a-service, an OpenStack cloud, or Kubernetes-native cluster lifecycle (Metal3). It is heavier to stand up than Warewulf/MAAS; the payoff is programmatic, multi-tenant metal. GPU specifics are again post-deploy (driver install & lifecycle).
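
The hash-ring distribution mentioned for ironic-conductor can be illustrated with a generic consistent-hash ring. This is a textbook sketch, not Ironic's implementation: the `HashRing` class, the SHA-256 hashing, the replica count, and the conductor names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual points per conductor."""

    def __init__(self, conductors, replicas=64):
        if not conductors:
            raise ValueError("at least one conductor is required")
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in conductors
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, node_uuid: str) -> str:
        """Return the conductor owning the first point clockwise of the node."""
        idx = bisect.bisect(self._points, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cond-a", "cond-b", "cond-c"])
print(ring.owner("node-0"))
```

The property that matters operationally is that removing one conductor only reassigns the nodes that conductor owned; the others keep their owner, which limits churn during failover.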

NVIDIA Base Command Manager (BCM) / Mission Control

BCM "streamlines cluster provisioning, workload management, and infrastructure monitoring [and] provides all the tools you need to deploy and manage an AI data center".¹⁶ Unlike the others, it is a turnkey stack: a head node provisions compute nodes from software images (each "a directory on the head node containing a full Linux filesystem"), groups nodes into categories that share configuration, and bundles the workload manager (Slurm or Kubernetes), monitoring, and NVIDIA's GPU/driver/health software.¹⁷ ¹⁸ It is administered through the cmsh CLI and the Base View GUI, and is the documented management plane for DGX BasePOD and SuperPOD.¹⁶ ¹⁹ Mission Control is the higher edition layered on BCM for the newest racks (GB200/GB300 NVL72), adding full-stack operations and autonomous recovery; on a Mission-Control-managed rack you configure through BCM rather than hand-installing packages.²⁰ This is the same management-plane surface described on DGX systems and the Blackwell platform.

# reference template, not hardware-tested -- BCM image management via cmsh
cmsh -c "softwareimage; list"                               # list images, kernels, node counts
cmsh -c "softwareimage; clone default-image gpu-image; commit"   # clone for controlled rollout
cm-chroot-sw-image /cm/images/gpu-image                     # chroot in to update the image
# then assign the image to a node category and push the update to nodes

GPU-cluster fit: the path of least resistance for a DGX/HGX estate that wants NVIDIA to own and support the whole provisioning + scheduling + GPU-health stack, with the validated driver/Fabric-Manager/DCGM baseline already integrated. The trade-off is a coupled, opinionated stack: e.g. BasePOD/SuperPOD on BCM 10.x stay on DGX OS 6 (Ubuntu 22.04) until BCM 11 / DGX OS 7 (Ubuntu 24.04) is supported, a version-coupling gotcha to plan upgrades around (DGX systems).²¹
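
The BCM reference cited for categories notes that kernel parameters resolve category-over-image and node-over-category. A layered merge captures that precedence. This is a sketch of the rule only, not BCM code; the function name and the parameter keys and values are hypothetical.

```python
def resolve_kernel_params(image: dict, category: dict, node: dict) -> dict:
    """Merge settings so category overrides image and node overrides category."""
    return {**image, **category, **node}


# Hypothetical settings at each level.
image_params = {"console": "tty0", "quiet": "yes"}
category_params = {"console": "ttyS0"}   # overrides the image default
node_params = {"quiet": "no"}            # overrides for one node only

print(resolve_kernel_params(image_params, category_params, node_params))
```

Keeping node-level overrides rare preserves the benefit of categories: most nodes stay defined entirely by their image and category.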

Validated usage & tests

The acceptance question is identical across tools: did a freshly provisioned node come up consistent with the fleet baseline, and is its GPU subsystem healthy before the scheduler admits it? Validate on the node after provisioning, then gate (GPU health gating, diagnostics tools). The comments below describe the expected shape of each command's output; no sample values are shown because none were measured.

# 1. Image/baseline consistency -- driver branch and kernel must match the fleet definition
nvidia-smi --query-gpu=driver_version --format=csv,noheader
uname -r
# Expect: the exact driver version and kernel pinned for this image. Any node that
# differs is drift -- reprovision, do not debug ([runbook: image drift](runbook-image-drift.md)).

# 2. All expected GPUs present (catches a half-provisioned or mis-imaged node)
nvidia-smi -L
# Expect: one "GPU <n>: <model> (UUID: ...)" line per installed GPU. A short count
# means the image/driver did not enumerate the hardware ([runbook: kernel/GPU missing](runbook-kernel-gpu-missing.md)).

# 3. Health gate before admit -- node-level acceptance, not just nvidia-smi
dcgmi diag -r 3
# Expect: no "Fail" lines. A node failing this on a known-good image is suspect
# hardware, not a provisioning fault -- keep it drained ([GPU health gating](gpu-health-gating.md)).
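
The three checks above can be folded into one pass/fail function that takes the raw command output as text. This is a minimal sketch under stated assumptions: the `acceptance_failures` name and the baseline dictionary keys are hypothetical, the sample strings are placeholders rather than real output, and the diagnostic check simply looks for the word "Fail" as the comment above describes.

```python
def acceptance_failures(baseline, driver_out, kernel_out, gpu_list_out, diag_out):
    """Return a list of reasons the node must not be admitted (empty = pass)."""
    failures = []

    # nvidia-smi prints one driver_version line per GPU; all must match the pin.
    drivers = {ln.strip() for ln in driver_out.splitlines() if ln.strip()}
    if drivers != {baseline["driver"]}:
        failures.append("driver drift")

    if kernel_out.strip() != baseline["kernel"]:
        failures.append("kernel drift")

    # nvidia-smi -L prints one "GPU <n>: ..." line per enumerated GPU.
    gpu_lines = [ln for ln in gpu_list_out.splitlines() if ln.startswith("GPU ")]
    if len(gpu_lines) != baseline["gpu_count"]:
        failures.append("gpu count mismatch")

    if any("Fail" in ln for ln in diag_out.splitlines()):
        failures.append("dcgm diag failure")

    return failures


# Placeholder values only; substitute the pinned baseline for the image.
baseline = {"driver": "DRIVER_PINNED", "kernel": "KERNEL_PINNED", "gpu_count": 2}
print(acceptance_failures(
    baseline,
    "DRIVER_PINNED\nDRIVER_PINNED\n",
    "KERNEL_PINNED\n",
    "GPU 0: <model> (UUID: ...)\nGPU 1: <model> (UUID: ...)\n",
    "",
))
```

Returning reasons rather than a boolean lets the caller route drift to reprovisioning and diagnostic failures to the hardware runbook.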

Per-tool state checks confirm that the provisioner's own view agrees with reality:

- MAAS: maas admin machines read | jq '.[].status_name' should read Deployed for live nodes.⁴
- Ironic: baremetal node list should show provisioning state active.¹¹
- Warewulf: wwctl node list resolves the node to its assigned image.⁶
- BCM: cmsh -c "device; list" shows nodes UP in their category.¹⁸

A node that the provisioner reports healthy but that fails the GPU gate is the case the health gating page exists for: never let the scheduler admit it.
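
For the MAAS check, the same filter as the jq expression can be applied in a script that names the machines that are not yet Deployed. This is a sketch assuming the JSON list returned by `maas admin machines read`; the `not_deployed` helper is hypothetical and the sample document is trimmed to two fields, whereas the real output carries many more.

```python
import json


def not_deployed(machines_json: str) -> list:
    """Return hostnames whose status_name is anything other than Deployed."""
    machines = json.loads(machines_json)
    return sorted(
        m["hostname"] for m in machines if m.get("status_name") != "Deployed"
    )


# Trimmed, illustrative sample; not captured from a real MAAS server.
sample = json.dumps([
    {"hostname": "gpu001", "status_name": "Deployed"},
    {"hostname": "gpu002", "status_name": "Ready"},
])
print(not_deployed(sample))
```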

Failure modes

Brief; each links its runbook.

- Image drift across the fleet. Nodes disagree on driver branch, kernel, firmware, or CUDA, producing intermittent collective/NCCL failures that move around the cluster. The provisioner's job is to make this impossible; when it happens, reprovision from the canonical image rather than patching the outlier. -> runbook: image drift, image management.
- BMC unreachable -> provisioning blind. PXE never starts because the provisioner cannot drive power/boot over IPMI/Redfish; the node is invisible to enrolment and re-imaging. -> runbook: OOB unreachable, OOB / BMC.
- GPU baseline applied but unhealthy node admitted. Provisioning "succeeded" yet the GPU subsystem is degraded (ECC, NVLink, thermals); without a gate the scheduler places work on it. -> GPU health gating, reliability & RAS.
- Coupled-stack version skew (BCM). A head-node/DGX-OS/BCM-edition mismatch blocks an upgrade or strands nodes on an old image. Plan around the BCM/DGX OS coupling. -> DGX systems.
- Scheduler integration / topology. Provisioned nodes are admitted but placed topology-unaware, scattering tightly-coupled jobs. That is a scheduler concern, not a provisioning one. -> Slurm topology placement, runbook: topology scheduling.
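
For the image-drift case, one way to locate outliers when collecting fingerprints fleet-wide is to group nodes by their (driver, kernel) pair and flag anything that differs from the most common pair. This is a sketch with hypothetical names and placeholder values. Majority vote only finds the odd nodes out; the pinned image definition, not the majority, remains the source of truth.

```python
from collections import Counter


def drift_outliers(fleet: dict) -> dict:
    """Return nodes whose fingerprint differs from the most common one.

    fleet maps node name -> fingerprint tuple, e.g. (driver, kernel).
    """
    if not fleet:
        return {}
    majority, _count = Counter(fleet.values()).most_common(1)[0]
    return {node: fp for node, fp in fleet.items() if fp != majority}


# Placeholder fingerprints; real values come from the per-node checks.
fleet = {
    "gpu001": ("DRIVER_A", "KERNEL_A"),
    "gpu002": ("DRIVER_A", "KERNEL_A"),
    "gpu003": ("DRIVER_B", "KERNEL_A"),
}
print(drift_outliers(fleet))
```

Any node this returns is a candidate for reprovisioning from the canonical image rather than in-place patching.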

References

Canonical MAAS — What MAAS is / can do (cloud-instance model, REST API/CLI, RBAC): https://canonical.com/maas/docs/what-maas-can-do
Canonical MAAS — Machine life-cycle (New/Commissioning/Ready/Allocated/Deploying/Deployed/Releasing; commission probes; deploy via cloud-init): https://canonical.com/maas/docs/about-the-machine-life-cycle
Canonical MAAS — Networking (integrated HA DHCP and DNS, PXE) and Power drivers (IPMI/Redfish/iLO): https://canonical.com/maas/docs
Warewulf User Guide — Introduction (OS provisioning platform, most popular HPC since 2001, stateless/in-RAM, wwctl): https://warewulf.org/docs/main/getting-started/introduction.html
Warewulf User Guide — Node Provisioning (PXE -> DHCP next-server -> iPXE/GRUB via TFTP -> kernel/image/overlays over HTTP): https://warewulf.org/docs/main/getting-started/provisioning.html
Warewulf User Guide — wwctl image import (docker://, OCI archive, chroot, Apptainer sandbox): https://warewulf.org/docs/main/reference/wwctl_image_import.html
xCAT project (open-source provisioning/management, stateful and stateless, multi-arch incl. Power): https://xcat.org/
OpenStack Ironic — Overview / Get started (collection of components; ironic-api, ironic-conductor, ironic-python-agent; Bifrost standalone): https://docs.openstack.org/ironic/latest/install/get_started.html
OpenStack Ironic — Drivers, Hardware Types, and Interfaces (ipmi, redfish, ilo, irmc, idrac, snmp): https://docs.openstack.org/ironic/latest/admin/drivers.html
OpenStack Ironic — Enabling drivers and hardware types (enabled_hardware_types): https://docs.openstack.org/ironic/latest/install/enabling-drivers.html
NVIDIA Base Command Manager docs (provisioning, workload management, monitoring; head node, cmsh, Base View): https://docs.nvidia.com/base-command-manager/index.html
NVIDIA — Managing Images in BCM (software image = directory with full Linux filesystem; softwareimage clone/list/commit, cm-chroot-sw-image): https://docs.nvidia.com/dgx/baseos-on-bcm-install-guide/managing-images-bcm.html
NVIDIA DGX BasePOD Deployment Guide — BCM head-node install (BCM as the BasePOD management plane): https://docs.nvidia.com/dgx-basepod/deployment-guide-dgx-basepod/latest/bcm-deploy.html
NVIDIA Mission Control docs (full-stack operations layer leveraging BCM on GB200/GB300 NVL72): https://docs.nvidia.com/mission-control/index.html
NVIDIA DGX SuperPOD update guide — DGX OS (BCM 10.x/DGX OS 6 vs BCM 11/DGX OS 7 coupling): https://docs.nvidia.com/dgx-superpod/update-guide/latest/dgx-os.html

1. Canonical, "What MAAS can do": MAAS is Canonical's private-cloud infrastructure management system enabling physical servers to behave like cloud instances; discovers/commissions/deploys via the BMC; API-first with REST API, CLI and Python bindings; RBAC plus LDAP/AD/SAML; deploys Ubuntu/Windows/CentOS/RHEL/SUSE/ESXi across x86_64/ARM64/POWER/s390x. https://canonical.com/maas/docs/what-maas-can-do
2. Canonical MAAS: provides integrated, highly-available open-source DHCP and DNS, and PXE/netboot for provisioning. https://canonical.com/maas/docs
3. Canonical MAAS: machines require a BMC/power interface; supported interfaces include IPMI, Redfish, and iLO, used for remote power and out-of-band control. https://canonical.com/maas/docs/about-machine-basics
4. Canonical MAAS: machine life-cycle states New, Commissioning, Ready, Allocated, Deploying, Deployed, Releasing (plus Failed, Broken, Rescue mode); commissioning boots an ephemeral Ubuntu image and probes CPU/RAM/storage/network; deploy PXE-boots and installs the OS with user data via cloud-init. https://canonical.com/maas/docs/about-the-machine-life-cycle
5. Warewulf User Guide, Introduction: "Warewulf is an operating system provisioning platform for Linux clusters"; "the most popular open source and vendor-agnostic provisioning system within the global HPC community" since its 2001 release; stateless (disk-optional) provisioning where images "run entirely in memory"; centralised administration via the wwctl CLI. https://warewulf.org/docs/main/getting-started/introduction.html
6. Warewulf User Guide, Node Provisioning: in-firmware PXE obtains DHCP with a next-server, TFTP-loads a bootloader (iPXE or GRUB), which fetches kernel, image and overlays over HTTP; an optional two-stage dracut initramfs precedes full image deployment. https://warewulf.org/docs/main/getting-started/provisioning.html
7. Warewulf User Guide: OS images "provide a bootable image, including the kernel that will be used to boot the cluster node." https://warewulf.org/docs/main/getting-started/introduction.html
8. Warewulf User Guide: overlays "customize the provisioned operating system image with static files and dynamic templates applied with the OS image and, optionally, periodically at runtime"; built into compressed images for distribution (system and runtime use cases). https://warewulf.org/docs/main/overlays/overlays.html
9. Warewulf User Guide, wwctl image import: pulls/imports an image from an OCI registry (docker://registry/example:latest), docker-daemon://, a local OCI/tar archive, a chroot directory, or an Apptainer sandbox; options include --build, --force, --platform, --username/--password. https://warewulf.org/docs/main/reference/wwctl_image_import.html
10. xCAT (Extreme Cloud Administration Toolkit): open-source provisioning and cluster-management toolkit supporting stateful and stateless/diskless deployment across multiple architectures (including x86_64 and IBM Power), driving nodes via PXE/DHCP/TFTP and the BMC; CLI-driven (nodeset, rpower, rcons). Verify current release status and platform support before adoption. https://xcat.org/
11. OpenStack Ironic, Overview: "a collection of components that provides support to manage and provision physical machines"; components ironic-api (REST), ironic-conductor (drivers for power/provision/deploy/clean), and ironic-python-agent (IPA) ramdisk for hardware control and introspection; uses PXE/IPMI by default. https://docs.openstack.org/ironic/latest/install/get_started.html
12. OpenStack Ironic: the ironic-conductor does the bulk of the work, manages a proportion of nodes distributed by a hash ring, and constantly polls power state and sensor data; multiple conductors run for scale and failover. https://docs.openstack.org/ironic/latest/install/get_started.html
13. OpenStack Ironic: Bifrost is "a set of Ansible playbooks that automates the task of deploying a base image onto a set of known hardware using ironic in a standalone mode" (Ironic without the rest of OpenStack). https://docs.openstack.org/ironic/latest/install/get_started.html
14. OpenStack Ironic, Drivers/Hardware Types: generic hardware types ipmi (via ipmitool) and redfish, plus vendor types ilo, irmc, idrac, and snmp. https://docs.openstack.org/ironic/latest/admin/drivers.html
15. OpenStack Ironic, Enabling drivers: hardware types are enabled in the ironic-conductor config via enabled_hardware_types (e.g. ipmi,redfish,ilo,irmc) with matching enabled_power_interfaces / enabled_management_interfaces. https://docs.openstack.org/ironic/latest/install/enabling-drivers.html
16. NVIDIA Base Command Manager: "streamlines cluster provisioning, workload management, and infrastructure monitoring [and] provides all the tools you need to deploy and manage an AI data center"; administered via the cmsh CLI and Base View; integrates workload managers and GPU software; management plane for DGX BasePOD/SuperPOD. https://docs.nvidia.com/base-command-manager/index.html
17. NVIDIA, Managing Images in BCM: a software image is "a directory on the head node containing a full Linux filesystem"; managed with cmsh softwareimage (list, clone <src> <dst>; commit) and edited via cm-chroot-sw-image /cm/images/<name>. https://docs.nvidia.com/dgx/baseos-on-bcm-install-guide/managing-images-bcm.html
18. NVIDIA Base Command Manager: a node category is a group of nodes that share the same configuration; software images are assigned to nodes or categories; kernel parameters resolve category-over-image, node-over-category. https://docs.nvidia.com/base-command-manager/index.html
19. NVIDIA DGX BasePOD Deployment Guide: BCM head nodes are installed as the BasePOD management plane (BCM head-node installation step). https://docs.nvidia.com/dgx-basepod/deployment-guide-dgx-basepod/latest/bcm-deploy.html
20. NVIDIA Mission Control: full-stack operations layer for GB200/GB300 NVL72 racks that "leverages NVIDIA Base Command Manager (BCM) for foundational cluster-management tasks such as provisioning compute nodes, configuring software images, assigning roles"; configured through BCM on managed racks. https://docs.nvidia.com/mission-control/index.html
21. NVIDIA DGX SuperPOD update guide: BasePOD/SuperPOD estates on BCM 10.x stay on DGX OS 6 (Ubuntu 22.04) until BCM 11 / DGX OS 7 (Ubuntu 24.04) is supported for the upgrade. https://docs.nvidia.com/dgx-superpod/update-guide/latest/dgx-os.html