Ansible role: mig
Scope: enable MIG mode and lay out one requested profile per GPU via nvidia-smi mig, wrapped so the role is idempotent. It reads current state (nvidia-smi --query-gpu=mig.mode.current, nvidia-smi -L) before mutating, gates on node readiness, and converges to mig_profile without re-cutting an already-correct geometry. This is the optional mig role in the bring-up site.yml, run after nvidia_stack. Partition mechanics, profile tables, and the nvidia-smi mig lifecycle live in MIG.
Reference template, not hardware-tested. Validate every command and profile name against the MIG User Guide and your driver / GPU SKU before a fleet roll. Run on one node first.
flowchart LR
GATE["Gate: mig_enabled and datacenter tier and GPUs visible"] --> MODE["nvidia-smi --query-gpu=mig.mode.current"]
MODE -->|"Disabled"| ENABLE["nvidia-smi -i ID -mig 1"]
ENABLE -->|"pending=Enabled, Ampere"| REBOOT["notify reboot node"]
MODE -->|"Enabled"| HAVE["nvidia-smi -L"]
ENABLE --> HAVE
HAVE -->|"profile missing on GPU ID"| CREATE["nvidia-smi mig -i ID -cgi mig_profile -C"]
HAVE -->|"profile present"| OK["no change"]
CREATE --> VERIFY["nvidia-smi -L lists MIG-UUID devices"]
What it does
Takes a node whose driver stack is already up (the nvidia_stack role ran, nvidia-smi enumerates GPUs) and brings the requested MIG geometry into existence, idempotently:
- Gate. Run only when mig_enabled | bool, the node has gpu_tier == 'datacenter' (MIG is a datacenter / RTX PRO capability, not consumer GeForce), and GPUs are actually visible. If mig_enabled or gpu_tier does not match, the role is a no-op; if no GPU is visible, it fails before mutating anything. Either way no partial state is left.
- Read mode. Query mig.mode.current per GPU. Enable MIG mode (nvidia-smi -i <id> -mig 1) only where it reads Disabled, so a re-run on an already-enabled node changes nothing.
- Handle the reset. On Ampere, enabling MIG triggers a GPU reset and the mode is persistent across reboots (InfoROM status bit); if the reset cannot complete in-band, the change lands in mig.mode.pending=Enabled and a reboot applies it. On Hopper and newer no reset is needed, but the mode is not reboot-persistent, so the role must re-enable on every boot (re-run site.yml after reboot, or front it with the stale-MIG runbook).
- Read geometry. Parse nvidia-smi -i <id> -L per target GPU. Create instances (nvidia-smi mig -i <id> -cgi <mig_profile> -C) only where the requested profile is not already present, so the destructive create path is skipped on a converged GPU.
- Verify. Confirm nvidia-smi -L enumerates MIG-<UUID> devices for the requested profile.
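The read-mode decision can be prototyped off-box before trusting it in a play. The sketch below assumes the two-column CSV shape (index, mode) returned by the query named above; gpus_needing_enable is a hypothetical helper, not part of the role.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the role): print the index of every GPU
# whose current MIG mode is exactly "Disabled". Input is read from stdin in
# the shape produced by:
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
gpus_needing_enable() {
  awk -F',' '
    {
      idx = $1; mode = $2
      gsub(/^[ \t]+|[ \t]+$/, "", idx)    # trim the padding around each CSV field
      gsub(/^[ \t]+|[ \t]+$/, "", mode)
      if (mode == "Disabled") print idx   # exact match; bracketed placeholders are ignored
    }'
}

# Usage on a node (assumes nvidia-smi is on PATH):
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader | gpus_needing_enable
```

An empty result corresponds to the "re-run changes nothing" case: no GPU reads Disabled, so the enable step has nothing to do.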
The role deliberately does not tear down and recreate on every run; reshaping a live layout is a drain-gated operation owned by the stale-MIG runbook, not by routine convergence. It also does not manage the GPU Operator's MIG manager: when the Operator owns geometry (nvidia.com/mig.config), leave mig_enabled=false and let the Operator drive it.
Variables
Role and inventory variables (set in inventory/hosts.ini [gpu_nodes:vars] or host_vars/). mig_enabled and gpu_tier are shared with the hub inventory; mig_profile is introduced by this role.
| Variable | Default | Scope | Meaning |
|---|---|---|---|
| mig_enabled | false | inventory (hub) | Master gate. Role is a no-op unless true. Only datacenter (A30, A100, H100/H200, B-series) or RTX PRO Blackwell SKUs. |
| gpu_tier | datacenter | inventory (hub) | datacenter \| workstation \| consumer. MIG tasks run only on datacenter. |
| mig_profile | 1g.10gb | role | Profile applied to each GPU on the node, by name or numeric profile ID from -lgip. Comma-separate to cut several instances per card, e.g. 3g.20gb,3g.20gb or 1g.10gb,1g.10gb,1g.10gb,1g.10gb,1g.10gb,1g.10gb,1g.10gb. Must fit the card's slice budget; must be valid for the SKU (see profile tables in MIG). |
| mig_gpu_ids | all | role | Which GPUs to target. all omits -i for the mode-enable step and derives IDs from nvidia-smi --query-gpu=index; a comma list such as 0,1 is passed as nvidia-smi -i 0,1 -mig 1 for mode enable and looped as nvidia-smi mig -i <id> -cgi ... for instance creation. -i all is not valid syntax, so the token is dropped when the value is all. |
| mig_create_compute_instances | true | role | Pass -C so each GI also gets its compute instance (CUDA sees the slice). false cuts GIs only, which is almost never what you want; CUDA enumerates nothing without a CI. |
| mig_reboot_on_reset | true | role | Ampere only: if enabling MIG cannot reset in-band, allow the role to notify a reboot handler to apply mig.mode.pending. Set false to fail loudly instead of rebooting. |
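The mig_gpu_ids rule (drop the selector for all, pass a comma list through -i otherwise) is small enough to check in isolation. A minimal sketch, where mig_id_args is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a mig_gpu_ids value into the nvidia-smi selector
# used by the mode-enable step. "all" yields no selector at all, because
# "-i all" is not valid syntax; anything else is passed through after -i.
mig_id_args() {
  local ids="${1:-all}"
  if [ "$ids" = "all" ]; then
    printf ''
  else
    printf -- '-i %s' "$ids"
  fi
}

# Usage (the unquoted expansion is deliberate so "-i 0,1" splits into two words):
#   nvidia-smi $(mig_id_args "0,1") -mig 1
```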
Profile defaults are intentionally conservative (1g.10gb). Override per node group; do not assume one profile fits A100 and H200 alike (memory-slice sizes differ: 1g.10gb vs 1g.18gb). The full per-GPU profile tables are in MIG.
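A rough pre-flight check on a mig_profile value can catch an over-budget list before it reaches the GPU. The sketch below is an assumption-laden illustration: it only handles named profiles (not numeric IDs), it only sums the leading compute-slice count, and it takes 7 as the default budget. It does not check memory slices, placement, or SKU validity, so a pass is necessary but not sufficient. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the role).

# Sum the leading "<N>g" compute-slice count of each named profile in a
# comma-separated list, e.g. "3g.20gb,3g.20gb" -> 6. Returns 2 on a token
# that is not a named profile (numeric profile IDs are not handled here).
compute_slices_requested() {
  local list="$1" total=0 p n
  local IFS=','
  for p in $list; do
    case "$p" in
      [1-9]g.*) n="${p%%g.*}" ;;
      *) echo "unrecognised profile name: $p" >&2; return 2 ;;
    esac
    total=$((total + n))
  done
  echo "$total"
}

# Succeed if the requested compute slices fit the budget (default 7; pass a
# smaller second argument for SKUs with fewer slices).
fits_compute_budget() {
  local total
  total="$(compute_slices_requested "$1")" || return 2
  [ "$total" -le "${2:-7}" ]
}

# Usage:
#   fits_compute_budget "3g.20gb,3g.20gb" && echo "compute slices fit"
```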
Tasks
roles/mig/tasks/main.yml. Every nvidia-smi call goes through ansible.builtin.command (no shell metacharacters needed) with task-level register / changed_when / failed_when for idempotency. The readiness gate uses ansible.builtin.assert, target IDs are resolved with ansible.builtin.set_fact, and ansible.builtin.meta flushes the reboot handler. The whole file is guarded by a block-level when, so nothing runs off-tier.
# roles/mig/tasks/main.yml
- name: MIG configuration
  when: mig_enabled | bool and gpu_tier == 'datacenter'
  block:
    - name: Readiness gate - GPUs are visible to the driver
      ansible.builtin.command: nvidia-smi -L
      register: mig_smi_list
      changed_when: false
      failed_when: mig_smi_list.rc != 0

    - name: Readiness gate - assert at least one GPU enumerated
      ansible.builtin.assert:
        that: "'GPU 0' in mig_smi_list.stdout"
        fail_msg: "No GPU enumerated by nvidia-smi -L; nvidia_stack role must complete before mig."

    - name: Read current MIG mode per GPU
      ansible.builtin.command: >-
        nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
      register: mig_mode
      changed_when: false
      failed_when: mig_mode.rc != 0
      # stdout rows: "0, Disabled" | "0, Enabled". [Disabled] also appears on
      # non-MIG SKUs; the block 'when' already excludes those by gpu_tier.

    - name: Enable MIG mode where currently Disabled
      ansible.builtin.command: >-
        nvidia-smi {{ ('-i ' ~ mig_gpu_ids) if mig_gpu_ids != 'all' else '' }} -mig 1
      register: mig_enable
      when: "'Disabled' in mig_mode.stdout"
      changed_when: "'Enabled MIG Mode' in mig_enable.stdout"
      failed_when: >-
        mig_enable.rc != 0
        and 'In use by another client' not in (mig_enable.stderr | default(''))
      # On Ampere this resets the GPU; the reset is refused while a CUDA app or
      # a stray nvidia-smi holds the device ("In use by another client").
      # Drain/clear clients first (see runbook-mig-state-stale).

    - name: Re-read MIG mode after enable (detect pending reset, Ampere)
      ansible.builtin.command: >-
        nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader
      register: mig_mode_post
      changed_when: false
      failed_when: mig_mode_post.rc != 0
      when: mig_enable is changed

    - name: Reboot to apply pending MIG mode (Ampere, reset not yet in effect)
      ansible.builtin.command: "true"
      changed_when: true
      notify: reboot node
      when:
        - mig_reboot_on_reset | bool
        - mig_mode_post is defined
        - mig_mode_post.stdout is defined
        - "'Enabled' in mig_mode_post.stdout"
        - "'Disabled, Enabled' in mig_mode_post.stdout"  # current=Disabled, pending=Enabled
      # Hopper+ needs no reset (current flips immediately); this fires only when
      # current is still Disabled but pending is Enabled. Flush handlers before
      # creating instances so the node is back up first.

    - name: Apply pending reboot now (before creating instances)
      ansible.builtin.meta: flush_handlers

    - name: Resolve target GPU IDs for instance creation
      ansible.builtin.set_fact:
        mig_target_gpu_ids: >-
          {{ (mig_mode.stdout_lines
              | map('regex_replace', '^\\s*([^,]+),.*$', '\\1')
              | map('trim')
              | list)
             if mig_gpu_ids == 'all'
             else (mig_gpu_ids.split(',') | map('trim') | list) }}

    - name: List existing MIG devices per target GPU
      ansible.builtin.command: nvidia-smi -i {{ item }} -L
      register: mig_existing
      changed_when: false
      failed_when: mig_existing.rc != 0
      loop: "{{ mig_target_gpu_ids }}"

    - name: Create GPU instances for the requested profile (with compute instances)
      ansible.builtin.command: >-
        nvidia-smi mig -i {{ item.item }} -cgi {{ mig_profile }}
        {{ '-C' if mig_create_compute_instances | bool else '' }}
      register: mig_create
      # Idempotency: only cut instances on a GPU where the requested profile is
      # not already present in that GPU's `nvidia-smi -i <id> -L` output.
      # The match tolerates extra whitespace, since nvidia-smi may pad the
      # profile and Device columns.
      loop: "{{ mig_existing.results }}"
      when: >-
        item.stdout is not search('MIG\\s+' ~ (mig_profile.split(',')[0] | regex_escape) ~ '\\s+Device')
      changed_when: "'Successfully created' in mig_create.stdout"
      failed_when: >-
        mig_create.rc != 0
        and 'Insufficient resources' not in (mig_create.stderr | default(''))

    - name: Verify MIG devices are enumerated per target GPU
      ansible.builtin.command: nvidia-smi -i {{ item }} -L
      register: mig_verify
      loop: "{{ mig_target_gpu_ids }}"
      changed_when: false
      failed_when: >-
        mig_verify.rc != 0
        or mig_verify.stdout is not search('MIG\\s+' ~ (mig_profile.split(',')[0] | regex_escape) ~ '\\s+Device')

# roles/mig/handlers/main.yml
- name: reboot node
  ansible.builtin.reboot:
    reboot_timeout: 1200
    post_reboot_delay: 30
Notes on the idempotency contract:
- Read-before-write everywhere. mig.mode.current gates the enable; nvidia-smi -L gates the create. No step mutates without first proving the target state is absent.
- changed_when is keyed on success strings (Enabled MIG Mode, Successfully created), not on exit code, so a skipped-because-present run reports ok, not changed.
- The create loop targets each GPU ID explicitly with nvidia-smi mig -i <id> ...; a multi-GPU node is not left with only GPU 0 cut. The presence check still matches only the first profile token in mig_profile. For a heterogeneous layout (e.g. 3g.20gb,1g.10gb) the presence check is partial; prefer the explicit teardown/recreate path in stale-MIG runbook, which is drain-gated, over re-running this role to reshape.
- flush_handlers forces the reboot (if any) to complete before the create step, so instances are cut on a GPU whose MIG mode is actually in effect.
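The presence check that gates the create step can be exercised against captured nvidia-smi -L output without touching a GPU. The sketch below is one way to do that, assuming each MIG line has the word order MIG, profile name, Device; it compares whitespace-separated fields, so column padding does not matter. has_mig_profile is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the `nvidia-smi -L` text on stdin lists at
# least one MIG device whose profile name equals $1 exactly.
has_mig_profile() {
  local profile="$1"
  awk -v want="$profile" '
    # awk splits on runs of whitespace, so padded columns compare cleanly.
    $1 == "MIG" && $2 == want && $3 == "Device" { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Usage on a node, mirroring the per-GPU read in the role:
#   nvidia-smi -i 0 -L | has_mig_profile 1g.10gb || echo "profile absent on GPU 0"
```

Like the role's own check, this looks at one profile name at a time; a heterogeneous layout needs one call per distinct profile.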
Apply & verify
Run the hub playbook scoped to one node, or the role directly:
# whole bring-up, one node, MIG on, explicit profile:
ansible-playbook -i inventory/hosts.ini site.yml --limit gpu-07.dc1.internal \
-e mig_enabled=true -e mig_profile=1g.10gb
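# dry run (caveat: ansible.builtin.command tasks are skipped under --check unless they set check_mode: false, so as written the read steps return no output and the enable/create are not previewed):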
ansible-playbook -i inventory/hosts.ini site.yml --limit gpu-07.dc1.internal \
-e mig_enabled=true --check --diff
# tags, if site.yml tags the role:
ansible-playbook -i inventory/hosts.ini site.yml --limit gpu-07.dc1.internal --tags mig
Validation command and expected signal. On the node, MIG mode is Enabled and nvidia-smi -L lists one MIG-<UUID> device per compute instance:
nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
# expect every row: "<n>, Enabled"
nvidia-smi -L
# expect one line per slice, e.g.:
# GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-...)
# MIG 1g.10gb Device 0: (UUID: MIG-c7384736-a75d-5afc-978f-d2f1294409fd)
The MIG UUID (MIG-<...>), not the bare GPU index, is what pins a process to one instance via CUDA_VISIBLE_DEVICES. If a create step left zero MIG devices listed, the GIs were most likely created without compute instances; confirm -C (i.e. mig_create_compute_instances=true).
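Pulling the MIG UUIDs out of nvidia-smi -L is a one-liner worth wrapping. The sketch below assumes the dashed UUID form shown in the sample output above (MIG- followed by 8-4-4-4-12 hex digits); mig_uuids is a hypothetical helper and ./my_app is a placeholder workload.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one MIG UUID per line from `nvidia-smi -L`
# text on stdin. Whole-GPU UUIDs (GPU-...) are deliberately not matched.
mig_uuids() {
  grep -oE 'MIG-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}

# Usage: pin a process to the first MIG instance on the node.
#   CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | mig_uuids | head -n 1)" ./my_app
```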
Idempotency check (design intent, not yet hardware-verified). Because every mutating step is guarded by a read-before-write when/changed_when (enable fires only on Disabled; create fires only when the profile is absent), a second identical run on an already-converged node is expected to report changed=0. Run it and confirm on your hardware; this role is a reference template and has not been validated on a live GPU:
ansible-playbook -i inventory/hosts.ini site.yml --limit gpu-07.dc1.internal \
-e mig_enabled=true -e mig_profile=1g.10gb
# expect PLAY RECAP -> ... changed=0 ... once the node is converged
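The changed=0 expectation can be asserted mechanically instead of read by eye. The sketch below assumes the usual key=value layout of PLAY RECAP lines and treats any non-zero changed, failed, or unreachable count as not converged; recap_is_converged and the run2.log file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the ansible-playbook output on stdin
# contains at least one recap line and every recap line shows changed=0,
# failed=0 and unreachable=0.
recap_is_converged() {
  awk '
    /changed=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "changed" || kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0)
          bad = 1
      }
    }
    END { exit ((seen && !bad) ? 0 : 1) }'
}

# Usage: capture the second run, then check it.
#   ansible-playbook -i inventory/hosts.ini site.yml --limit gpu-07.dc1.internal \
#     -e mig_enabled=true -e mig_profile=1g.10gb | tee run2.log
#   recap_is_converged < run2.log && echo "second run converged"
```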
Run the hub's health role (this role's validation counterpart) after mig to assert geometry fleet-wide.
Failure modes
- Enable refused, In use by another client. On Ampere the reset needed to enable MIG is blocked by an attached CUDA app or a stray nvidia-smi. The role tolerates this stderr so a fleet run does not abort, but the GPU stays Disabled. Drain the node / kill clients (or reboot), then re-run. Runbook: stale-MIG state.
- MIG layout gone after reboot (Hopper / Blackwell). Mode is not InfoROM-persistent on Hopper+; a rebooted node comes back as one whole GPU while the scheduler still expects slices, and pods stay Pending. Re-run site.yml on boot (or let a systemd unit / the GPU Operator re-enable). Runbook: stale-MIG state.
- mig.mode.pending=Enabled but current=Disabled and mig_reboot_on_reset=false. The Ampere reset never applied; the create step then fails because MIG is not actually on. Reboot the node and re-run. Runbook: stale-MIG state.
- Insufficient resources on create. mig_profile exceeds the card's slice budget (7 compute slices / 8 memory slices), uses a profile invalid for the SKU, or collides with placement constraints. Check nvidia-smi mig -lgip for the remaining budget and the per-GPU tables in MIG. The role tolerates this stderr only to surface it in the verify step's failure.
- Stale / partial geometry vs. what Kubernetes advertises. Device-plugin labels disagree with the on-box nvidia-smi mig -lgi layout (typically after a partial reconfigure or a -mig 0 that left CIs/GIs behind). This role does not reshape live geometry; use the drain-gated teardown in stale-MIG state.
- Operator and host both managing MIG. If the GPU Operator's MIG manager owns nvidia.com/mig.config and this role also runs, they fight over geometry. Pick one: leave mig_enabled=false when the Operator is present.
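The pending-but-not-current failure mode is easy to detect from the three-column query the role already runs. A minimal sketch, assuming rows shaped like "0, Disabled, Enabled" (index, current, pending); gpus_pending_enable is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the index of every GPU whose MIG mode is still
# Disabled but has Enabled pending, i.e. the reset has not been applied yet.
# Input is read from stdin in the shape produced by:
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader
gpus_pending_enable() {
  awk -F',' '
    {
      for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim each field
      if ($2 == "Disabled" && $3 == "Enabled") print $1
    }'
}

# Usage on a node:
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader \
#     | gpus_pending_enable
```

Any index printed here is a GPU that needs the reset or reboot to land before the create step can succeed.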
References
- MIG User Guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
- Getting Started with MIG (-mig 1/0, -cgi ... -C, -lgi/-lci/-lgip, -dci/-dgi, mig.mode.current/mig.mode.pending, MIG-UUID format, Ampere reset + InfoROM persistence vs. Hopper+ non-persistence, In use by another client): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/getting-started-with-mig.html
- Supported MIG profiles (per-GPU profile tables, slice budget, +me/+gfx suffixes): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-mig-profiles.html
- Supported GPUs (A30, A100, H100, H200, B200, RTX PRO Blackwell, Thor iGPU): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html
- ansible.builtin.command (creates/removes, no shell, task-level changed_when/failed_when/register): https://docs.ansible.com/ansible/latest/collections/ansible/builtin/command_module.html
- ansible.builtin.assert: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/assert_module.html
- ansible.builtin.reboot: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/reboot_module.html
- ansible.builtin.meta (flush_handlers): https://docs.ansible.com/ansible/latest/collections/ansible/builtin/meta_module.html
- NVIDIA GPU Operator with MIG (when the Operator owns geometry instead): https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html
Related: Node & Fabric Bring-Up · MIG · Stale MIG state runbook · Software Stack · Glossary