MIG (multi-instance GPU)

Scope: hardware partitioning of a single datacenter or workstation GPU into isolated GPU instances, covering profiles, the nvidia-smi mig lifecycle, isolation guarantees, supported silicon, and how Kubernetes consumes the result. MIG sits below the platform (software stack) and beside the soft-partitioning alternative (MPS).

What it is

MIG spatially partitions one physical GPU into up to seven independent GPU instances (GI), each with dedicated streaming-multiprocessor (SM) slices, dedicated framebuffer memory and L2 cache, and dedicated memory bandwidth. Unlike time-slicing or MPS (where clients contend for the same SMs and memory), MIG gives each instance a separate hardware path through the memory system, so one tenant cannot starve, corrupt, or crash another.

Two-level hierarchy:

GPU instance (GI): a set of GPU memory slices plus SM slices and dedicated engines (DMA/copy engines, NVDEC, JPEG, OFA). A memory slice is ~1/8 of total framebuffer (capacity and bandwidth); an SM slice is ~1/7 of SMs. The GI is the unit of memory QoS and fault isolation.¹
Compute instance (CI): a subdivision of a GI's SM slices. CIs inside one GI share that GI's memory and engines but get isolated compute. A GI created with -C gets one CI spanning its full SM slice count (the common case).¹

Profile naming encodes the shape: <SM-slices>g.<memory-GB>gb. For example, 3g.20gb is 3 SM slices with 20 GB (an A100 40GB profile), and 7g.80gb is a whole A100 80GB or H100 80GB as a single instance. Suffixes modify the base shape: +me adds the media engines (NVDEC/JPEG/OFA) to a 1-slice profile; on Blackwell, +gfx enables graphics APIs, -me strips the media engines for a compute-only instance, and +me.all claims all media engines.³
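The naming rule above can be turned into a small parser. This is a minimal sketch, assuming profile names always follow the `<SM-slices>g.<memory-GB>gb` shape with optional `+`/`-` suffixes; `MigProfile` and `parse_profile` are illustrative names, not part of any NVIDIA tool.

```python
import re
from typing import NamedTuple, Tuple


class MigProfile(NamedTuple):
    sm_slices: int              # the number before "g"
    memory_gb: int              # the number before "gb"
    suffixes: Tuple[str, ...]   # e.g. ("+me",); empty for a plain profile


# <SM-slices>g.<memory-GB>gb followed by zero or more +/- suffixes.
_PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb((?:[+-][a-z.]+)*)$")


def parse_profile(name: str) -> MigProfile:
    """Split a profile name such as '3g.20gb' or '1g.24gb+gfx' into its parts."""
    match = _PROFILE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a MIG profile name: {name!r}")
    sm, mem, tail = match.groups()
    suffixes = tuple(re.findall(r"[+-][a-z.]+", tail))
    return MigProfile(int(sm), int(mem), suffixes)


print(parse_profile("3g.20gb"))
print(parse_profile("1g.24gb+gfx"))
```

The parser only checks the shape of the name; whether a given profile exists on a particular GPU still has to come from `nvidia-smi mig -lgip`.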

Isolation guarantees, per NVIDIA: "each instance's processors have separate and isolated paths through the entire memory system: the on-chip crossbar ports, L2 cache banks, memory controllers, and DRAM address busses are all assigned uniquely to an individual instance."² This is error/fault containment, not just scheduling fairness. It is the basis for MIG's use in regulated multi-tenancy.

Why it's needed (and when)

Right-sizing. A full A100/H100/H200/B200 is wasteful for inference of small models, notebooks, CI jobs, or per-developer dev environments. MIG lets one card present as up to 7 smaller, guaranteed-capacity GPUs.
Hard isolation between tenants. Memory-bandwidth QoS and fault isolation mean a misbehaving job (OOM, an uncorrectable ECC event scoped to its slice, a runaway kernel) is contained to its instance. This is the distinguishing reason to pick MIG over MPS, which shares one address space and offers no fault isolation.
Predictable latency. No noisy-neighbour SM/memory contention, so p99 latency for inference is stable under co-tenancy.

When not to use MIG:

You need the full card for one large model, or model/tensor parallelism across the whole GPU. A workload cannot span MIG instances, and there is no NVLink or peer-to-peer path between GIs. Multi-GPU collectives (NVSwitch/NVLink) operate on whole GPUs, not MIG slices.
You need elastic sharing without fixed boundaries and can tolerate contention; use MPS or time-slicing instead.
Your fleet is consumer GeForce; MIG is not supported there (generations/tiers).

How it's installed & managed

MIG is a driver/firmware capability of the GPU, not a separate package; it ships with the datacenter driver, so nothing needs installing beyond a MIG-capable driver and a supported GPU.⁵ Everything below is driven through nvidia-smi: the -mig flag toggles the mode and the mig subcommand manages instances. Commands require root (or a user granted the mig/config capability).⁴

Reference templates, not hardware-tested. Validate every command against the MIG User Guide and your driver release before running in production.

Enable MIG mode (per GPU, by index):

sudo nvidia-smi -i 0 -mig 1
# Enabled MIG Mode for GPU 00000000:36:00.0

Architecture difference that bites operators:

Ampere (A100/A30): enabling MIG triggers a GPU reset; the mode is persistent across reboots (a status bit is stored in the GPU InfoROM). If clients are attached the reset is refused: In use by another client ... Please first kill all processes using the device and retry the command or reboot the system.⁴
Hopper and newer (H100/H200/B-series): no reset is required to toggle MIG, but the mode is no longer persistent across reboots (no InfoROM status bit); it must be re-enabled on boot, e.g. via the GPU Operator, a systemd unit, or your provisioning (install lifecycle).⁴

List available profiles (note the numeric Profile IDs in the first column, used by -cgi):

sudo nvidia-smi mig -lgip

Create GPU instances with their compute instances in one step (-C). Accepts profile IDs and/or names; order/placement is assigned by the driver:

# By profile ID and profile name, mixed in one call
sudo nvidia-smi mig -cgi 9,3g.20gb -C
# By profile ID (e.g. 19,14,5 from -lgip)
sudo nvidia-smi mig -cgi 19,14,5 -C

-cgi alone creates only the GI; add -C to also create the CI so the slice is usable by CUDA. Without a CI, CUDA enumerates nothing.⁴

List what exists:

sudo nvidia-smi mig -lgi            # GPU instances
sudo nvidia-smi mig -lci -gi 1      # compute instances under GI 1
nvidia-smi -L                       # CUDA-visible MIG devices + UUIDs

nvidia-smi -L is how you get the addressable device identity:

MIG 3g.20gb Device 0: (UUID: MIG-c7384736-a75d-5afc-978f-d2f1294409fd)

The MIG UUID (MIG-<...>), not the bare GPU index, is what you pass to CUDA_VISIBLE_DEVICES to pin a process to one instance.⁴
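As an illustration of the step above, the following sketch extracts MIG UUIDs from `nvidia-smi -L` text and builds an environment that pins a child process to one instance. It assumes the line shape shown above; the GPU line and its UUID in `SAMPLE` are placeholders, and `list_mig_devices` and `pinned_env` are hypothetical helper names.

```python
import re
from typing import Dict, List

# Text in the shape `nvidia-smi -L` prints. The GPU line is a placeholder;
# the MIG line is the one shown above.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 3g.20gb Device 0: (UUID: MIG-c7384736-a75d-5afc-978f-d2f1294409fd)
"""

_MIG_LINE = re.compile(
    r"^\s*MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):"
    r"\s+\(UUID:\s+(?P<uuid>MIG-[^)\s]+)\)"
)


def list_mig_devices(listing: str) -> List[Dict[str, object]]:
    """Return one dict per MIG device line found in the listing."""
    devices = []
    for line in listing.splitlines():
        match = _MIG_LINE.match(line)
        if match:
            devices.append({
                "profile": match["profile"],
                "index": int(match["index"]),
                "uuid": match["uuid"],
            })
    return devices


def pinned_env(base_env: Dict[str, str], mig_uuid: str) -> Dict[str, str]:
    """Copy an environment and restrict CUDA to a single MIG instance."""
    env = dict(base_env)
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env


for device in list_mig_devices(SAMPLE):
    print(device["profile"], device["uuid"])
```

In practice the listing would come from `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`, and the result of `pinned_env(os.environ, uuid)` would be passed as `env=` when launching the workload.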

Destroy (compute instances first, then GPU instances; a GI with live CIs will not delete):

sudo nvidia-smi mig -dci && sudo nvidia-smi mig -dgi

Disable MIG entirely:

sudo nvidia-smi -i 0 -mig 0
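The lifecycle above (enable, create, destroy, disable) can be captured as an ordered command plan for review before anything runs. This is a dry-run sketch only: it builds the same commands shown above and prints them, and the function names `setup_plan` and `teardown_plan` are illustrative assumptions.

```python
from typing import List


def setup_plan(gpu_index: int, profiles: List[str]) -> List[List[str]]:
    """Commands to enable MIG on one GPU, then create GIs with their CIs."""
    if not profiles:
        raise ValueError("at least one profile ID or name is required")
    return [
        ["nvidia-smi", "-i", str(gpu_index), "-mig", "1"],
        # The mig subcommand is emitted as written above, without a GPU selector.
        ["nvidia-smi", "mig", "-cgi", ",".join(profiles), "-C"],
    ]


def teardown_plan(gpu_index: int) -> List[List[str]]:
    """Commands to destroy CIs, then GIs, then disable MIG on one GPU."""
    return [
        ["nvidia-smi", "mig", "-dci"],   # compute instances first
        ["nvidia-smi", "mig", "-dgi"],   # then GPU instances
        ["nvidia-smi", "-i", str(gpu_index), "-mig", "0"],
    ]


# Dry run: print the plan instead of executing it.
for command in setup_plan(0, ["9", "3g.20gb"]) + teardown_plan(0):
    print(" ".join(command))
```

Keeping the plan as data makes the ordering constraint (CIs before GIs) explicit and testable; executing it would still need root and should be validated against the MIG User Guide for the driver in use.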

Operational notes:

Persistence mode. Keep persistence mode on so the driver state and MIG layout survive the last client exiting; otherwise the driver tears down on idle and re-initialises on next use.
ECC. MIG-capable datacenter GPUs run with ECC on; toggling ECC requires a GPU reset, so sequence it before laying out MIG (see ECC Toggle Recovery).
Not all profiles co-exist. A GPU has a fixed slice budget (7 SM slices, 8 memory slices). You cannot mix profiles that exceed it; -lgip shows the max count of each, and creating a large profile reduces what remains. Placement constraints mean some combinations are invalid even when total slices fit.³
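The slice budget in the last note can be checked arithmetically before touching a GPU. The sketch below is a necessary-but-not-sufficient test: it assumes a profile's memory slices can be estimated by rounding its GB figure to the nearest eighth of the framebuffer, and it does not model placement constraints, so `-lgip` remains the authority. `slice_cost` and `fits_budget` are hypothetical names.

```python
import re
from typing import List, Tuple

SM_SLICES = 7       # SM slice budget stated above
MEMORY_SLICES = 8   # memory slice budget stated above


def slice_cost(profile: str, gpu_memory_gb: int) -> Tuple[int, int]:
    """Return (sm_slices, memory_slices) for a profile on a GPU of this size.

    Assumption: memory slices are estimated by rounding the profile's GB
    to the nearest multiple of one eighth of the framebuffer.
    """
    match = re.match(r"^(\d+)g\.(\d+)gb", profile)
    if match is None:
        raise ValueError(f"not a MIG profile name: {profile!r}")
    sm = int(match.group(1))
    memory_gb = int(match.group(2))
    memory = max(1, round(memory_gb * MEMORY_SLICES / gpu_memory_gb))
    return sm, memory


def fits_budget(profiles: List[str], gpu_memory_gb: int) -> bool:
    """True if the summed slices stay within 7 SM and 8 memory slices."""
    costs = [slice_cost(p, gpu_memory_gb) for p in profiles]
    sm_total = sum(sm for sm, _ in costs)
    memory_total = sum(memory for _, memory in costs)
    return sm_total <= SM_SLICES and memory_total <= MEMORY_SLICES


# A100 40GB examples: 4g + 3g uses all 7 SM slices; 4g + 4g needs 8.
print(fits_budget(["4g.20gb", "3g.20gb"], 40))
print(fits_budget(["4g.20gb", "4g.20gb"], 40))
```

A `True` result only means the totals fit; a combination can still be rejected by the driver because of where the slices must be placed.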

Profiles by GPU

All profile data below is from NVIDIA's Supported MIG Profiles tables.³ The +me profiles (omitted from the examples here) are single-instance media variants; counts are the maximum number of that profile alone.

A100, H100, H200, and B200 all follow the 1/7-SM, 1/8-memory model (max 7 GIs):

GPU	Smallest	Largest	Example profiles	Max GIs
A100-SXM4-40GB	`1g.5gb`	`7g.40gb`	`1g.5gb`, `1g.10gb`, `2g.10gb`, `3g.20gb`, `4g.20gb`	7
A100 80GB	`1g.10gb`	`7g.80gb`	`1g.10gb`, `1g.20gb`, `2g.20gb`, `3g.40gb`	7
H100 80GB	`1g.10gb`	`7g.80gb`	`1g.10gb`, `1g.20gb`, `2g.20gb`, `3g.40gb`, `4g.40gb`	7
H200 141GB	`1g.18gb`	`7g.141gb`	`1g.18gb`, `1g.35gb`, `2g.35gb`, `3g.71gb`, `4g.71gb`	7
B200 180GB	`1g.23gb`	`7g.180gb`	`1g.23gb`, `1g.45gb`, `2g.45gb`, `3g.90gb`, `4g.90gb`	7

RTX PRO 6000 Blackwell (96GB) is the exception: it partitions at 1/4 granularity (max 4 instances, not 7), and is the workstation tier that adds graphics (+gfx) profiles:³

Profile	Memory	SMs	Max instances
`1g.24gb`	1/4	1/4	4
`1g.24gb+gfx`	1/4	1/4	4
`2g.48gb`	1/2	1/2	2
`4g.96gb`	Full	Full	1

Supported MIG silicon per NVIDIA's tables: A30, A100, H100, H200, B200, RTX PRO 6000 / 5000 / 4500 Blackwell, and Thor iGPU. Consumer GeForce is excluded; check the supported-GPUs page for your exact SKU before planning.⁵

Kubernetes consumption

The NVIDIA device plugin exposes MIG via a node-level MIG_STRATEGY:⁶

single: every GPU on the node is partitioned into the same profile. MIG devices are advertised as the familiar nvidia.com/gpu resource, so pod specs are unchanged; only the meaning of one "GPU" shrinks to one MIG slice. Best for homogeneous nodes.
mixed: the node may expose different profiles (and whole non-MIG GPUs). Each profile becomes its own extended resource, nvidia.com/mig-<slices>g.<mem>gb (e.g. nvidia.com/mig-3g.20gb), requested explicitly in the pod spec. NVIDIA recommends mixed for new deployments. Caveat: by default a container should request a single device type; requesting more than one is undefined.⁶

# mixed strategy: request one 3g.20gb MIG slice
resources:
  limits:
    nvidia.com/mig-3g.20gb: 1
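The naming rule for the two strategies can be sketched as a helper that builds the `resources.limits` mapping for one container. This is an illustration under stated assumptions: `resource_name` and `container_limits` are hypothetical, only plain profiles are handled (suffixed profiles such as +me are out of scope), and the single-device-type caveat is enforced by rejecting mixed requests.

```python
from typing import Dict


def resource_name(strategy: str, profile: str) -> str:
    """Map a MIG strategy and profile to the extended resource name."""
    if strategy == "single":
        # Under single, every MIG device is advertised as a plain GPU.
        return "nvidia.com/gpu"
    if strategy == "mixed":
        return f"nvidia.com/mig-{profile}"
    raise ValueError(f"unsupported strategy: {strategy!r}")


def container_limits(strategy: str, requests: Dict[str, int]) -> Dict[str, int]:
    """Build resources.limits for one container from {profile: count}.

    Rejects more than one profile, following the caveat above that a
    container should request a single device type.
    """
    if len(requests) != 1:
        raise ValueError("request exactly one MIG device type per container")
    (profile, count), = requests.items()
    if count < 1:
        raise ValueError("count must be at least 1")
    return {resource_name(strategy, profile): count}


print(container_limits("mixed", {"3g.20gb": 1}))
print(container_limits("single", {"3g.20gb": 1}))
```

The returned dictionary corresponds to the `limits` block in the YAML above and could be serialised into a pod spec by whatever templating the cluster already uses.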

GPU Feature Discovery labels nodes with their MIG geometry, and the GPU Operator's MIG manager can reconfigure layouts. Either way, the partition geometry must exist on the node (created via nvidia-smi mig or the MIG manager) before the device plugin can advertise it.⁶

flowchart LR
  GPU["Physical GPU, e.g. A100 40GB"] --> EN["nvidia-smi -i 0 -mig 1"]
  EN --> CGI["nvidia-smi mig -cgi 9,3g.20gb -C"]
  CGI --> GI["GPU instances: memory + SM + L2 isolated"]
  GI --> CI["Compute instances: per-GI SM subset"]
  CI --> DEV["MIG-UUID devices in nvidia-smi -L"]
  DEV --> K8S["Device plugin: single uses nvidia.com/gpu, mixed uses nvidia.com/mig-3g.20gb"]

Validated usage & tests

Reference templates, not hardware-tested. The outputs below describe shape, not specific numbers.

Confirm MIG is enabled and laid out. nvidia-smi shows a MIG devices: table under the GPU with one row per instance:

nvidia-smi

Expect a per-instance table listing each GI/CI, its device index, the profile name, and per-instance memory. The top-of-output memory figure reflects the parent GPU; per-instance memory appears in the MIG table.

Enumerate addressable instances:

nvidia-smi -L

Expect one MIG <profile> Device N: (UUID: MIG-<...>) line per compute instance. If this lists the GPU but no MIG devices, you created GIs without CIs; add CIs to the existing GIs with -cci, or destroy the GIs and recreate them with -cgi ... -C.

Pin a workload to one instance and prove isolation:

CUDA_VISIBLE_DEVICES=MIG-c7384736-a75d-5afc-978f-d2f1294409fd python -c "import torch; print(torch.cuda.get_device_name(0))"

Expect the process to see exactly one device sized to that profile. Saturating one instance should not change throughput on a sibling instance running concurrently; that stability is the observable signature of memory-bandwidth QoS.

Monitor utilisation per instance. Per-instance SM/memory utilisation is exposed through DCGM (diagnostics tools); plain nvidia-smi utilisation columns are limited under MIG, so use DCGM for per-GI metrics in a fleet.

Failure modes

MIG layout gone after reboot (Hopper+). Mode is not InfoROM-persistent on Hopper/Blackwell; instances must be recreated on boot. Symptom: a node that had MIG slices comes back as one whole GPU and pods stay Pending. See Stale MIG State.
Stale/partial geometry vs. what Kubernetes advertises. Device-plugin labels or advertised resources disagree with the actual nvidia-smi mig -lgi layout (often after a partial reconfigure or a -mig 0 without cleaning CIs/GIs). See Stale MIG State.
Enable refused: In use by another client. On Ampere the reset needed to enable MIG is blocked by attached processes; drain the node / kill clients (or reboot) first.⁴
-dgi fails because CIs still exist. Destroy compute instances before GPU instances: -dci then -dgi.
Expecting NVLink/P2P between MIG slices. There is none; collectives and model parallelism that need peer-to-peer must run on whole GPUs (NVSwitch/NVLink).
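For the stale-geometry failure mode above, one way to spot drift is to compare the profiles actually present on the node with the resources Kubernetes advertises. The sketch below assumes the mixed strategy's naming, takes the actual profiles as a plain list (for example, gathered from `nvidia-smi -L`), and takes the advertised counts as a dictionary; `geometry_drift` is a hypothetical helper, not part of the device plugin.

```python
from collections import Counter
from typing import Dict, List


def geometry_drift(actual_profiles: List[str],
                   advertised: Dict[str, int]) -> Dict[str, int]:
    """Return {resource: advertised - actual} wherever the two disagree.

    A positive value means Kubernetes advertises more devices than exist;
    a negative value means devices exist that are not advertised.
    """
    actual = Counter(f"nvidia.com/mig-{p}" for p in actual_profiles)
    drift = {}
    for resource in sorted(set(actual) | set(advertised)):
        delta = advertised.get(resource, 0) - actual.get(resource, 0)
        if delta != 0:
            drift[resource] = delta
    return drift


# Node came back as one whole GPU after reboot, but two slices are still advertised.
print(geometry_drift([], {"nvidia.com/mig-3g.20gb": 2}))
```

An empty result means the advertised resources match the on-node layout; anything else is a cue to follow the Stale MIG State procedure.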

References

MIG User Guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
MIG introduction (isolated memory-system paths): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/introduction.html
MIG concepts (GI/CI, slices): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/concepts.html
Getting started with MIG (enable, create, destroy, UUIDs): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/getting-started-with-mig.html
Supported MIG profiles (A100/H100/H200/B200/RTX PRO Blackwell tables): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-mig-profiles.html
Supported GPUs: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html
MIG support in Kubernetes (single/mixed strategy): https://docs.nvidia.com/datacenter/cloud-native/kubernetes/latest/index.html
NVIDIA k8s-device-plugin: https://github.com/NVIDIA/k8s-device-plugin

1. MIG User Guide, Concepts: GPU/compute instances, memory/SM slices (~1/8 memory, ~1/7 SMs). https://docs.nvidia.com/datacenter/tesla/mig-user-guide/concepts.html
2. MIG User Guide, Introduction: the fault-isolation statement on separate, isolated memory-system paths (crossbar ports, L2 cache banks, memory controllers, DRAM address busses). https://docs.nvidia.com/datacenter/tesla/mig-user-guide/introduction.html
3. MIG User Guide, Supported MIG Profiles: per-GPU profile tables and +me/+gfx/-me/+me.all suffix semantics. https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-mig-profiles.html
4. MIG User Guide, Getting Started: -mig 1, Ampere reset + InfoROM persistence vs. Hopper+ non-persistence, -cgi ... -C, -lgi/-lci, -dci/-dgi, MIG UUID enumeration, In use by another client reset refusal. https://docs.nvidia.com/datacenter/tesla/mig-user-guide/getting-started-with-mig.html
5. MIG User Guide, Supported GPUs: A30, A100, H100, H200, B200, RTX PRO 6000/5000/4500 Blackwell, Thor iGPU. https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html
6. NVIDIA Kubernetes documentation and k8s-device-plugin: MIG_STRATEGY single (nvidia.com/gpu) vs. mixed (nvidia.com/mig-<slices>g.<mem>gb), and the single-device-type-per-container caveat. https://github.com/NVIDIA/k8s-device-plugin