NVIDIA Fabric Manager for NVSwitch systems

Scope: nv-fabricmanager on NVSwitch-based systems (HGX/DGX 8-GPU baseboards, GB200/GB300 NVL72): what it does, how it is versioned and run, how it fails, and how it relates to IMEX for multi-node NVLink. Not present on PCIe / non-NVSwitch parts.

What it is

NVIDIA Fabric Manager (FM) is a privileged userspace daemon, binary nv-fabricmanager, run via the nvidia-fabricmanager systemd service. On an NVSwitch-based system it configures the NVSwitch memory fabric so that every participating GPU forms a single NVLink domain, and then monitors the NVLinks that support that fabric. NVIDIA's wording: FM "configures the NVSwitch memory fabrics to form one memory fabric among all participating GPUs and monitors the NVLinks that support the fabric."¹

Two facts drive everything operational about it:

It is lockstep-versioned with the driver. During initialization "the FM service checks the currently loaded kernel driver stack version for compatibility, and if the loaded driver stack version is not compatible, aborts the process."¹ The nvidia-fabricmanager package version and the libnvidia-nscq library version must both match the installed driver.²
It is only for NVSwitch hardware. HGX/DGX 8-GPU baseboards and NVL72 racks carry NVSwitches; PCIe-attached datacenter cards, GeForce cards, and workstation GPUs without NVLink do not, and they never run nv-fabricmanager. See NVSwitch and NVLink.

libnvidia-nscq (NVSwitch Configuration and Query) is the stable driver API that monitoring tools such as DCGM use to read NVSwitch state; it is versioned against the driver the same way FM is.²

Why it's needed (and when)

On an NVSwitch system the GPUs are not directly wired to each other; they are wired to NVSwitches. Until the switch crossbars and routing are programmed, there is no GPU-to-GPU NVLink path. FM is the process that does that programming at boot, then stays resident to monitor links and handle the CUDA job lifecycle against the fabric.

When you need it:

Single-node NVSwitch (HGX/DGX A100, H100/H200, B200/B300): FM on the node, full stop. No FM, no NVLink mesh.
Multi-node NVLink (GB200/GB300 NVL72): FM still runs per node to program that node's switches; on top of it the IMEX service coordinates the NVLink memory domain across nodes. IMEX does not replace FM: FM manages the physical fabric on each node, IMEX orchestrates cross-node memory export/import. See the IMEX subsection under "How it's installed & managed".

When you do not need it: any system with no NVSwitch. Installing or enabling FM there is pointless and the service will not initialize a fabric.

On Hopper/Blackwell, NVSwitch supports ALI (Autonomous Link Initialization), so links can come up without FM driving every step, but FM is still the authority for fabric membership. NVIDIA states the consequence directly: "If a GPU fails to register with the fabric, it will lose its NVLink peer-to-peer capability and be available for non-peer-to-peer use cases."¹ That is the quiet-degradation case: the GPU still runs, it just has no NVLink peers.

How it's installed & managed

FM ships from the NVIDIA CUDA network repository, paired to a driver branch. Replace <driver-branch> with the branch you have pinned the fleet to (see Driver Versions and Branches); FM and driver must be the same version.

Reference template, not hardware-tested.

Ubuntu / Debian, pre-4th-gen NVSwitch (A100, H100/H200, which use a single combined package):¹

sudo apt-get install cuda-drivers-fabricmanager-<driver-branch>

Ubuntu / Debian, 4th-gen NVSwitch (B200/B300/B100), which install the open driver plus the NVLink5 stack:¹

sudo apt-get install -V nvidia-open-<driver-branch>
sudo apt-get install -V nvlink5-<driver-branch>

RHEL 8/9, pre-4th-gen, via the driver module's fm profile:¹

sudo dnf module install nvidia-driver:<driver-branch>/fm

The package drops the unit file at /lib/systemd/system/nvidia-fabricmanager.service, but does not enable or start it; that is left to the administrator.¹ Enable for boot, then start and check:

sudo systemctl enable nvidia-fabricmanager
sudo systemctl start  nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager

Logs and config:

Service journal: sudo journalctl -u nvidia-fabricmanager¹
Log file: /var/log/fabricmanager.log (key LOG_FILE_NAME in the config)¹
Config file: /usr/share/nvidia/nvswitch/fabricmanager.cfg¹

Config keys that matter operationally (in fabricmanager.cfg):¹

FABRIC_MODE: 0 bare-metal or full passthrough virtualization; 1 Shared NVSwitch multi-tenancy; 2 vGPU multi-tenancy. Bare-metal training/inference clusters use 0.
FM_STAY_RESIDENT_ON_FAILURES: 0 FM exits on NVSwitch/GPU config failure; 1 FM stays resident on such failures, but in NVIDIA's words "the system will be uninitialized, and the CUDA application launch will fail." Staying resident keeps the daemon up for inspection but does not make the fabric usable.
TOPOLOGY_FILE_PATH=/usr/share/nvidia/nvswitch: fabric topology location; NVIDIA notes this is "not applicable to DGX B200/B300 and NVIDIA HGX B200/B300 and later NVSwitch-based systems."¹
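
The keys above can be read back from the config file when auditing a fleet. The following is a minimal sketch, assuming the file uses one KEY=VALUE entry per line with # comment lines (as the TOPOLOGY_FILE_PATH example suggests). cfg_get is a hypothetical helper name, not an NVIDIA tool, and "last occurrence wins" is an assumption rather than documented FM behavior.

```shell
# cfg_get FILE KEY: print the value of KEY from a KEY=VALUE config file.
# Assumptions: one KEY=VALUE per line, '#' starts a comment line,
# and if a key is repeated the last occurrence wins.
# Exits non-zero when the key is not present.
cfg_get() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }                    # skip comment lines
        NF < 2     { next }                    # skip blank lines and lines without "="
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace around the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v) # trim whitespace around the value
                found = 1
            }
        }
        END { if (found) print v; else exit 1 }
    ' "$1"
}
```

Example use on a node: cfg_get /usr/share/nvidia/nvswitch/fabricmanager.cfg FABRIC_MODE should print 0 on a bare-metal cluster configured as described above.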

Hold FM at the driver's version across the fleet. A driver upgrade that leaves FM (or libnvidia-nscq) on the old version is the canonical mismatch failure; handle it in the driver-upgrade runbook.

IMEX (multi-node NVLink: GB200/GB300 NVL72)

IMEX (NVIDIA Import/Export Service for Internode Memory Sharing) is a separate service that, on a multi-node NVLink cluster, "acts as an orchestrator for memory export and import across compute nodes" and runs exclusively on the compute nodes.³ An IMEX domain is "a set of compute nodes connected by NVLink on which the nvidia-imex service has been installed and configured to communicate with each other via the nodes_config.cfg."³ Start IMEX before launching jobs.

Reference template, not hardware-tested.

Install (Ubuntu/Debian) and run, matched to the same driver branch:⁴

sudo apt-get install nvidia-imex-<driver-branch>
sudo systemctl enable nvidia-imex
sudo systemctl start  nvidia-imex
sudo systemctl status nvidia-imex

Paths and tooling:

Service: nvidia-imex (unit /lib/systemd/system/nvidia-imex.service), enabled/started manually like FM.⁴
Config dir: /etc/nvidia-imex/; main config /etc/nvidia-imex/config.cfg;⁴ node list IMEX_NODE_CONFIG_FILE=/etc/nvidia-imex/nodes_config.cfg (the IPs of the domain's nodes).⁵
nvidia-imex-ctl queries the state of the IMEX service; it takes -c <config path> (default /etc/nvidia-imex/config.cfg).⁶
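
Because the node list defines the IMEX domain, a duplicated or stray entry is worth catching before starting the service. The following is a rough sketch, assuming nodes_config.cfg holds one address per line and tolerates blank lines and # comments (the comment handling is an assumption, not something the IMEX guide is quoted on here). imex_nodes and imex_dupes are hypothetical helper names.

```shell
# imex_nodes FILE: print the node entries, one per line, with comments,
# surrounding whitespace and blank lines removed.
# Assumption: the file lists one address per line.
imex_nodes() {
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# imex_dupes FILE: print every entry that appears more than once.
# Exits 0 only when there are no duplicates.
imex_dupes() {
    _dupes=$(imex_nodes "$1" | sort | uniq -d)
    [ -z "$_dupes" ] && return 0
    printf '%s\n' "$_dupes"
    return 1
}
```

Example use: imex_dupes /etc/nvidia-imex/nodes_config.cfg prints nothing and exits 0 when every node is listed once.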

For GB200/GB300, getting NVLink across the rack means both layers healthy: FM on every node and IMEX across the domain. A clean FM with a broken/absent IMEX leaves intra-node NVLink working but no cross-node NVLink memory sharing.

Validated usage & tests

Reference template, not hardware-tested. Run these on an actual NVSwitch node; the descriptions below are the expected shape of healthy output, not numbers measured on hardware.

Confirm the service is active and has not aborted on a version check:

systemctl is-active nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager

Expect active (running). A daemon that started then exited, or that logs a driver-compatibility abort, is the mismatch signature; cross-check the FM package version against the running driver:

nvidia-smi --query-gpu=driver_version --format=csv,noheader
dpkg -l 'nvidia-fabricmanager*' 'libnvidia-nscq*'   # Debian/Ubuntu
# rpm -qa 'nvidia-fabric-manager*' 'libnvidia-nscq*' # RHEL

The driver version and the FM / NSCQ package versions must agree.² If they do not, that is the bug, not NVLink or NCCL.
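
The comparison can be scripted so it runs the same way on every node. The following is a minimal sketch, assuming Debian-style package version strings wrap the driver version in an optional epoch prefix and a revision suffix (for example 570.124.06-1, an illustrative value only). pkg_upstream and versions_agree are hypothetical helper names, and the wiring shown in the trailing comment is not hardware-tested.

```shell
# pkg_upstream VERSION: strip a Debian-style epoch ("1:") and revision ("-1"),
# leaving the upstream version to compare against the driver version.
pkg_upstream() {
    v=${1#*:}      # drop the epoch, if any
    v=${v%%-*}     # drop the revision (everything from the first '-')
    printf '%s\n' "$v"
}

# versions_agree DRIVER PKG_VERSION...: exit 0 only if every package's
# upstream version equals DRIVER; report each mismatch on stderr.
versions_agree() {
    drv=$1; shift
    rc=0
    for p in "$@"; do
        if [ "$(pkg_upstream "$p")" != "$drv" ]; then
            printf 'mismatch: driver %s, package %s\n' "$drv" "$p" >&2
            rc=1
        fi
    done
    return $rc
}

# Possible wiring on Debian/Ubuntu (assumes the package globs match only installed packages):
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   versions_agree "$drv" $(dpkg-query -W -f='${Version}\n' 'nvidia-fabricmanager*' 'libnvidia-nscq*')
```

A non-zero exit from versions_agree is the signal to stop debugging NVLink or NCCL and fix the package set first.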

Tail the FM log around a (re)start to see initialization progress and any link or GPU-registration errors:

sudo journalctl -u nvidia-fabricmanager -b --no-pager
sudo tail -n 100 /var/log/fabricmanager.log

Healthy startup logs fabric initialization completing with all expected GPUs and NVSwitches registered; a failure logs which GPU or switch failed to register (recall: a GPU that fails to register loses NVLink P2P¹).

Once FM is up, inspect NVLink topology and link state with nvidia-smi rather than FM itself (nvidia-smi Reference):

nvidia-smi nvlink --status
nvidia-smi topo --matrix

Expect every NVLink reported as active with its per-link bandwidth, and the topology matrix showing GPU pairs connected via NV# (NVLink, where # is the number of bonded links) rather than falling back to PHB/SYS (PCIe host bridge / system) paths. PCIe-only paths between GPUs on an NVSwitch box mean the fabric is not formed.
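
The matrix check can be automated by flagging any GPU-to-GPU cell that is not an NV# entry. The following is a sketch only, assuming the matrix is plain whitespace-separated text with a header row that begins with GPU0 and contains no terminal escape codes; real output may differ by driver version. topo_non_nvlink is a hypothetical helper name.

```shell
# topo_non_nvlink: read `nvidia-smi topo --matrix` text on stdin and print
# "ROW_GPU COLUMN_GPU CELL" for every GPU pair whose cell is not NV<n>.
# Exit status: 0 = all GPU pairs on NVLink, 1 = at least one pair is not,
# 2 = no GPU header row was found in the input.
topo_non_nvlink() {
    awk '
        ngpu == 0 && $1 == "GPU0" {            # header row: record the GPU columns
            for (i = 1; i <= NF && $i ~ /^GPU[0-9]+$/; i++) { hdr[i] = $i; ngpu++ }
            next
        }
        ngpu > 0 && $1 ~ /^GPU[0-9]+$/ {       # data row: $1 is the row label
            for (i = 2; i <= ngpu + 1; i++) {
                if ($i != "X" && $i !~ /^NV[0-9]+$/) {
                    print $1, hdr[i - 1], $i
                    bad = 1
                }
            }
        }
        END {
            if (ngpu == 0) exit 2
            exit (bad ? 1 : 0)
        }
    '
}
```

Example use: nvidia-smi topo --matrix | topo_non_nvlink. Any printed pair, or a non-zero exit, points back at FM rather than at the application.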

On multi-node NVL72, additionally confirm IMEX state:

sudo systemctl status nvidia-imex
nvidia-imex-ctl -N -c /etc/nvidia-imex/config.cfg

Expect the service active (running) and nvidia-imex-ctl reporting the IMEX domain nodes reachable and the service ready. (NVIDIA documents nvidia-imex-ctl as the state-query tool but does not publish a fixed "healthy" string; treat reachable-all-nodes plus an active unit as the bar.⁶)

Note on access control: if FM/NVLSM (NVLink Subnet Manager) are configured to run under a specific user or group, NVIDIA states the nvswitch-audit utility "should be started from the same user/user group account."¹ The utility's detailed function is not covered here; check the Fabric Manager User Guide before relying on it.

Failure modes

FM / driver (or NSCQ) version mismatch. FM aborts on the compatibility check at init; the fabric never forms; new CUDA jobs fail with cudaErrorSystemNotReady.¹ Most common after a driver upgrade that left FM behind. Full diagnosis and fix: Fabric Manager Failure; upgrade ordering in the driver-upgrade runbook.
FM down / not enabled. No NVLink mesh; collectives that assumed NVLink either error or fall back to PCIe and run far slower. Check FM before blaming NVLink or NCCL. See Fabric Manager Failure.
Single GPU fails to register with the fabric. That GPU loses NVLink peer-to-peer and is "available for non-peer-to-peer use cases" only¹, a silent partial degradation rather than a hard stop.
FM_STAY_RESIDENT_ON_FAILURES=1 masking a dead fabric. The daemon shows active but "the system will be uninitialized, and the CUDA application launch will fail."¹ A green systemctl status is not proof the fabric is up; confirm with the CUDA/NVLink checks above.
Multi-node: IMEX down on NVL72. FM healthy per node but no cross-node NVLink memory domain; intra-node NVLink works, inter-node does not. Start IMEX before jobs.³
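
The single-node failure modes above suggest a fixed order of checks: versions first, then the service, then the fabric itself. The following is a small sketch of that ordering as a pure decision function. fm_triage and its three inputs are hypothetical; the caller is assumed to gather them with the commands from "Validated usage & tests".

```shell
# fm_triage UNIT_STATE VERSIONS_OK FABRIC_OK
#   UNIT_STATE   output of `systemctl is-active nvidia-fabricmanager`
#   VERSIONS_OK  1 if driver, FM and NSCQ versions agree, else 0
#   FABRIC_OK    1 if all GPU pairs show NVLink paths, else 0
# Prints the first failure mode that matches, in the order the checks should be made.
fm_triage() {
    if [ "$2" != 1 ]; then
        echo "version mismatch: align FM and NSCQ with the driver"
    elif [ "$1" != active ]; then
        echo "FM down or not enabled: enable and start the service"
    elif [ "$3" != 1 ]; then
        echo "FM active but fabric incomplete: check FM_STAY_RESIDENT_ON_FAILURES and GPU registration in the FM log"
    else
        echo "FM healthy on this node"
    fi
}
```

The third branch is the one a green systemctl status hides: the unit is active, yet either the fabric never initialized or a GPU failed to register.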

References

NVIDIA Fabric Manager User Guide: https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
Fabric Manager packaging for Debian (package names): https://github.com/NVIDIA/apt-packaging-fabric-manager
NSCQ packaging for Debian (libnvidia-nscq): https://github.com/NVIDIA/apt-packaging-libnvidia-nscq
NVIDIA IMEX Service — Overview: https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html
NVIDIA IMEX Service — Getting Started (install/run): https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/gettingstarted.html
NVIDIA IMEX Service — Config Options (IMEX_NODE_CONFIG_FILE): https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/config.html
NVIDIA IMEX Service — Command-line Tools (nvidia-imex-ctl): https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/cmdservice.html
FM/driver version-match note (field write-up): https://tonyseah.medium.com/fixing-nvidia-fabric-manager-driver-mismatch-on-aws-p4d-ubuntu-24-04-a100-bacc6bb6ede9

¹ NVIDIA Fabric Manager User Guide: service function, version compatibility abort, FABRIC_MODE / FM_STAY_RESIDENT_ON_FAILURES / TOPOLOGY_FILE_PATH config keys, paths, package/systemctl commands, ALI registration behavior, cudaErrorSystemNotReady, nvswitch-audit user/group note. https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
² FM and libnvidia-nscq must match the installed driver version; NSCQ as the stable NVSwitch query API used by DCGM. https://github.com/NVIDIA/apt-packaging-libnvidia-nscq and https://tonyseah.medium.com/fixing-nvidia-fabric-manager-driver-mismatch-on-aws-p4d-ubuntu-24-04-a100-bacc6bb6ede9
³ NVIDIA IMEX Service Overview: IMEX acts as an orchestrator for memory export and import across compute nodes and runs on the compute nodes; an IMEX domain is the set of NVLink-connected compute nodes running and configured for the nvidia-imex service. https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html
⁴ NVIDIA IMEX Service Getting Started: install package nvidia-imex-<driver-branch>, service registered as nvidia-imex, unit /lib/systemd/system/nvidia-imex.service, /etc/nvidia-imex/ config dir, config.cfg, systemctl enable/start/status. https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/gettingstarted.html
⁵ NVIDIA IMEX Service Config Options: IMEX_NODE_CONFIG_FILE, default /etc/nvidia-imex/nodes_config.cfg (file of node IP addresses in the IMEX domain). https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/config.html
⁶ NVIDIA IMEX Service Command-line Tools: nvidia-imex-ctl queries the state of one or all IMEX instances; -c <config path> (default /etc/nvidia-imex/config.cfg), -N full-domain status, -n continuous monitor. https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/cmdservice.html