NVIDIA fabric manager for NVSwitch systems
Scope: nv-fabricmanager on NVSwitch-based systems (HGX/DGX 8-GPU baseboards, GB200/GB300 NVL72): what it does, how it is versioned and run, how it fails, and how it relates to IMEX for multi-node NVLink. Not present on PCIe / non-NVSwitch parts.
What it is
NVIDIA Fabric Manager (FM) is a privileged userspace daemon, binary nv-fabricmanager, run via the nvidia-fabricmanager systemd service. On an NVSwitch-based system it configures the NVSwitch memory fabric so that every participating GPU forms a single NVLink domain, and then monitors the NVLinks that support that fabric. NVIDIA's wording: FM "configures the NVSwitch memory fabrics to form one memory fabric among all participating GPUs and monitors the NVLinks that support the fabric."1
Two facts drive everything operational about it:
- It is lockstep-versioned with the driver. During initialization "the FM service checks the currently loaded kernel driver stack version for compatibility, and if the loaded driver stack version is not compatible, aborts the process."1 The nvidia-fabricmanager package version and the libnvidia-nscq library version must both match the installed driver.2
- It is only for NVSwitch hardware. HGX/DGX 8-GPU baseboards and NVL72 racks carry NVSwitches; PCIe-attached datacenter cards, GeForce, and no-NVLink workstation GPUs do not, and never run nv-fabricmanager. See NVSwitch and NVLink.
libnvidia-nscq (NVSwitch Configuration and Query) is the stable driver API that monitoring tools such as DCGM use to read NVSwitch state; it is versioned against the driver the same way FM is.2
Why it's needed (and when)
On an NVSwitch system the GPUs are not directly wired to each other; they are wired to NVSwitches. Until the switch crossbars and routing are programmed, there is no GPU-to-GPU NVLink path. FM is the process that does that programming at boot, then stays resident to monitor links and handle the CUDA job lifecycle against the fabric.
When you need it:
- Single-node NVSwitch (HGX/DGX A100, H100/H200, B200/B300): FM on the node, full stop. No FM, no NVLink mesh.
- Multi-node NVLink (GB200/GB300 NVL72): FM still runs per node to program that node's switches; on top of it the IMEX service coordinates the NVLink memory domain across nodes. IMEX does not replace FM: FM manages the physical fabric on each node, IMEX orchestrates cross-node memory export/import. See the IMEX subsection under "How it's installed & managed".
When you do not need it: any system with no NVSwitch. Installing or enabling FM there is pointless and the service will not initialize a fabric.
On Hopper/Blackwell, NVSwitch supports ALI (Autonomous Link Initialization), so links can come up without FM driving every step, but FM is still the authority for fabric membership. NVIDIA's consequence statement: "If a GPU fails to register with the fabric, it will lose its NVLink peer-to-peer capability and be available for non-peer-to-peer use cases."1 That is the quiet-degradation case: the GPU still runs, it just has no NVLink peers.
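One way to spot the quiet-degradation case is to look at each GPU's fabric registration state. A minimal sketch, assuming the driver's nvidia-smi -q output carries a per-GPU Fabric section with a State field that reads Completed on success; the field names and layout are an assumption here and should be checked against the installed driver. The helper name unregistered_gpus is hypothetical.

```shell
#!/bin/sh
# unregistered_gpus: read `nvidia-smi -q` text on stdin and print the bus ID of
# every GPU whose Fabric "State" is anything other than "Completed".
# Assumptions: each GPU block starts with a line like "GPU 00000000:1B:00.0",
# and contains a "Fabric" header followed by an indented "State : <value>" line.
unregistered_gpus() {
    awk '
        /^GPU [0-9A-Fa-f:.]+/ { gpu = $2; in_fabric = 0; next }
        /^[ \t]*Fabric[ \t]*$/ { in_fabric = 1; next }
        in_fabric && /^[ \t]*State[ \t]*:/ {
            sub(/^[^:]*:[ \t]*/, "")   # keep only the value after the colon
            if ($0 != "Completed") print gpu
            in_fabric = 0
        }
    '
}

# Usage on a live node (requires the NVIDIA driver):
#   nvidia-smi -q | unregistered_gpus
```

On a node where the assumed layout holds, no output means every GPU reported a completed registration; any bus ID printed is a GPU to check in the FM log.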
How it's installed & managed
FM ships from the NVIDIA CUDA network repository, paired to a driver branch. Replace <driver-branch> with the branch you have pinned the fleet to (see Driver Versions and Branches); FM and driver must be the same version.
Reference template, not hardware-tested.
Ubuntu / Debian, pre-4th-gen NVSwitch (A100, H100/H200, which use a single combined package):1
Ubuntu / Debian, 4th-gen NVSwitch (B200/B300/B100), which install the open driver plus the NVLink5 stack:1
RHEL 8/9, pre-4th-gen, via the driver module's fm profile:1
The package drops the unit file at /lib/systemd/system/nvidia-fabricmanager.service, but does not enable or start it; that is left to the administrator.1 Enable for boot, then start and check:
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
sudo systemctl status nvidia-fabricmanager
Logs and config:
- Service journal:1 sudo journalctl -u nvidia-fabricmanager
- Log file:1 /var/log/fabricmanager.log (key LOG_FILE_NAME in the config)
- Config file:1 /usr/share/nvidia/nvswitch/fabricmanager.cfg
Config keys that matter operationally (in fabricmanager.cfg):1
- FABRIC_MODE: 0 bare-metal or full passthrough virtualization; 1 Shared NVSwitch multi-tenancy; 2 vGPU multi-tenancy. Bare-metal training/inference clusters use 0.
- FM_STAY_RESIDENT_ON_FAILURES: 0 FM exits on NVSwitch/GPU config failure; 1 FM stays resident on such failures, "However, the system will be uninitialized, and the CUDA application launch will fail." Staying resident keeps the daemon up for inspection but does not make the fabric usable.
- TOPOLOGY_FILE_PATH=/usr/share/nvidia/nvswitch: fabric topology location; NVIDIA notes this is "not applicable to DGX B200/B300 and NVIDIA HGX B200/B300 and later NVSwitch-based systems."1
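To audit these keys across a fleet, the file can be read as plain KEY=VALUE lines. A minimal sketch, assuming that layout with # comment lines; cfg_get and fm_cfg_report are hypothetical helper names, and when a key appears more than once the last occurrence is printed (how the daemon itself treats duplicates is not covered here).

```shell
#!/bin/sh
# cfg_get KEY [FILE]: print the value of KEY from a KEY=VALUE config file.
# Comment lines (#) and blank lines are skipped.
cfg_get() {
    key=$1
    file=${2:-/usr/share/nvidia/nvswitch/fabricmanager.cfg}
    awk -v key="$key" '
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # and around the value
            if (name == key) found = value
        }
        END { if (found != "") print found }
    ' "$file"
}

# fm_cfg_report [FILE]: print FABRIC_MODE and warn when stay-resident is on.
fm_cfg_report() {
    file=${1:-/usr/share/nvidia/nvswitch/fabricmanager.cfg}
    echo "FABRIC_MODE=$(cfg_get FABRIC_MODE "$file")"
    if [ "$(cfg_get FM_STAY_RESIDENT_ON_FAILURES "$file")" = "1" ]; then
        echo "warning: FM_STAY_RESIDENT_ON_FAILURES=1, an active unit does not prove the fabric is up"
    fi
}
```

Running fm_cfg_report with no argument reads the default config path listed above.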
Hold FM at the driver's version across the fleet. A driver upgrade that leaves FM (or libnvidia-nscq) on the old version is the canonical mismatch failure; handle it in the driver-upgrade runbook.
IMEX (multi-node NVLink: GB200/GB300 NVL72)
IMEX (NVIDIA Import/Export Service for Internode Memory Sharing) is a separate service that, on a multi-node NVLink cluster, "acts as an orchestrator for memory export and import across compute nodes" and runs exclusively on the compute nodes.3 An IMEX domain is "a set of compute nodes connected by NVLink on which the nvidia-imex service has been installed and configured to communicate with each other via the nodes_config.cfg."3 Start IMEX before launching jobs.
Reference template, not hardware-tested.
Install (Ubuntu/Debian) and run, matched to the same driver branch:4
sudo apt-get install nvidia-imex-<driver-branch>
sudo systemctl enable nvidia-imex
sudo systemctl start nvidia-imex
sudo systemctl status nvidia-imex
Paths and tooling:
- Service: nvidia-imex (unit /lib/systemd/system/nvidia-imex.service), enabled/started manually like FM.4
- Config dir: /etc/nvidia-imex/; main config /etc/nvidia-imex/config.cfg;4 node list IMEX_NODE_CONFIG_FILE=/etc/nvidia-imex/nodes_config.cfg (the IPs of the domain's nodes).5
- nvidia-imex-ctl queries the state of the IMEX service; it takes -c <config path> (default /etc/nvidia-imex/config.cfg).6
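Because the node list defines the IMEX domain, it is worth sanity-checking before starting the service. A minimal sketch, assuming nodes_config.cfg holds one node IP address per line and that # starts a comment; both layout details are assumptions, and imex_nodes / imex_node_listed are hypothetical helper names.

```shell
#!/bin/sh
# imex_nodes [FILE]: print the node addresses listed in an IMEX node config,
# with trailing whitespace removed and blank or "#" lines skipped.
imex_nodes() {
    file=${1:-/etc/nvidia-imex/nodes_config.cfg}
    sed -e 's/[[:space:]]*$//' -e '/^[[:space:]]*$/d' -e '/^[[:space:]]*#/d' "$file"
}

# imex_node_listed ADDR [FILE]: succeed if ADDR appears as a whole line.
imex_node_listed() {
    imex_nodes "${2:-/etc/nvidia-imex/nodes_config.cfg}" | grep -Fxq -- "$1"
}

# Usage:
#   imex_nodes | wc -l                 # how many nodes the domain lists
#   imex_node_listed 10.0.0.12 || echo "address missing from nodes_config.cfg"
```

The address in the usage lines is a placeholder, not a value from any real deployment.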
For GB200/GB300, NVLink across the rack requires both layers to be healthy: FM on every node and IMEX across the domain. A clean FM with a broken or absent IMEX leaves intra-node NVLink working but provides no cross-node NVLink memory sharing.
Validated usage & tests
Reference template, not hardware-tested. Run these on an actual NVSwitch node; the descriptions below are the expected shape of healthy output, not numbers measured on hardware.
Confirm the service is active and has not aborted on a version check:
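The status command from the install section covers this check interactively. For scripts, a minimal sketch built on systemctl is-active follows; fm_unit_state is a hypothetical helper name.

```shell
#!/bin/sh
# Interactive check, same command as in the install section:
#   sudo systemctl status nvidia-fabricmanager
#
# fm_unit_state: print the unit's state word (active, inactive, failed, ...)
# and return 0 only when that word is "active".
fm_unit_state() {
    # is-active exits non-zero for any state other than active, so guard it.
    state=$(systemctl is-active nvidia-fabricmanager 2>/dev/null) || true
    echo "${state:-unknown}"
    [ "$state" = "active" ]
}

# Usage:
#   fm_unit_state >/dev/null || echo "nvidia-fabricmanager is not active"
```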
Expect active (running). A daemon that started then exited, or that logs a driver-compatibility abort, is the mismatch signature; cross-check the FM package version against the running driver:
nvidia-smi --query-gpu=driver_version --format=csv,noheader
dpkg -l 'nvidia-fabricmanager*' 'libnvidia-nscq*' # Debian/Ubuntu
# rpm -qa 'nvidia-fabric-manager*' 'libnvidia-nscq*' # RHEL
The driver version and the FM / NSCQ package versions must agree.2 If they do not, that is the bug, not NVLink or NCCL.
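The comparison itself can be scripted. A minimal sketch, assuming the Debian package version is the driver version plus a packaging revision such as -1 (and possibly an epoch prefix); that naming pattern is an assumption to verify against the repository in use, and upstream_version / versions_match are hypothetical helper names.

```shell
#!/bin/sh
# upstream_version VERSION: strip a Debian epoch ("1:") and revision ("-1")
# so that "535.161.08-1" becomes "535.161.08".
upstream_version() {
    v=${1#*:}          # drop everything up to the first colon, if any
    echo "${v%%-*}"    # drop everything from the first hyphen on
}

# versions_match DRIVER PKG_VERSION...: succeed only if at least one package
# version is given and every one equals DRIVER after stripping.
versions_match() {
    driver=$1
    shift
    [ "$#" -gt 0 ] || return 1
    for pkg in "$@"; do
        [ "$(upstream_version "$pkg")" = "$driver" ] || return 1
    done
    return 0
}

# Usage on a Debian/Ubuntu node (requires the driver and dpkg):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   versions_match "$driver" \
#       $(dpkg-query -W -f='${Version}\n' 'nvidia-fabricmanager*' 'libnvidia-nscq*') \
#       || echo "FM/NSCQ packages do not match driver $driver"
```

Requiring at least one package version keeps a node with no FM package installed from passing the check by accident.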
Tail the FM log around a (re)start to see initialization progress and any link or GPU-registration errors:
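Both log locations listed under "Logs and config" work for this. The sketch below adds a small filter for scanning long logs; its keyword list is a generic guess at useful terms, not a set of documented FM log strings, and fm_log_problems is a hypothetical helper name.

```shell
#!/bin/sh
# Follow the service journal across a restart:
#   sudo journalctl -u nvidia-fabricmanager -f
# Or read the tail of the log file:
#   sudo tail -n 200 /var/log/fabricmanager.log
#
# fm_log_problems: read log text on stdin and print only lines that look like
# problems. Returns 0 even when nothing matches.
fm_log_problems() {
    grep -i -E 'error|fail|abort|not compatible' || true
}

# Usage:
#   sudo tail -n 500 /var/log/fabricmanager.log | fm_log_problems
```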
Healthy startup logs fabric initialization completing with all expected GPUs and NVSwitches registered; a failure logs which GPU or switch failed to register (recall: a GPU that fails to register loses NVLink P2P1).
Once FM is up, inspect NVLink topology and link state with nvidia-smi rather than FM itself (nvidia-smi Reference):
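The two nvidia-smi subcommands that fit this step are nvlink --status and topo -m. The sketch below also scans the matrix for non-NVLink GPU pairs; it assumes data rows start at column 0 with GPU<n>, the header row is indented, and the first N cells of each row (N = number of GPU rows) are the GPU columns. The helper name non_nvlink_pairs is hypothetical.

```shell
#!/bin/sh
# Interactive checks (require the NVIDIA driver):
#   nvidia-smi nvlink --status   # per-link state for each GPU
#   nvidia-smi topo -m           # GPU-to-GPU connectivity matrix
#
# non_nvlink_pairs: read `nvidia-smi topo -m` text on stdin and print every
# GPU-to-GPU cell that is neither "X" (self) nor "NV<n>" (NVLink), as
# "<row GPU> <column GPU> <cell>". Exits 2 if no GPU rows were recognised.
non_nvlink_pairs() {
    awk '
        /^GPU[0-9]+[ \t]/ { n++; name[n] = $1; row[n] = $0 }
        END {
            if (n == 0) { print "no GPU rows found" > "/dev/stderr"; exit 2 }
            for (i = 1; i <= n; i++) {
                split(row[i], cell)
                for (j = 1; j <= n; j++) {
                    c = cell[j + 1]          # cell[1] is the row label
                    if (c != "X" && c !~ /^NV[0-9]+$/)
                        print name[i], name[j], c
                }
            }
        }
    '
}

# Usage:
#   nvidia-smi topo -m | non_nvlink_pairs
```

The explicit exit status 2 guards against a silent pass when the matrix layout differs from the assumed one.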
Expect every NVLink reporting active with its per-lane rate, and the topology matrix showing GPU pairs connected via NV# (NVSwitch/NVLink) rather than falling back to PHB/SYS (PCIe host bridge / system) paths. PCIe-only paths between GPUs on an NVSwitch box mean the fabric is not formed.
On multi-node NVL72, additionally confirm IMEX state:
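A minimal sketch of that check, using the unit name, the -c option and the -N full-domain status option from the IMEX references cited on this page. Treating a zero exit status from nvidia-imex-ctl as success is an assumption; its output still needs a human read. The helper name imex_check is hypothetical.

```shell
#!/bin/sh
# Interactive checks on a compute node:
#   sudo systemctl status nvidia-imex
#   nvidia-imex-ctl -N      # -N: full-domain status
#
# imex_check [CONFIG]: return non-zero if the unit is not active; otherwise
# run the control tool against CONFIG and pass its output and status through.
imex_check() {
    cfg=${1:-/etc/nvidia-imex/config.cfg}
    if ! systemctl is-active --quiet nvidia-imex; then
        echo "nvidia-imex unit is not active" >&2
        return 1
    fi
    nvidia-imex-ctl -c "$cfg" -N
}

# Usage:
#   imex_check || echo "IMEX is not ready; do not launch multi-node jobs yet"
```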
Expect the service active (running) and nvidia-imex-ctl reporting the IMEX domain nodes reachable and the service ready. (NVIDIA documents nvidia-imex-ctl as the state-query tool but does not publish a fixed "healthy" string; treat reachable-all-nodes plus an active unit as the bar.6)
Note on access control: if FM and NVLSM (the NVLink Subnet Manager) are configured to run under a specific user or group, NVIDIA states the nvswitch-audit utility "should be started from the same user/user group account."1 The FM user guide does not describe the utility's function in detail; check the current guide before relying on it.
Failure modes
- FM / driver (or NSCQ) version mismatch. FM aborts on the compatibility check at init; the fabric never forms; new CUDA jobs fail with cudaErrorSystemNotReady.1 Most common after a driver upgrade that left FM behind. Full diagnosis and fix: Fabric Manager Failure; upgrade ordering in the driver-upgrade runbook.
- FM down / not enabled. No NVLink mesh; collectives that assumed NVLink either error or fall back to PCIe and run far slower. Check FM before blaming NVLink or NCCL. See Fabric Manager Failure.
- Single GPU fails to register with the fabric. That GPU loses NVLink peer-to-peer and is "available for non-peer-to-peer use cases" only,1 a silent partial degradation rather than a hard stop.
- FM_STAY_RESIDENT_ON_FAILURES=1 masking a dead fabric. The daemon shows active but "the system will be uninitialized, and the CUDA application launch will fail."1 A green systemctl status is not proof the fabric is up; confirm with the CUDA/NVLink checks above.
- Multi-node: IMEX down on NVL72. FM healthy per node but no cross-node NVLink memory domain; intra-node NVLink works, inter-node does not. Start IMEX before jobs.3
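The ordering of these checks can be written down as a small decision helper. This is a sketch of one reasonable triage order, not an NVIDIA procedure; fm_triage and its labels are hypothetical, and the inputs come from whatever checks the operator has already run. An active unit alone never yields "ok", which covers the stay-resident case.

```shell
#!/bin/sh
# fm_triage UNIT_STATE VERSIONS_OK UNREGISTERED_GPUS [IMEX_STATE]
#   UNIT_STATE         output of `systemctl is-active nvidia-fabricmanager`
#   VERSIONS_OK        yes|no: do FM/NSCQ package versions equal the driver's?
#   UNREGISTERED_GPUS  count of GPUs that did not register with the fabric
#   IMEX_STATE         unit state of nvidia-imex, or "n/a" on single-node systems
# Prints the first matching failure mode, or "ok".
fm_triage() {
    unit=$1
    versions=$2
    unreg=$3
    imex=${4:-n/a}
    if [ "$versions" != "yes" ]; then
        echo "version-mismatch"       # checked first: it is what makes FM abort
    elif [ "$unit" != "active" ]; then
        echo "fm-down"
    elif [ "$unreg" -gt 0 ]; then
        echo "gpu-not-registered"     # includes an active daemon on a dead fabric
    elif [ "$imex" != "n/a" ] && [ "$imex" != "active" ]; then
        echo "imex-down"
    else
        echo "ok"
    fi
}

# Usage:
#   fm_triage "$(systemctl is-active nvidia-fabricmanager)" yes 0
```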
References
- NVIDIA Fabric Manager User Guide: https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
- Fabric Manager packaging for Debian (package names): https://github.com/NVIDIA/apt-packaging-fabric-manager
- NSCQ packaging for Debian (libnvidia-nscq): https://github.com/NVIDIA/apt-packaging-libnvidia-nscq
- NVIDIA IMEX Service Overview: https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html
- NVIDIA IMEX Service Getting Started (install/run): https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/gettingstarted.html
- NVIDIA IMEX Service Config Options (IMEX_NODE_CONFIG_FILE): https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/config.html
- NVIDIA IMEX Service Command-line Tools (nvidia-imex-ctl): https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/cmdservice.html
- FM/driver version-match note (field write-up): https://tonyseah.medium.com/fixing-nvidia-fabric-manager-driver-mismatch-on-aws-p4d-ubuntu-24-04-a100-bacc6bb6ede9
Related: NVSwitch and NVLink · Driver Versions and Branches · Fabric Manager Failure · Glossary
1. NVIDIA Fabric Manager User Guide: service function, version compatibility abort, FABRIC_MODE / FM_STAY_RESIDENT_ON_FAILURES / TOPOLOGY_FILE_PATH config keys, paths, package/systemctl commands, ALI registration behavior, cudaErrorSystemNotReady, nvswitch-audit user/group note. https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
2. FM and libnvidia-nscq must match the installed driver version; NSCQ as the stable NVSwitch query API used by DCGM. https://github.com/NVIDIA/apt-packaging-libnvidia-nscq and https://tonyseah.medium.com/fixing-nvidia-fabric-manager-driver-mismatch-on-aws-p4d-ubuntu-24-04-a100-bacc6bb6ede9
3. NVIDIA IMEX Service Overview: IMEX acts as an orchestrator for memory export and import across compute nodes and runs on the compute nodes; an IMEX domain is the set of NVLink-connected compute nodes running and configured for the nvidia-imex service. https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html
4. NVIDIA IMEX Service Getting Started: install package nvidia-imex-<driver-branch>, service registered as nvidia-imex, unit /lib/systemd/system/nvidia-imex.service, /etc/nvidia-imex/ config dir, config.cfg, systemctl enable/start/status. https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/gettingstarted.html
5. NVIDIA IMEX Service Config Options: IMEX_NODE_CONFIG_FILE, default /etc/nvidia-imex/nodes_config.cfg (file of node IP addresses in the IMEX domain). https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/config.html
6. NVIDIA IMEX Service Command-line Tools: nvidia-imex-ctl queries the state of one or all IMEX instances; -c <config path> (default /etc/nvidia-imex/config.cfg), -N full-domain status, -n continuous monitor. https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/cmdservice.html