Driver & feature support by GPU tier

Scope: which of NVIDIA's four driver families a given GPU class runs (datacenter/Tesla, GeForce, RTX Enterprise Production Branch, DGX OS) and which platform features (MIG, vGPU, ECC, persistence, Confidential Computing, Fabric Manager) actually exist on that tier. This is the tier-to-capability map that gates every other decision in the KB: branch policy (Driver Versions and Branches), install path (Driver Install and Lifecycle), partitioning (MIG), and isolation (Security, Isolation and Multi-tenancy).

Reference template based on real NVIDIA driver products, EULA text, and feature matrices as of June 2026. Not hardware-tested. Feature availability shifts per SKU and driver release, so re-verify against the cited NVIDIA pages before relying on a single cell.

What it is

NVIDIA ships four distinct driver families, not one driver with options. The family is determined by the GPU class and carries a different licence, support lifecycle, and feature surface:

Datacenter (Tesla) driver, for datacenter GPUs (A100, H100/H200, B200/B300, L40S). Shipped as numbered Production / LTS branches (R580-class as of mid-2026) with a published 1-yr/3-yr lifecycle, licensed for datacenter use. This is the family the rest of the KB assumes; branch mechanics are covered in Driver Versions and Branches.¹
GeForce driver (Game Ready / Studio), for consumer GeForce RTX 50/40. The NVIDIA Driver License Agreement §2.8 states GeForce/Titan software "(i) is licensed for use only on GeForce or Titan hardware products you own, and (ii) is not licensed for datacenter deployment."² This is a contractual bar, not a technical lockout. The silicon runs, but running the GeForce driver in a datacenter breaches the licence. See RTX Consumer and Workstation GPUs.
RTX Enterprise Production Branch, for professional RTX PRO / workstation parts (RTX PRO 6000 Blackwell, RTX 6000 Ada). A rebrand of the Quadro Optimal Driver for Enterprise (ODE), carrying ISV certification, long-lifecycle support, and regular security updates, with no datacenter restriction.³ The professional analogue of the datacenter LTS branch.
DGX OS is not a bare driver but the validated OS image NVIDIA ships on DGX appliances: DGX OS 7, a customized Ubuntu 24.04 on kernel 6.8 with the GPU driver branches, CUDA, Fabric Manager, DCGM, nvidia-persistenced, and DOCA-OFED preintegrated and version-locked together.⁴ The driver underneath is the datacenter driver; the point is that NVIDIA owns the whole pinned stack. See NVIDIA DGX and HGX Systems.

The single operational consequence: the GPU class picks the driver family, and the driver family, together with the silicon, fixes which features are even available. You do not choose "GeForce with ECC and MIG"; that configuration does not exist.
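The class-to-family rule above can be written down as a lookup. This is a minimal sketch, not NVIDIA tooling: the GPU-class labels (`datacenter`, `geforce`, `rtx_pro`, `dgx`) and every function name are hypothetical and would come from your own inventory system.

```python
from enum import Enum


class DriverFamily(Enum):
    DATACENTER = "datacenter (Tesla) Production/LTS branch"
    GEFORCE = "GeForce (Game Ready / Studio)"
    RTX_ENTERPRISE = "RTX Enterprise Production Branch"
    DGX_OS = "DGX OS (datacenter driver, pinned stack)"


# Hypothetical GPU-class labels; the four families are the ones listed above.
FAMILY_BY_GPU_CLASS = {
    "datacenter": DriverFamily.DATACENTER,
    "geforce": DriverFamily.GEFORCE,
    "rtx_pro": DriverFamily.RTX_ENTERPRISE,
    "dgx": DriverFamily.DGX_OS,
}


def driver_family(gpu_class: str) -> DriverFamily:
    """Return the driver family for a GPU class; the class decides, not the operator."""
    try:
        return FAMILY_BY_GPU_CLASS[gpu_class.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown GPU class: {gpu_class!r}") from None


def datacenter_licensed(family: DriverFamily) -> bool:
    """Only the GeForce family carries the licence bar on datacenter deployment."""
    return family is not DriverFamily.GEFORCE
```

Raising on an unknown class, rather than defaulting to a family, keeps a mislabelled node from silently inheriting the wrong driver.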

Why it's needed (and when)

Picking the wrong tier for a workload is a class of failure that no amount of configuration fixes. The map matters at three decision points:

Procurement / capacity planning. Whether a node can do hard multi-tenant isolation (MIG/vGPU), survive long runs without silent corruption (ECC), or hold sensitive weights in use (Confidential Computing) is decided by the GPU class before purchase. A GeForce box can never be made multi-tenant-safe; an RTX PRO 6000 can partition but cannot join an NVLink domain (RTX Consumer and Workstation GPUs, Vendor Sourcing and Procurement Logistics).
Compliance. Deploying GeForce in a datacenter is a licence violation (§2.8), independent of whether the workload runs. The fix is a tier change (datacenter or RTX PRO part), not a config flag.²
Install path. The driver family dictates the package, the kernel-module flavour, and whether Fabric Manager exists on the node at all. Blackwell-class parts (datacenter and RTX PRO) additionally mandate the open kernel modules. Proprietary modules are unsupported on Blackwell (Driver Install and Lifecycle).

When it bites: a single-GPU dev box on GeForce is correct and cheap; the same card racked for shared inference is wrong on cooling, ECC, isolation, and licence simultaneously. Match the tier to the job up front.
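For the install-path point, the kernel-module rule can be checked before a package is chosen. The sketch below encodes only what this page states (open modules mandatory on Blackwell and Grace Hopper, default on Turing and newer); the architecture labels and function names are hypothetical, and anything older than Turing is deliberately reported as not covered rather than guessed.

```python
# Architectures this page names as requiring the open kernel modules.
OPEN_REQUIRED = {"blackwell", "grace hopper"}
# Turing-and-newer architectures where open modules are the default (R560 onward).
OPEN_DEFAULT = {"turing", "ampere", "ada", "ada lovelace", "hopper"}


def kernel_module_policy(architecture: str) -> str:
    """Classify the kernel-module requirement for an architecture label."""
    arch = architecture.strip().lower()
    if arch in OPEN_REQUIRED:
        return "open-required"
    if arch in OPEN_DEFAULT:
        return "open-default"
    return "not-covered"  # this page makes no claim; check NVIDIA's driver docs


def install_is_supported(architecture: str, module_flavour: str) -> bool:
    """Return False only for the case this page rules out:
    a non-open module flavour where open modules are mandatory."""
    if kernel_module_policy(architecture) == "open-required":
        return module_flavour == "open"
    return True
```

A `True` result here means "not ruled out by this page", not "validated"; the per-SKU driver page remains the authority.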

How it's managed

Management diverges by family: every family exposes the same nvidia-smi surface, but the install path and the lifecycle owner differ.

Datacenter: install from the CUDA network repo or runfile, pick one branch, hold it fleet-wide, roll behind cordon/drain. On NVSwitch boxes, Fabric Manager is lockstep-versioned with the driver and reinstalled in the same step. Full path: Driver Install and Lifecycle; branch policy: Driver Versions and Branches; rolling upgrade: Rolling Driver / CUDA Upgrade.
GeForce: the Game Ready / Studio driver (open kernel-module build on Turing+). No Fabric Manager, no vGPU tooling, no MIG lifecycle to manage; those features are absent. Manage persistence and clocks only (Persistence Mode).
RTX Enterprise: the Production Branch installer (or apt nvidia-open on Blackwell). Note the support-lag gotcha: a brand-new Blackwell SKU may land first in the New Feature Branch before the Production Branch lists it, so check the NVIDIA Unix driver page for the exact SKU (RTX Consumer and Workstation GPUs).
DGX OS: do not ad-hoc apt install the driver. Re-imaging and updates flow through the DGX OS / Base Command Manager update path; the driver, CUDA, FM, and DCGM move together. BCM/DGX OS version coupling is a real upgrade constraint (BCM 10.x ↔ DGX OS 6, BCM 11 ↔ DGX OS 7) (NVIDIA DGX and HGX Systems, Provisioning Tooling).
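The two version couplings named in this list (Fabric Manager lockstep with the driver, and BCM paired with DGX OS) lend themselves to a pre-upgrade check. This is a sketch under stated assumptions: the function names are hypothetical, the caller supplies already-normalised version strings (no package-revision suffixes), and the pairing table contains only the two pairs given on this page.

```python
# Pairings stated on this page: BCM 10.x <-> DGX OS 6, BCM 11 <-> DGX OS 7.
BCM_TO_DGX_OS = {10: 6, 11: 7}


def major(version: str) -> int:
    """Leading integer of a dotted version string, e.g. '10.24.03' -> 10."""
    return int(version.strip().split(".")[0])


def bcm_matches_dgx_os(bcm_version: str, dgx_os_version: str) -> bool:
    """True when the BCM major release is paired with this DGX OS major release."""
    expected = BCM_TO_DGX_OS.get(major(bcm_version))
    return expected is not None and expected == major(dgx_os_version)


def fabric_manager_in_lockstep(driver_version: str, fm_version: str) -> bool:
    """Assumes lockstep means identical version strings, not just the same branch."""
    return driver_version.strip() == fm_version.strip()
```

An unknown BCM major returns `False` rather than raising, so a new release fails the check until someone adds the pairing deliberately.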

Common to all: open kernel modules are the default on Turing+ (R560 onward) and mandatory on Grace Hopper and Blackwell; persistence prefers the nvidia-persistenced daemon over the deprecated nvidia-smi -pm 1 (Driver Install and Lifecycle, Persistence Mode).
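Because the nvidia-smi surface is shared across families, one inventory query works on every tier. The sketch below parses the CSV output of `nvidia-smi --query-gpu=... --format=csv,noheader` and flags GPUs whose persistence mode is not enabled. The function names are hypothetical, and the parser assumes the `persistence_mode` field reports `Enabled` or `Disabled`; confirm the available fields with `nvidia-smi --help-query-gpu` on your driver version.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "name", "driver_version", "persistence_mode"]


def parse_gpu_csv(text: str) -> list[dict[str, str]]:
    """Turn 'csv,noheader' output into one dict per GPU, keyed by QUERY_FIELDS."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        values = [cell.strip() for cell in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi on the local node; requires the driver to be installed."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_csv(result.stdout)


def gpus_without_persistence(rows: list[dict[str, str]]) -> list[str]:
    """Indices of GPUs whose persistence mode is anything other than 'Enabled'."""
    return [row["index"] for row in rows if row["persistence_mode"] != "Enabled"]
```

Keeping the parser separate from the subprocess call lets the logic be exercised against captured output on a machine with no GPU.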

Support matrix by tier

The decisive table: which platform feature exists on which GPU class. "—" means the feature is absent on that tier (not merely off). Numbers are max counts. Verify per-SKU against the cited pages before quoting a cell.

| Feature | Datacenter SXM/HGX (A100, H100/H200, B200/B300) | RTX PRO workstation (RTX PRO 6000 Blackwell) | GeForce (RTX 50/40) | DGX (HGX baseboard) |
| --- | --- | --- | --- | --- |
| Driver family | datacenter (Tesla), Prod/LTS branch | RTX Enterprise Production Branch | GeForce (Game Ready / Studio) | DGX OS (datacenter driver, pinned) |
| Datacenter-licensed | Yes | Yes | No (EULA §2.8 bars it)² | Yes |
| MIG | Yes, up to 7 (A100/H100/H200/B200/GB200)⁵ | Yes, up to 4 (PRO 6000)⁵ | — | Yes, up to 7 (same SXM silicon)⁵ |
| vGPU | Yes (A100, H100 PCIe, L40S)⁶ | Yes (PRO 6000 Server, RTX 6000 Ada)⁶ | —⁶ | Yes (via vGPU/MIG on the SXM parts)⁶ |
| ECC | Yes (HBM, on by default)⁷ | Yes (96 GB GDDR7 ECC)⁸ | — (consumer GDDR, no ECC)⁷ | Yes (HBM, on by default)⁷ |
| Persistence mode | Yes (`nvidia-persistenced`)⁹ | Yes⁹ | Yes⁹ | Yes (preintegrated)⁹ |
| Confidential Computing | Hopper + Blackwell only; Blackwell adds TEE-I/O across NVLink. A100 has none¹⁰ | — (not a CC part)¹⁰ | —¹⁰ | Yes on H100/H200/B-series baseboards¹⁰ |
| Fabric Manager / NVLink domain | Yes (HGX/NVSwitch); not on PCIe-only cards¹¹ | — (no NVSwitch, no NVLink)¹² | — (no NVLink)¹² | Yes (HGX 8-GPU + NVSwitch; NVL72 adds IMEX)¹¹ |

Reading the matrix:

MIG exists off the datacenter line too, but only on Blackwell RTX PRO parts: up to 4 instances on the RTX PRO 6000 Blackwell, and lower counts on smaller PRO parts (RTX PRO 5000 = 2). GeForce, RTX 6000 Ada, and L40S have no MIG.⁵ Detail: MIG.
Confidential Computing is a Hopper-or-newer datacenter feature. Ampere (A100) has none; consumer parts have none; Blackwell extends the TEE across the NVLink fabric via TEE-I/O.¹⁰ Treat it as datacenter-SXM-only and verify the deployment mode (Security, Isolation and Multi-tenancy).
Fabric Manager only exists where there is an NVSwitch: HGX/DGX 8-GPU baseboards and NVL72 racks. PCIe-attached datacenter cards, RTX PRO, and GeForce never run nv-fabricmanager; their multi-GPU path is PCIe peer-to-peer (Fabric Manager, RTX Consumer and Workstation GPUs).
ECC and persistence are the two features that span the most tiers: ECC everywhere except plain GeForce; persistence on every tier (it is a driver property, not silicon) (ECC, Persistence Mode).
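The matrix can be held as data so that a workload's requirements are checked against a tier mechanically. This is an illustrative sketch: the tier and feature keys are hypothetical labels, the values are transcribed from the table at tier granularity, and per-SKU caveats (for example, no Confidential Computing on A100) are not modelled, so a clean result still needs the per-SKU verification described above.

```python
ABSENT = None  # the feature does not exist on that tier (not merely off)

# MIG values are the maximum instance counts from the matrix.
MATRIX = {
    "datacenter_sxm": {"mig": 7, "vgpu": True, "ecc": True, "persistence": True,
                       "confidential_computing": True, "fabric_manager": True},
    "rtx_pro": {"mig": 4, "vgpu": True, "ecc": True, "persistence": True,
                "confidential_computing": ABSENT, "fabric_manager": ABSENT},
    "geforce": {"mig": ABSENT, "vgpu": ABSENT, "ecc": ABSENT, "persistence": True,
                "confidential_computing": ABSENT, "fabric_manager": ABSENT},
    "dgx": {"mig": 7, "vgpu": True, "ecc": True, "persistence": True,
            "confidential_computing": True, "fabric_manager": True},
}


def missing_features(tier: str, required: set[str]) -> list[str]:
    """Features the workload requires that the tier does not offer, sorted by name.
    Unknown feature names count as missing rather than being ignored."""
    row = MATRIX[tier]
    return sorted(feature for feature in required if not row.get(feature))
```

For example, `missing_features("geforce", {"mig", "ecc", "persistence"})` reports `ecc` and `mig`, which is the matrix's way of saying the fix is a tier change rather than a setting.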

Failure modes

Each entry is brief and links to its handling path or sibling page.

GeForce racked for datacenter use: licence violation (EULA §2.8), plus no ECC, no MIG, no GPUDirect RDMA, and open-air cooling unfit for dense racking. No config fixes any of these; change the tier (RTX Consumer and Workstation GPUs, Security, Isolation and Multi-tenancy).²
Planning multi-tenant isolation on a tier without MIG/vGPU: GeForce, RTX 6000 Ada, and L40S cannot do hard isolation; MPS/time-slicing offer no fault isolation. Right-size to a MIG-capable part (MIG).⁵
Assuming ECC where there is none: integrity controls planned on GeForce; silent bit-flips go uncorrected on long runs (ECC).⁷
Expecting NVLink / Fabric Manager on a PCIe or RTX PRO part: there is no NVSwitch, so nv-fabricmanager is irrelevant and NCCL falls back to PCIe/host; tensor-parallel layouts become communication-bound (Fabric Manager, RTX Consumer and Workstation GPUs).¹²
Claiming Confidential Computing on A100 or consumer: neither has a TEE; CC requires Hopper or newer (Security, Isolation and Multi-tenancy).¹⁰
Wrong driver family pinned across a mixed fleet: a node provisioned with the GeForce driver when its GPUs need the datacenter or RTX Enterprise branch (or the reverse). Pin the correct family per GPU class; do not mix (Driver Versions and Branches).
Ad-hoc apt install on a DGX: bypassing the DGX OS / BCM update path drifts the node off the validated, version-coupled baseline (NVIDIA DGX and HGX Systems).⁴
Proprietary modules on Blackwell: unsupported; Blackwell (datacenter and RTX PRO) mandates the open kernel modules. Install nvidia-open (Driver Install and Lifecycle).
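Several of the failure modes above are detectable from a node's planned configuration before anything is racked. The following is a sketch of such a preflight, assuming a hypothetical `NodePlan` record filled in by the operator; the field names, tier labels, and finding codes are all illustrative, and only rules stated in this list are encoded.

```python
from dataclasses import dataclass


@dataclass
class NodePlan:
    tier: str                 # "datacenter", "rtx_pro", "geforce", or "dgx"
    architecture: str         # e.g. "ampere", "hopper", "blackwell"
    in_datacenter: bool = False
    module_flavour: str = "open"
    needs_confidential_computing: bool = False
    has_nvswitch: bool = False
    expects_fabric_manager: bool = False
    driver_installed_ad_hoc: bool = False


def preflight(plan: NodePlan) -> list[str]:
    """Return a finding code for each failure mode the plan would trigger."""
    arch = plan.architecture.strip().lower()
    findings = []
    if plan.tier == "geforce" and plan.in_datacenter:
        findings.append("licence")                 # EULA 2.8 bars datacenter use
    if plan.needs_confidential_computing and (
        plan.tier not in {"datacenter", "dgx"} or arch not in {"hopper", "blackwell"}
    ):
        findings.append("confidential-computing")  # needs a Hopper-or-newer datacenter part
    if plan.expects_fabric_manager and not plan.has_nvswitch:
        findings.append("fabric-manager")          # no NVSwitch, so nothing to manage
    if arch == "blackwell" and plan.module_flavour != "open":
        findings.append("kernel-modules")          # proprietary modules unsupported
    if plan.tier == "dgx" and plan.driver_installed_ad_hoc:
        findings.append("dgx-drift")               # bypasses the DGX OS / BCM update path
    return findings
```

An empty list means none of the encoded rules fired; it does not certify the node, since ECC, isolation, and cooling concerns still need the per-tier review described above.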

References

NVIDIA Driver License Agreement — GeForce/Titan §2.8 ("not licensed for datacenter deployment"): https://www.nvidia.com/en-us/drivers/geforce-license/
NVIDIA Data Center Driver Lifecycle Policy (Production/LTS branches, 1-yr/3-yr windows): https://docs.nvidia.com/datacenter/tesla/drivers/driver-lifecycle.html
NVIDIA RTX Enterprise Software / Production Branch (Quadro ODE rebrand, ISV certification, long lifecycle): https://www.nvidia.com/en-us/software/rtx-enterprise-software/
NVIDIA RTX driver branch history (RTX / Quadro Enterprise Production Branch): https://www.nvidia.com/en-us/drivers/rtx-enterprise-and-quadro-driver-branch-history/
NVIDIA DGX OS 7 user guide (Ubuntu 24.04, kernel 6.8, driver/CUDA/DOCA-OFED preintegrated): https://docs.nvidia.com/dgx/dgx-os-7-user-guide/index.html
NVIDIA Multi-Instance GPU — Supported GPUs (A100/H100/H200/B200/GB200 max 7; RTX PRO 6000 Blackwell max 4; RTX PRO 5000 max 2): https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html
NVIDIA vGPU — GPUs Supported by vGPU (A100, H100 PCIe, L40S, RTX 6000 Ada, RTX PRO 6000 Blackwell Server; GeForce excluded): https://docs.nvidia.com/vgpu/gpus-supported-by-vgpu.html
NVIDIA GPU Memory Error Management (ECC on datacenter HBM): https://docs.nvidia.com/deploy/a100-gpu-mem-error-mgmt/index.html
NVIDIA RTX PRO 6000 Blackwell ("96 GB GDDR7 with error-correction code (ECC)"): https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/
NVIDIA Driver Persistence (nvidia-persistenced daemon; legacy -pm deprecation): https://docs.nvidia.com/deploy/driver-persistence/index.html
NVIDIA Confidential Computing (Hopper/Blackwell TEE; Blackwell TEE-I/O across NVLink): https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/
NVIDIA Fabric Manager User Guide (NVSwitch systems HGX/DGX A100/H100/H200/B200/B300; driver minimums 450/525/570; not on PCIe-only): https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html

1. NVIDIA Data Center Driver Lifecycle Policy: datacenter (Tesla) driver shipped as numbered Production / Long Term Support branches with 1-yr / 3-yr support windows, licensed for datacenter use. https://docs.nvidia.com/datacenter/tesla/drivers/driver-lifecycle.html
2. NVIDIA Driver License Agreement §2.8: "You agree that GeForce or Titan SOFTWARE: (i) is licensed for use only on GeForce or Titan hardware products you own, and (ii) is not licensed for datacenter deployment." (NVIDIA has historically published a narrow blockchain-processing exception; the live licence text fetched here states only the bar above, so verify the current exception language before relying on it.) https://www.nvidia.com/en-us/drivers/geforce-license/
3. NVIDIA RTX Enterprise Software: the RTX Enterprise Production Branch is a rebrand of the Quadro Optimal Driver for Enterprise (ODE), offering the same ISV certification, long life-cycle support, and regular security updates. https://www.nvidia.com/en-us/software/rtx-enterprise-software/
4. NVIDIA DGX OS 7 user guide: customized Ubuntu 24.04 on Linux kernel 6.8 with NVIDIA GPU driver branches, CUDA toolkit, Fabric Manager, DCGM, nvidia-persistenced, and DOCA-OFED preintegrated and version-locked; updates flow through the DGX OS / Base Command Manager path. https://docs.nvidia.com/dgx/dgx-os-7-user-guide/index.html
5. NVIDIA Multi-Instance GPU User Guide, Supported GPUs. Max instances: A100-SXM4/PCIE 80GB = 7, H100-SXM5/PCIE (80/94GB) = 7, H100 on GH200 = 7, H200-SXM5/NVL 141GB = 7, B200 180GB = 7, GB200 186GB = 7, A30 24GB = 4, RTX PRO 6000 Blackwell (Server/Workstation) 96GB = 4, RTX PRO 5000 Blackwell 48GB = 2, RTX PRO 4500 Blackwell 32GB = 2. https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-gpus.html
6. NVIDIA vGPU, GPUs Supported by vGPU: supported parts include NVIDIA A100, H100 PCIe, L40S, RTX 6000 Ada, and RTX PRO 6000 Blackwell Server Edition; consumer GeForce GPUs are not on the supported list. https://docs.nvidia.com/vgpu/gpus-supported-by-vgpu.html
7. NVIDIA GPU Memory Error Management: ECC on datacenter HBM (frame buffer + on-die SRAM), on by default; consumer GeForce GDDR does not expose an ECC toggle. https://docs.nvidia.com/deploy/a100-gpu-mem-error-mgmt/index.html
8. NVIDIA RTX PRO 6000 Blackwell Workstation Edition: "GPU Memory: 96GB GDDR7 with error-correction code (ECC)". https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/
9. NVIDIA Driver Persistence: persistence mode is a user-settable driver property (not silicon-specific), available on every tier; prefer the nvidia-persistenced daemon over the deprecated nvidia-smi -pm 1. https://docs.nvidia.com/deploy/driver-persistence/index.html
10. NVIDIA Confidential Computing: supported on Hopper and Blackwell (and newer) GPUs; the H100 was the first confidential-computing GPU and Blackwell extends the trusted execution environment across NVLink via TEE-I/O. Ampere (A100) and consumer GPUs have no TEE. https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/
11. NVIDIA Fabric Manager User Guide: required on multi-GPU NVSwitch systems (DGX/HGX A100, H100, H200, B200/B300, NVL72); minimum driver 450.xx (HGX A100), 525.xx (HGX H100), 570.xx (HGX B200/B300/B100); not required on single-GPU or PCIe-only (no NVLink/NVSwitch) systems. https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
12. RTX PRO and GeForce parts carry no NVSwitch and no NVLink connector (Ada removed NVLink from GeForce; RTX PRO 6000 Blackwell has none), so nv-fabricmanager is never run there; multi-GPU is PCIe peer-to-peer. https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/