NVIDIA Hopper platform (H100 / H200 / GH200)
Scope: the Hopper datacenter generation (GH100 die, TSMC 4N, 2022-2024): the H100 and H200 GPUs, the GH200 Grace Hopper superchip, their HGX/DGX systems, and the operational, driver, and networking differences versus the older Ampere platform and the newer Blackwell platform. Re-check fast-moving specs on the linked NVIDIA datasheets before relying on a single number.
Figures verified against NVIDIA primary sources as of June 2026. SXM and PCIe variants differ materially in power and interconnect; confirm the exact board SKU against its datasheet.
What it is
Hopper is the architecture that introduced the Transformer Engine and FP8 tensor math to NVIDIA datacenter GPUs, alongside fourth-generation NVLink at 900 GB/s and on-GPU Confidential Computing. All Hopper datacenter parts share the GH100 die. The line spans three things that are easy to conflate: the H100 GPU (the workhorse), the H200 (same compute, much more and faster memory), and the GH200 Grace Hopper superchip (a Grace Arm CPU coherently fused to a Hopper GPU). H800 and H20 are China-export variants: H800 with reduced NVLink bandwidth, H20 with reduced compute.
flowchart TB
GH100["GH100 die (TSMC 4N) — FP8 Transformer Engine, NVLink 4"]
GH100 --> H100["H100: 80 GB HBM3"]
GH100 --> H200["H200: 141 GB HBM3e, 4.8 TB/s"]
GH100 --> GH200["GH200: Grace CPU + Hopper GPU, NVLink-C2C"]
GH100 --> EXPORT["H800 / H20: export-compliance variants"]
H100 --> SXM["SXM5 700 W: full NVLink, NVSwitch domain"]
H100 --> PCIE["PCIe 350 W: NVLink Bridge only, no NVSwitch"]
SXM --> HGX["HGX / DGX H100/H200: 8 GPU + 4 NVSwitch"]
Lineup & specifications
| Part | Memory | Bandwidth | NVLink | Form factor / power | MIG | Notes |
|---|---|---|---|---|---|---|
| H100 SXM5 | 80 GB HBM3 | ~3.35 TB/s | 4th-gen, 900 GB/s | SXM5, 700 W | 7 instances | Full NVLink, joins NVSwitch domain |
| H100 PCIe | 80 GB HBM2e | ~2 TB/s | NVLink Bridge (2-way) only | PCIe Gen5, 350 W | 7 instances | No NVSwitch; lower clocks/TDP |
| H100 NVL | 2x 94 GB HBM3 | ~3.9 TB/s/GPU | Bridged pair | PCIe, ~400 W/GPU | 7 instances | LLM-inference paired SKU |
| H200 SXM | 141 GB HBM3e | 4.8 TB/s | 4th-gen, 900 GB/s | SXM, ~700 W | 7 instances | Same GH100 compute as H100 |
| GH200 | 96 GB HBM3 or 144 GB HBM3e (GPU) + up to 480 GB LPDDR5X (Grace) | HBM3 ~4 TB/s; HBM3e ~4.9 TB/s | NVLink-C2C 900 GB/s CPU↔GPU | superchip module | 7 instances | Coherent unified CPU+GPU memory |
| H800 / H20 | per SKU | per SKU | reduced on H800; check per SKU | SXM/PCIe | yes | China-export compliance SKUs; H20 reduces compute rather than NVLink |
Compute: fourth-generation Tensor Cores with the Transformer Engine, adding FP8 (E4M3/E5M2) on top of FP16/BF16/TF32/INT8 and accelerated FP64; DPX instructions for dynamic-programming workloads; PCIe Gen5. Specs cited from the H100, H200, and Grace Hopper product pages and the Hopper architecture whitepaper (see References).
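The two FP8 formats trade range for precision: E4M3 tops out at 448 and E5M2 at 57344 (their largest finite values). A minimal sketch of the per-tensor scaling idea follows, assuming the simplest possible rule: pick a scale so the tensor's largest magnitude maps to the format maximum. `fp8_max` and `fp8_scale` are hypothetical helpers for illustration; the Transformer Engine's real recipes (amax history, margins) are more involved.

```shell
#!/bin/sh
# Sketch only: per-tensor FP8 scale = format_max / amax.
# fp8_max and fp8_scale are hypothetical helpers, not Transformer Engine APIs.

fp8_max() {
  # Largest finite value of each FP8 format.
  case "$1" in
    e4m3) echo 448 ;;
    e5m2) echo 57344 ;;
    *) echo "unknown format: $1" >&2; return 1 ;;
  esac
}

fp8_scale() {
  # $1 = format (e4m3|e5m2), $2 = amax (largest |x| in the tensor, must be > 0)
  max=$(fp8_max "$1") || return 1
  awk -v m="$max" -v a="$2" 'BEGIN { if (a <= 0) exit 1; printf "%g\n", m / a }'
}

# Example: a tensor whose largest magnitude is 3.5
fp8_scale e4m3 3.5   # -> 128
fp8_scale e5m2 3.5   # -> 16384
```

The scale is applied before the cast to FP8 and divided back out afterwards; a stale or badly chosen amax is one way FP8 runs lose accuracy.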
Operational differences
The differences that change how a cluster is built and run:
- Driver branch. Datacenter (Tesla/`nvidia-open` or proprietary) driver, never the GeForce driver. Hopper is served by the datacenter production/LTS branches; pair the driver with a matching Fabric Manager and CUDA toolkit. See GPU software stack and driver upgrade runbook.
- NVLink / NVSwitch. Fourth-gen NVLink at 900 GB/s per GPU. SXM5 H100/H200 expose full NVLink and join an NVSwitch domain (8-GPU HGX baseboard with 4 NVSwitches); PCIe H100 has only an optional 2-way NVLink Bridge and no NVSwitch domain. Multi-GPU beyond a bridged pair falls back to PCIe/host (NCCL routes over PCIe). This is the single largest topology difference between SXM and PCIe Hopper.
- Fabric Manager is required on NVSwitch systems (HGX/DGX H100/H200) to configure and heal the NVLink fabric; it is not used on PCIe-only nodes.
- MIG. Second-generation Multi-Instance GPU, up to 7 instances, with improved per-instance isolation, monitoring, and (Hopper-new) MIG-level confidential compute. See security & multi-tenancy and Kubernetes for GPUs.
- Confidential Computing, new in Hopper. Hopper is the first NVIDIA GPU with a hardware TEE: an encrypted, attested CVM path protecting code and data in use. Ampere has none. Blackwell extends the TEE across NVLink (TEE-I/O); Hopper's TEE is per-GPU over the PCIe/CPU boundary. Verify the deployment mode against the NVIDIA Confidential Computing docs.
- GPUDirect RDMA is supported on these datacenter parts, giving the NIC a direct DMA path to GPU memory for the fabric described in networking fabric. vGPU is also supported.
- ECC on HBM is on by default.
- Transformer Engine / FP8 is the headline software-visible capability: it dynamically casts layers to FP8 with per-tensor scaling, roughly doubling effective tensor throughput versus BF16 for transformers. FP8 needs Hopper or newer (and Ada); it is absent on Ampere.
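The SXM-versus-PCIe split above decides whether a node needs Fabric Manager at all. A rough sketch of that decision from the GPU product name follows. The name patterns are assumptions about what `nvidia-smi` reports on each SKU, and `needs_fabric_manager` is a hypothetical helper; treat the board datasheet as authoritative.

```shell
#!/bin/sh
# Sketch: decide whether a node should run Fabric Manager from GPU product names.
# The patterns below are ASSUMED name strings; confirm against your own
# `nvidia-smi -L` output and the board datasheet before relying on them.

needs_fabric_manager() {
  # $1 = GPU product name, e.g. from:
  #   nvidia-smi --query-gpu=name --format=csv,noheader
  case "$1" in
    *GH200*)       echo unknown ;;  # superchip: depends on the system design
    *PCIe*|*NVL*)  echo no ;;       # PCIe H100 / H100 NVL: bridge only, no NVSwitch
    *H100*|*H200*) echo yes ;;      # assumed SXM on an HGX/DGX baseboard
    *)             echo unknown ;;
  esac
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader | while IFS= read -r name; do
    printf '%s -> fabric manager: %s\n' "$name" "$(needs_fabric_manager "$name")"
  done
else
  echo "nvidia-smi not found; skipping live query"
fi
```

The GH200 pattern is matched first on purpose: its name contains "H200", so a naive `*H200*` match would misclassify it.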
Install & setup
Reference template only (unexecuted, not hardware-tested); pin exact versions against the driver install guide, the driver and Fabric Manager matrices, and the DOCA-OFED guide for your CUDA target. Hopper is served by the datacenter production branches, never the GeForce driver: as of June 2026, R570 (validated on HGX H100/H200) and the newer R580 are the live production branches. Pick one branch and hold the driver, Fabric Manager, and libnvidia-nscq to the same version; version skew is the most frequent Hopper bring-up failure. Ordered checklist for an HGX/DGX H100/H200 node:
# 0. Confirm the board and kernel before touching drivers
nvidia-smi -L 2>/dev/null || echo "no driver yet" # H100/H200 SXM expected on HGX/DGX
uname -r # kernel must be in the driver/OFED matrix
# 1. Add NVIDIA's CUDA network repo (cuda-keyring). Set distro to match the host, e.g. ubuntu2204.
distro=ubuntu2204
wget "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/x86_64/cuda-keyring_1.1-1_all.deb"
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
# 2. Datacenter driver — pin the branch. Open kernel modules (recommended, Turing+):
sudo apt-get install -y nvidia-open-570 # or nvidia-open-580; proprietary: cuda-drivers-570
# DKMS note: open modules ship via nvidia-dkms-<branch>-open and rebuild on kernel bumps;
# a kernel update can silently fail to rebuild the module — re-run `sudo dkms autoinstall`
# and reboot, then re-check `nvidia-smi` before scheduling work.
# 3. CUDA toolkit (match to the driver's CUDA support; toolkit can trail the driver)
sudo apt-get install -y cuda-toolkit-12-9 # or `cuda-toolkit` for the repo default
# 4. Fabric Manager + NSCQ — REQUIRED on NVSwitch (HGX/DGX H100/H200), version == driver
sudo apt-get install -y nvidia-fabricmanager-570 libnvidia-nscq-570 # branch must match step 2
# 5. ConnectX-7 (NDR) host stack: DOCA-OFED is the 1-to-1 MLNX_OFED replacement.
# Download the doca-host repo .deb matching your distro from NVIDIA, then:
sudo dpkg -i doca-host_<ver>-<distro>_amd64.deb # verify <ver> against DOCA release notes
sudo apt-get update
sudo apt-get install -y doca-ofed # full OFED stack (drivers + libs + tools)
sudo /etc/init.d/openibd restart # reload mlx5; on off-matrix kernels reinstall with --add-kernel-support
# 6. Persistence daemon — recommended on dedicated compute nodes (avoids cold-init latency)
sudo systemctl enable --now nvidia-persistenced
# 7. Bring up + verify the NVLink fabric (NVSwitch nodes only)
sudo systemctl enable --now nvidia-fabricmanager # must be active BEFORE NCCL multi-GPU
systemctl status nvidia-fabricmanager # FM/driver mismatch -> service fails, NVLink down
nvidia-smi nvlink --status # expect NVLink 4 links up on SXM
ibstat # ConnectX-7 ports: expect Active, NDR
# 8. MIG (optional): enable, then create 7x instances on H100/H200
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C # profile IDs are SKU-specific
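After step 8 it is worth confirming that the expected number of MIG devices actually exists. A small sketch that counts them from `nvidia-smi -L` output follows; the sample text imitates the general shape of that output with shortened, illustrative UUIDs, and `count_mig_devices` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count MIG devices in `nvidia-smi -L` output read from stdin.

count_mig_devices() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the count, drop the status.
  grep -c '^[[:space:]]*MIG ' || true
}

# Illustrative sample only (UUIDs shortened); real output may differ by driver version.
sample='GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-aaaa)
  MIG 1g.10gb     Device  0: (UUID: MIG-bbbb)
  MIG 1g.10gb     Device  1: (UUID: MIG-cccc)'

printf '%s\n' "$sample" | count_mig_devices   # -> 2

# On a live node (uncomment); seven instances are expected after step 8:
# nvidia-smi -L | count_mig_devices
```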
On PCIe-only Hopper nodes (H100 PCIe, H100 NVL pair) there is no NVSwitch, so skip step 4 and the Fabric Manager lines of step 7: do not install or enable nvidia-fabricmanager. Multi-GPU traffic beyond a bridged pair routes over PCIe/host. See Ansible bring-up for fleet automation and GPU software stack for the full version matrix.
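Because version skew is called out as the most frequent bring-up failure, a quick comparison of the loaded driver version against the installed Fabric Manager package can catch it early. This is a sketch assuming a Debian/Ubuntu host and the `nvidia-fabricmanager-570` package name from step 4; `strip_pkg_revision` and `versions_match` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: detect driver / Fabric Manager version skew by comparing version strings.
# Assumes Debian-style packaging and the 570 branch pinned in steps 2 and 4.

strip_pkg_revision() {
  # Debian versions carry a "-<revision>" suffix (e.g. 570.172.08-1); drop it.
  printf '%s\n' "${1%%-*}"
}

versions_match() {
  # $1 = driver version, $2 = package version (with or without revision)
  [ "$1" = "$(strip_pkg_revision "$2")" ]
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v dpkg-query >/dev/null 2>&1; then
  drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  fm=$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-570 2>/dev/null)
  if versions_match "$drv" "$fm"; then
    echo "OK: driver $drv matches Fabric Manager $fm"
  else
    echo "SKEW: driver '$drv' vs Fabric Manager '$fm'"
  fi
else
  echo "nvidia-smi or dpkg-query not found; skipping live check"
fi
```

The same comparison can be repeated for `libnvidia-nscq-570`, since all three are meant to share one version.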
When to use it
- H100: the broad training/inference baseline; mature software, wide availability, FP8 for transformer training and serving.
- H200: memory-bound inference and long-context/large-model serving where the jump to 141 GB HBM3e and 4.8 TB/s matters more than raw FLOPS (compute equals H100).
- GH200: workloads that benefit from coherent CPU+GPU memory and high CPU-GPU bandwidth (large embeddings, graph, data-prep-adjacent, KV-cache offload).
- Prefer SXM for any multi-GPU collective-heavy job (NVLink/NVSwitch); PCIe suits single-GPU, bridged-pair, or power/airflow-constrained nodes.
- Choose Hopper over Blackwell when air-cooled, lower-power (700 W vs 1000-1400 W) infrastructure is the constraint, or when FP4/NVFP4 and TEE-I/O are not needed.
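The H100-versus-H200 choice often comes down to whether the weights fit in HBM. A back-of-the-envelope sketch follows, assuming weights only (parameters in billions times bytes per parameter, decimal GB) and a hypothetical 70B-parameter model; KV cache, activations, and framework overhead are ignored and all need extra headroom.

```shell
#!/bin/sh
# Sketch: do model weights alone fit in aggregate HBM?
# weights_gb = params (billions) x bytes per parameter (decimal GB).
# Ignores KV cache, activations, and framework overhead.

weights_gb() {
  # $1 = parameters in billions, $2 = bytes per parameter (2 = BF16/FP16, 1 = FP8)
  echo $(( $1 * $2 ))
}

fits_in_hbm() {
  # $1 = params (billions), $2 = bytes/param, $3 = HBM per GPU (GB), $4 = GPU count
  need=$(weights_gb "$1" "$2")
  have=$(( $3 * $4 ))
  if [ "$need" -le "$have" ]; then
    echo "fits ($need of $have GB)"
  else
    echo "does not fit ($need of $have GB)"
  fi
}

fits_in_hbm 70 2 80 1    # hypothetical 70B model, BF16, one H100 -> does not fit (140 of 80 GB)
fits_in_hbm 70 2 141 1   # same model, one H200                   -> fits (140 of 141 GB)
fits_in_hbm 70 1 80 1    # same model in FP8, one H100            -> fits (70 of 80 GB)
```

The H200 case fits on paper but leaves about 1 GB free, which is not a servable configuration once KV cache is counted; read "fits" as a lower bound, not a sizing recommendation.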
Networking
Hopper is the NDR InfiniBand era. Generation-specific fabric:
- NIC: ConnectX-7, single/dual-port 400 Gb/s NDR InfiniBand (also 400GbE), PCIe Gen5, with In-Network Computing and GPUDirect RDMA giving the NIC a direct DMA path to GPU HBM.
- Switch: Quantum-2 (QM9700/QM9790), 64x 400 Gb/s NDR ports (or 128x 200 Gb/s NDR200) on 32 OSFP cages, 51.2 Tb/s aggregate (bidirectional). Rail-aligned, one ConnectX-7 port per GPU rail. Spectrum-X (Spectrum-4 + BlueField-3) is the Ethernet alternative.
- Per-port rate: 400 Gb/s line rate per ConnectX-7 NDR port; SHARP in-network reduction offloads collectives.
- NVLink: fourth-gen NVLink 4 at 900 GB/s/GPU. On SXM5 H100/H200 the NVLink/NVSwitch domain (8-GPU HGX baseboard, 4 NVSwitches) carries intra-node all-reduce; the InfiniBand rail carries inter-node. PCIe H100 has only an optional 2-way NVLink Bridge and no NVSwitch domain.
- Fabric Manager applicability: required on NVSwitch systems (HGX/DGX H100/H200); H200 reuses the HGX H100 8-GPU baseboard. PCIe H100 / H100 NVL: no Fabric Manager.
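The figures in this list mix units: network rates are in Gb/s (bits, per direction) while NVLink is quoted in GB/s (bytes, bidirectional total). A short arithmetic sketch puts them on the same footing, assuming one NDR port per GPU on an 8-GPU node; `gbit_to_gbyte` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: put NVLink and NDR figures in the same units (GB/s, per direction).

gbit_to_gbyte() {
  # $1 = rate in Gb/s; prints GB/s (8 bits per byte, integer division)
  echo $(( $1 / 8 ))
}

ndr_port=$(gbit_to_gbyte 400)     # one ConnectX-7 NDR port, per direction
nvlink_bidir=900                  # NVLink 4 per GPU, bidirectional total
nvlink_unidir=$(( nvlink_bidir / 2 ))
rails=8                           # assumed: one NDR port per GPU on an 8-GPU node

echo "NDR port:            ${ndr_port} GB/s per direction"
echo "Node IB total:       $(( ndr_port * rails )) GB/s per direction over ${rails} rails"
echo "NVLink per GPU:      ${nvlink_unidir} GB/s per direction"
echo "NVLink vs NDR ratio: $(( nvlink_unidir / ndr_port ))x"
echo "Quantum-2 aggregate: $(( 64 * 400 * 2 )) Gb/s (64 ports, both directions)"
```

Under these assumptions each GPU has roughly 9x more NVLink than InfiniBand bandwidth per direction, which is why intra-node collectives stay on NVLink; the last line reproduces the 51.2 Tb/s switch figure as 64 ports counted in both directions.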
For the actual bring-up, validation, and benchmark commands (NVLink/NVSwitch checks, ibstat/ibping, nvbandwidth, nccl-tests all-reduce bus-bandwidth), use the shared keystone runbook; do not duplicate it here: Fabric bring-up, validation & benchmarking.
Hopper-specific pointers:
- Verify FM before collectives. On HGX/DGX nodes `nvidia-fabricmanager` must be active or NCCL silently degrades to PCIe and all-reduce bus-bandwidth collapses; check that `nvidia-smi nvlink --status` shows NVLink 4 links up.
- NDR cabling/transceivers. Quantum-2 uses OSFP at 400 Gb/s; a single twin-port OSFP can split to 2x NDR200. Confirm transceiver/DAC and OSFP vs QSFP112 before claiming a port is at NDR rate; `ibstat` width/speed is the source of truth.
- Rail alignment. Pin one ConnectX-7 port per GPU rail and set `NCCL_IB_HCA` accordingly; mis-rail-aligned topologies cost inter-node all-reduce bandwidth even at full NDR.
- PCIe Hopper NCCL. On PCIe H100 nodes test NCCL PCIe P2P fallback explicitly (no NVSwitch); do not size collectives at 900 GB/s.
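For the rail-alignment pointer, `NCCL_IB_HCA` takes a comma-separated list of HCA device names. A minimal sketch of assembling that value follows; the `mlx5_N` names are placeholders rather than a real mapping, and `join_hcas` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build a NCCL_IB_HCA value from an ordered list of HCA device names.
# The mlx5_N names are PLACEHOLDERS; list real devices with `ibstat -l` and map
# them to GPU rails from your own topology (e.g. `nvidia-smi topo -m`).

join_hcas() {
  # Joins all arguments with commas: a b c -> a,b,c
  out=""
  for hca in "$@"; do
    out="${out:+$out,}$hca"
  done
  printf '%s\n' "$out"
}

NCCL_IB_HCA=$(join_hcas mlx5_0 mlx5_1 mlx5_2 mlx5_3)
export NCCL_IB_HCA
echo "NCCL_IB_HCA=$NCCL_IB_HCA"   # -> NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_2,mlx5_3
```

NCCL treats each entry as a prefix by default, so on hosts with ten or more devices `mlx5_1` may also match `mlx5_10`; the NCCL documentation describes a leading `=` for exact matching, which is worth checking for your NCCL version.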
This NDR/ConnectX-7 pairing is the main networking-era difference from Ampere (HDR/ConnectX-6) and Blackwell (XDR/ConnectX-8). See networking fabric.
Gotchas & failure modes
- PCIe vs SXM confusion. PCIe H100 is 350 W with no NVSwitch domain; expecting 900 GB/s all-reduce on PCIe nodes is a common, costly mistake. Confirm the form factor before sizing collectives.
- Fabric Manager not running on an NVSwitch node leaves NVLink unconfigured; NCCL silently degrades to PCIe and throughput collapses. Always verify `nvidia-fabricmanager` is active first (see NCCL hang runbook).
- Driver/Fabric-Manager/CUDA version skew is the most frequent Hopper bring-up failure; all three must come from a compatible set.
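- Export SKUs (H800/H20) differ from the base H100 in interconnect and/or compute (H800 reduces NVLink bandwidth; H20 reduces compute); do not assume full-H100 collective bandwidth or throughput on these parts without checking the SKU datasheet.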
- FP8 accuracy. The Transformer Engine helps, but FP8 still needs scaling/calibration care; validate model quality, do not assume drop-in parity with BF16.
- H200 ≠ more compute. It is the same GH100 compute as H100; gains are memory-capacity and memory-bandwidth bound only.
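Several of these gotchas surface as NVLink links that are not up. A sketch of a preflight count over `nvidia-smi nvlink --status` output follows. It assumes active links print as `Link N: <rate> GB/s` and inactive ones do not; the sample text is illustrative, and `count_active_links` is a hypothetical helper, so check the format against your driver's real output.

```shell
#!/bin/sh
# Sketch: count NVLink links that report a rate in `nvidia-smi nvlink --status` output.
# ASSUMES active links print "Link N: <rate> GB/s"; verify on your driver version.

count_active_links() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the count, drop the status.
  grep -c 'Link [0-9][0-9]*: [0-9.][0-9.]* GB/s' || true
}

# Illustrative sample only (UUID shortened).
sample='GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-aaaa)
     Link 0: 26.562 GB/s
     Link 1: 26.562 GB/s
     Link 2: <inactive>'

printf '%s\n' "$sample" | count_active_links   # -> 2

# On a live SXM node (uncomment) and compare with the link count in the board datasheet:
# nvidia-smi nvlink --status | count_active_links
```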
References
- NVIDIA H100 Tensor Core GPU (product page): https://www.nvidia.com/en-us/data-center/h100/
- NVIDIA H200 Tensor Core GPU (product page): https://www.nvidia.com/en-us/data-center/h200/
- NVIDIA GH200 Grace Hopper Superchip (product page): https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/
- NVIDIA Hopper architecture (technology overview): https://www.nvidia.com/en-us/data-center/technologies/hopper-architecture/
- NVIDIA H100 / Hopper architecture whitepaper (GTC22): https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper
- NVIDIA Grace Hopper Superchip datasheet: https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip
- NVIDIA H200 HPC/AI datasheet: https://resources.nvidia.com/en-us-data-center-overview/hpc-datasheet-sc23-h200
- NVIDIA Confidential Computing (Hopper TEE): https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/
- NVIDIA Fabric Manager user guide: https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
- NVIDIA MIG user guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
- NVIDIA datacenter driver documentation: https://docs.nvidia.com/datacenter/tesla/drivers/index.html
- NVIDIA datacenter driver installation guide (Ubuntu): https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html
- NVIDIA CUDA installation guide for Linux: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
- NVIDIA R570 datacenter driver release notes (HGX H100/H200 validation): https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-172-08/index.html
- NVIDIA R580 datacenter driver release notes: https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-65-06/index.html
- NVIDIA Fabric Manager apt packaging (package names, version-match requirement): https://github.com/NVIDIA/apt-packaging-fabric-manager
- NVIDIA driver persistence / nvidia-persistenced: https://docs.nvidia.com/deploy/driver-persistence/index.html
- NVIDIA MLNX_OFED to DOCA-OFED transition guide: https://docs.nvidia.com/doca/sdk/mlnx_ofed-to-doca-ofed-transition-guide/index.html
- NVIDIA ConnectX-7 NDR 400G InfiniBand adapter datasheet: https://www.nvidia.com/content/dam/en-zz/Solutions/networking/infiniband-adapters/infiniband-connectx7-data-sheet.pdf
- NVIDIA Quantum-2 InfiniBand platform: https://www.nvidia.com/en-us/networking/quantum2/
Related: GPU generations · Ampere platform · Blackwell platform · GPU software stack · Networking fabric · Security & multi-tenancy · Confidential Computing & Attestation · Kubernetes for GPUs · Ansible bring-up · Glossary