NVIDIA Ampere platform (A100 / A30 / A40 / A10)
Scope: the Ampere datacenter generation (GA100 on TSMC 7nm, GA10x on Samsung 8nm; launched from 2020): the A100, A30, A40, and A10, their interconnect and MIG behaviour, and their operational, driver, support-lifecycle, and networking differences versus the newer Hopper platform and Blackwell platform. Ampere is the oldest generation in this knowledge base; treat the driver-branch and support-lifecycle notes as the load-bearing operational facts.
Figures verified against NVIDIA primary sources as of June 2026. Ampere parts differ sharply by die: A100/A30 are compute (HBM, MIG); A40/A10 are render/inference (GDDR6, no MIG). Confirm the exact board against its datasheet.
What it is
Ampere is the generation that introduced TF32, third-generation Tensor Cores, and Multi-Instance GPU (MIG) to NVIDIA datacenter parts, with third-generation NVLink at 600 GB/s. It does not support FP8 (that arrives with Hopper) and has no Confidential Computing. The line splits cleanly: the A100 (GA100, HBM2/HBM2e, MIG, NVLink) is the training flagship; A30 is its smaller MIG-capable sibling; A40 and A10 (GA10x, GDDR6, no MIG, no NVSwitch; A40 offers a 2-way NVLink Bridge, A10 has no NVLink) target rendering, virtual workstations, and inference. A800 is the reduced-NVLink China-export A100.
```mermaid
flowchart TB
AMP["Ampere (2020): TF32, 3rd-gen Tensor, MIG, no FP8, no Conf. Computing"]
AMP --> GA100["GA100 die: HBM2/HBM2e, NVLink 3, MIG"]
AMP --> GA10x["GA10x die: GDDR6, no NVSwitch, no MIG"]
GA100 --> A100["A100: 40 GB HBM2 / 80 GB HBM2e, MIG 7, 400 W SXM"]
GA100 --> A30["A30: 24 GB HBM2, MIG 4"]
GA100 --> A800["A800: export-cut NVLink"]
GA10x --> A40["A40: 48 GB GDDR6, 2-way NVLink Bridge, render/vWS"]
GA10x --> A10["A10: 24 GB GDDR6, no NVLink, inference/vWS"]
A100 --> SXM["SXM4 400 W: NVLink 3 + NVSwitch (HGX/DGX A100)"]
A100 --> PCIE["PCIe 250/300 W: NVLink Bridge only"]
```
Lineup & specifications
| Part | Die | Memory | Bandwidth | NVLink | MIG | Power | Role |
|---|---|---|---|---|---|---|---|
| A100 SXM4 | GA100 | 40 GB HBM2 / 80 GB HBM2e | ~1.6 / ~2.0 TB/s | 3rd-gen, 600 GB/s | 7 instances | 400 W | Training / HPC flagship |
| A100 PCIe | GA100 | 40 GB HBM2 / 80 GB HBM2e | ~1.6 / ~1.94 TB/s | NVLink Bridge (2-way) | 7 instances | 250 / 300 W | Training, no NVSwitch |
| A30 | GA100 | 24 GB HBM2 | ~933 GB/s | NVLink Bridge (2-way) | 4 instances | 165 W | Mainstream compute / inference |
| A40 | GA102 | 48 GB GDDR6 (ECC) | ~696 GB/s | NVLink Bridge (2-way) | none | 300 W | Render / virtual workstation |
| A10 | GA102 | 24 GB GDDR6 | ~600 GB/s | none | none | 150 W | Inference / VDI / vWS |
| A800 | GA100 | 40 / 80 GB (as A100) | as A100 | reduced (e.g. 400 GB/s) | 7 instances | 300-400 W | China-export A100 |
Compute: third-generation Tensor Cores supporting FP16/BF16/TF32/INT8/INT4 with fine-grained structured (2:4) sparsity. FP64 Tensor Core acceleration is a GA100 feature (A100/A30); the GA102 parts (A40/A10) have only low-rate FP64. No FP8, no FP4. PCIe Gen4. Specs cited from the A100/A30/A40/A10 product pages, the A100 (Tensor Core GPU) datasheet, and the Ampere architecture whitepaper (see References).
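As a sketch of how the lineup table can drive automation, the shell function below maps a GPU name, as printed by `nvidia-smi --query-gpu=name --format=csv,noheader`, to die, memory type, MIG limit, and NVLink class. The function name `ampere_caps` and its name patterns are assumptions made for illustration; marketing names vary by board and cloud SKU, so treat an unmatched name as "check the datasheet".

```sh
#!/bin/sh
# Hypothetical helper: classify an Ampere GPU by its nvidia-smi name.
# Patterns are illustrative assumptions; order matters because "A10"
# is a prefix of "A100", so the A100/A800 arms must come first.
ampere_caps() {
  case "$1" in
    *A100*SXM*|*A800*SXM*) echo "die=GA100 mem=HBM mig_max=7 nvlink=nvswitch-capable" ;;
    *A100*|*A800*)         echo "die=GA100 mem=HBM mig_max=7 nvlink=bridge-2way" ;;
    *A30*)                 echo "die=GA100 mem=HBM mig_max=4 nvlink=bridge-2way" ;;
    *A40*)                 echo "die=GA102 mem=GDDR6 mig_max=0 nvlink=bridge-2way" ;;
    *A10*)                 echo "die=GA102 mem=GDDR6 mig_max=0 nvlink=none" ;;
    *)                     echo "unknown"; return 1 ;;   # not in this table: read the datasheet
  esac
}
```

For example, `nvidia-smi --query-gpu=name --format=csv,noheader | while read -r n; do ampere_caps "$n"; done` prints one line per GPU. "nvswitch-capable" only means the SXM4 form factor can sit on an NVSwitch baseboard; it does not prove the node has one.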
Operational differences
- Driver branch & support lifecycle. Datacenter (Tesla) driver only. Ampere is the oldest generation here: it is served by the datacenter LTS branches, and its support window is shorter than Hopper/Blackwell. Track the production/LTS branch end-of-life and plan migrations accordingly; a new datacenter LTS may drop or de-prioritise Ampere before it drops Hopper. Confirm against the driver documentation and the GPU support matrix. See GPU software stack and driver upgrade runbook.
- NVLink / NVSwitch. Third-gen NVLink at 600 GB/s on A100. SXM4 A100 joins an NVSwitch domain on HGX/DGX A100 (8-GPU baseboard, 6 NVSwitches on DGX A100); PCIe A100, A30, and A40 expose only a 2-way NVLink Bridge and have no NVSwitch domain. A10 has no NVLink at all (PCIe P2P only). Without NVSwitch, any GPU pair not joined by a bridge communicates over PCIe P2P or through host memory, and NCCL uses those paths instead of NVLink.
- Fabric Manager applies only to NVSwitch systems (HGX/DGX A100); not used on PCIe-only or A40/A10 nodes.
- MIG, first generation. A100 supports up to 7 instances, A30 up to 4. A40 and A10 do not support MIG (GA10x render dies). This is the key partitioning split inside the generation. See security & multi-tenancy and Kubernetes for GPUs.
- No Confidential Computing. Ampere has no GPU TEE; there is no encrypted/attested in-use protection at the GPU. That capability begins with Hopper. For confidential workloads, Ampere is not an option.
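- GPUDirect RDMA and vGPU are supported on the datacenter parts; A40 and A10 are the common virtual-workstation/VDI cards, while A100/A30 vGPU use is compute-oriented. ECC is on by default for HBM on A100/A30; A40/A10 GDDR6 also supports ECC, so confirm its mode with `nvidia-smi -q -d ECC`.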
- No FP8. Mixed-precision training and inference on Ampere use BF16/FP16/TF32; code paths that require FP8 (for example Transformer Engine FP8 recipes) will not run on Ampere.
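The MIG split above (7 instances on A100, 4 on A30, none on A40/A10) can be pre-checked before running `nvidia-smi mig -cgi`. The hypothetical `mig_slices_fit` below sums the compute slices encoded in profile names such as `1g.5gb` or `3g.20gb` and compares the total with the per-GPU budget. It is a simplified, necessary-but-not-sufficient check: the driver also enforces placement and memory-slice rules, so a combination that passes here can still be rejected. It accepts profile names rather than numeric profile IDs because IDs differ across SKUs.

```sh
#!/bin/sh
# Hypothetical pre-check: sum the "<n>g" compute slices in a comma-separated
# list of MIG profile names and compare with the budget (7 on A100, 4 on A30).
# Does NOT model placement or memory-slice rules; the driver has the final say.
mig_slices_fit() {
  budget=$1
  total=0
  old_ifs=$IFS
  IFS=,                                  # split the profile list on commas
  for p in $2; do
    case "$p" in
      [0-9]g.*) n=${p%%g.*} ;;           # "3g.20gb" -> 3
      *) IFS=$old_ifs; echo "unsupported profile: $p (use names like 1g.5gb)"; return 2 ;;
    esac
    total=$((total + n))
  done
  IFS=$old_ifs
  if [ "$total" -le "$budget" ]; then
    echo "ok: $total/$budget slices"
  else
    echo "over: $total/$budget slices"
    return 1
  fi
}
```

For example, `mig_slices_fit 7 3g.20gb,2g.10gb,1g.5gb,1g.5gb` prints `ok: 7/7 slices`; confirm the same layout against `nvidia-smi mig -lgip` on the target GPU before creating instances.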
Install & setup
Reference template only: unexecuted and not hardware-tested. Pin versions against the driver install guide, the driver and MIG matrices, and the DOCA-OFED guide. Current datacenter branches still carry Ampere (as of June 2026, R570 and R580 validate it), but as the oldest generation here it is the first to lose new-driver attention, so pin to a still-supported branch and verify its EOL before committing a long-lived fleet. Use the datacenter (Tesla) driver only, never GeForce, and hold the driver, Fabric Manager, and libnvidia-nscq to the same version. Ordered checklist:
```bash
# 0. Confirm the board and kernel before touching drivers
lspci -nn | grep -i nvidia                        # works before any driver is installed
nvidia-smi -L 2>/dev/null || echo "no driver yet" # A100/A30 (HBM) vs A40/A10 (GDDR6) differ sharply
uname -r                                          # kernel must be in the driver/OFED matrix
# 1. Add NVIDIA's CUDA network repo (cuda-keyring). Set distro for your OS.
distro=ubuntu2204                                 # e.g. ubuntu2204, ubuntu2404
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
# 2. Datacenter driver: pin the branch. Open kernel modules (recommended, Turing+):
sudo apt-get install -y nvidia-open-570           # or nvidia-open-580; proprietary: cuda-drivers-570
# DKMS note: open modules ship via nvidia-dkms-<branch>-open and rebuild on kernel bumps;
# a kernel update can silently fail to rebuild the module, so re-run `sudo dkms autoinstall`,
# reboot, and re-check `nvidia-smi` before scheduling work.
# 3. CUDA toolkit: match it to the driver branch (R570 -> CUDA 12.8, R580 -> CUDA 13.0)
sudo apt-get install -y cuda-toolkit-12-8         # `cuda-toolkit` (repo default) may need a newer branch
# 4. Fabric Manager + NSCQ: 8-GPU HGX/DGX A100 (NVSwitch) ONLY; version == driver branch.
# A40 / A10 / A30 / PCIe A100 have NO NVSwitch -> do NOT install or enable Fabric Manager.
sudo apt-get install -y nvidia-fabricmanager-570 libnvidia-nscq-570   # branch must match step 2
# 5. ConnectX-6 (HDR) host stack: DOCA-OFED is the 1-to-1 MLNX_OFED replacement.
# Download the doca-host repo .deb matching your distro from NVIDIA, then:
sudo dpkg -i doca-host_<ver>-<distro>_amd64.deb   # verify <ver> against DOCA release notes
sudo apt-get update
sudo apt-get install -y doca-ofed                 # full OFED stack (drivers + libs + tools)
sudo /etc/init.d/openibd restart                  # reload mlx5; off-matrix kernels: follow the DOCA-OFED kernel-support steps
# 6. Persistence daemon: recommended on dedicated compute nodes (avoids cold-init latency)
sudo systemctl enable --now nvidia-persistenced
# 7. Bring up + verify
sudo systemctl enable --now nvidia-fabricmanager  # HGX/DGX A100 ONLY; FM/driver mismatch -> service fails
systemctl status nvidia-fabricmanager             # NVSwitch nodes only
nvidia-smi nvlink --status                        # A100 SXM: 12 NVLink 3 links up; A10: none
ibstat                                            # ConnectX-6 ports: expect State Active, Rate 200 (HDR)
# 8. MIG on A100 (up to 7 instances); A30 up to 4; A40/A10 do NOT support MIG
sudo nvidia-smi -i 0 -mig 1                       # may stay pending until a GPU reset or reboot
nvidia-smi mig -i 0 -lgip                         # list GPU-instance profiles and IDs for this SKU
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C   # 7x 1g.5gb on A100 40GB; IDs/sizes vary by SKU
# Verify the running driver is within a still-supported branch before fleet rollout
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```
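The last command in the template prints the running driver version. A minimal gate, assuming you maintain your own list of branches confirmed to validate Ampere (R570 and R580 per the June 2026 note above), compares the branch prefix against that list. `driver_branch_ok` and its default list are illustrative; refresh the list from NVIDIA's driver documentation rather than trusting this default.

```sh
#!/bin/sh
# Hypothetical gate: take the branch (text before the first dot) from a driver
# version string and check it against a space-separated allow-list.
# The default "570 580" mirrors this page's as-of note and will go stale.
driver_branch_ok() {
  version=$1
  allowed=${2:-"570 580"}
  branch=${version%%.*}                 # "570.133.20" -> "570"
  for b in $allowed; do
    if [ "$branch" = "$b" ]; then
      echo "ok: R$branch"
      return 0
    fi
  done
  echo "unsupported: R$branch (allowed: $allowed)"
  return 1
}
```

Usage on a node: `driver_branch_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`, with a second argument such as `'570 580'` to override the list.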
On A40 / A10 / A30 / PCIe A100 nodes there is no NVSwitch: skip step 4 and the Fabric Manager lines in step 7 entirely. See Ansible bring-up for fleet automation, GPU software stack for the version matrix, and datacentre readiness for power/airflow (Ampere boards are typically air-cooled at 150-400 W, far easier than Blackwell).
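Both Fabric Manager mistakes (missing on an NVSwitch node, running on a node without NVSwitch) can be caught by comparing the node type with `systemctl is-active nvidia-fabricmanager`. In this sketch the operator supplies whether the node has NVSwitches, because the GPU name alone does not identify the baseboard; `fm_lint` is a hypothetical helper, not an NVIDIA tool.

```sh
#!/bin/sh
# Hypothetical lint. Arg 1: "yes" if the node has NVSwitches (HGX/DGX A100),
# otherwise "no" (operator-supplied). Arg 2: the state string printed by
# `systemctl is-active nvidia-fabricmanager` (e.g. active, inactive, failed).
fm_lint() {
  nvswitch=$1
  state=$2
  if [ "$nvswitch" = "yes" ]; then
    if [ "$state" = "active" ]; then
      echo "ok: NVSwitch node, Fabric Manager active"
    else
      echo "error: NVSwitch node but Fabric Manager is $state (check FM/driver version match)"
      return 1
    fi
  else
    if [ "$state" = "active" ]; then
      echo "error: Fabric Manager active on a non-NVSwitch node; disable it"
      return 1
    fi
    echo "ok: no NVSwitch, Fabric Manager not running"
  fi
}
```

Usage: `fm_lint yes "$(systemctl is-active nvidia-fabricmanager)"` on HGX/DGX A100, `fm_lint no ...` everywhere else.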
When to use it
- A100: still-capable training/HPC and FP64 work; chosen today mainly for cost, existing fleets, or FP64 needs rather than peak throughput. No FP8.
- A30: mainstream MIG-partitioned inference where A100 capacity is overkill.
- A40: professional rendering, virtual workstations, and graphics-plus-compute (no MIG).
- A10: dense inference and VDI; low power (150 W), no NVLink, no MIG.
- A800: legacy China-export A100 deployments; reduced NVLink limits collective bandwidth.
- Prefer Hopper or Blackwell for new builds needing FP8/FP4, Confidential Computing, or higher NVLink bandwidth; keep Ampere where the workload fits its precision/feature envelope and the support lifecycle is acceptable.
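The selection rules above reduce to a few hard constraints: FP8/FP4 or Confidential Computing rule Ampere out, MIG narrows the choice to A100/A30, and virtual-workstation or rendering work points to A40/A10. The sketch below encodes only those constraints; the need keywords (`fp8`, `fp4`, `confidential`, `mig`, `vws`) are invented for this example, and cost, FP64, and NVLink requirements still need human judgment.

```sh
#!/bin/sh
# Hypothetical triage over the hard constraints on this page.
# Arguments are need keywords invented for this sketch.
ampere_fit() {
  mig=no
  vws=no
  for need in "$@"; do
    case "$need" in
      fp8|fp4|confidential) echo "hopper-or-blackwell"; return 1 ;;  # outside Ampere's envelope
      mig) mig=yes ;;
      vws) vws=yes ;;
      *) echo "unknown need: $need"; return 2 ;;
    esac
  done
  if [ "$mig" = yes ] && [ "$vws" = yes ]; then
    echo "none"          # per this page: MIG parts (A100/A30) are not the render/vWS parts (A40/A10)
    return 1
  elif [ "$mig" = yes ]; then
    echo "A100 A30"
  elif [ "$vws" = yes ]; then
    echo "A40 A10"
  else
    echo "A100 A30 A40 A10"
  fi
}
```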
Networking
Ampere is the HDR InfiniBand era. Generation-specific fabric:
- NIC: ConnectX-6, single- or dual-port 200 Gb/s HDR InfiniBand (also 200GbE), PCIe Gen4, with GPUDirect RDMA for direct NIC-to-GPU-memory DMA. HDR100 (100 Gb/s) splits are common where a switch port is shared.
- Switch: Quantum HDR (QM8700/QM8790), 40x 200 Gb/s HDR ports on QSFP56 (or 80x HDR100 via splitters), 16 Tb/s aggregate. RoCE Ethernet (Spectrum) is the alternative.
- Per-port rate: 200 Gb/s line rate per ConnectX-6 HDR port; one step below Hopper's 400 Gb/s NDR (ConnectX-7) and two below Blackwell's 800 Gb/s XDR (ConnectX-8). Size inter-node collectives accordingly.
- NVLink: third-gen NVLink at 600 GB/s per GPU. On SXM4 A100 the NVLink/NVSwitch domain (8-GPU HGX baseboard, 6 NVSwitches on DGX A100) carries intra-node all-reduce. PCIe A100/A30/A40 expose only a 2-way NVLink Bridge; A10 has no NVLink at all (PCIe P2P only).
- Fabric Manager applicability: required ONLY on NVSwitch systems (HGX/DGX A100). A40 / A10 / A30 / PCIe A100 have no NVSwitch and therefore no Fabric Manager.
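These rates are easy to mix up because NIC and switch figures are quoted in gigabits while NVLink is quoted in gigabytes, and the NVLink and switch aggregates count both directions. The helpers below are a worked-arithmetic sketch using line rates only, ignoring encoding and protocol overhead: 200 Gb/s HDR is 25 GB/s per direction, 40 HDR ports add up to 16 Tb/s only when both directions are counted, and NVLink 3's 600 GB/s (300 GB/s per direction) is about 12 times one HDR port. The function names are hypothetical.

```sh
#!/bin/sh
# Worked-arithmetic helpers (line rates; no encoding/protocol overhead).
# Gb/s -> GB/s: divide by 8 bits per byte.
gbps_to_GBps() {
  awk -v g="$1" 'BEGIN { printf "%.1f\n", g / 8 }'
}
# Switch aggregate in Tb/s: ports x per-port Gb/s x 2 directions / 1000.
switch_aggregate_tbps() {
  awk -v p="$1" -v r="$2" 'BEGIN { printf "%.0f\n", p * r * 2 / 1000 }'
}
# NVLink per-direction GB/s (bidirectional total / 2) over one NIC port's
# per-direction GB/s (Gb/s / 8).
nvlink_to_nic_ratio() {
  awk -v nv="$1" -v nic="$2" 'BEGIN { printf "%.0f\n", (nv / 2) / (nic / 8) }'
}
```

`gbps_to_GBps 200` prints `25.0`, `switch_aggregate_tbps 40 200` prints `16` (matching the QM8700 aggregate), and `nvlink_to_nic_ratio 600 200` prints `12`.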
For the actual bring-up, validation, and benchmark commands (NVLink/NVSwitch checks, ibstat/ibping, nvbandwidth, nccl-tests all-reduce bus-bandwidth), use the shared keystone runbook (do not duplicate it here): Fabric bring-up, validation & benchmarking.
Ampere-specific pointers:
- A40/A10 have no NVSwitch: A40 has at most a 2-way NVLink Bridge and A10 has no NVLink at all, so beyond a bridged A40 pair they rely on PCIe P2P plus the host NIC. Validate the NCCL PCIe P2P fallback explicitly and confirm GPUDirect RDMA over ConnectX-6; do not expect NVSwitch-class bandwidth.
- RoCE on Spectrum Ethernet. If the fabric is RoCE rather than InfiniBand, lossless transport requires DCBX/PFC (and ideally ECN) configured end-to-end on the Spectrum switch and NIC; misconfigured PFC silently tanks collective throughput. Validate with `nccl-tests` before production.
- A30 MIG + single NIC. A30 partitions into 4 MIG slices but has no NVSwitch; inter-node traffic still goes through the single ConnectX-6 HDR port. Rail design matters less; NIC saturation matters more.
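- Verify FM before collectives on A100 SXM. On HGX/DGX A100, `nvidia-fabricmanager` must be active before any GPU work: on an NVSwitch system without it, CUDA initialization fails (`cudaErrorSystemNotReady`) rather than falling back to PCIe. `nvidia-smi nvlink --status` should show the NVLink 3 links up.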
See networking fabric.
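The `ibstat` expectation (ports Active at HDR) can be checked mechanically across a fleet. In this hypothetical sketch, `ib_ports_check` reads `ibstat` text on stdin and assumes the usual layout: `CA '<name>'` headers at the start of a line, followed by indented `Port N:`, `State:`, and `Rate:` lines. An HDR100 split reports a rate of 100, so pass the expected rate explicitly for those ports.

```sh
#!/bin/sh
# Hypothetical check: flag any port whose State is not Active or whose Rate
# differs from the expected value (default 200 = HDR). Exit 0 if all ports
# pass, 1 on any failure, 2 if no ports were parsed.
ib_ports_check() {
  want=${1:-200}
  awk -v want="$want" '
    /^CA / { ca = $2; gsub(/[^A-Za-z0-9_]/, "", ca) }   # "'"'"'mlx5_0'"'"'" -> mlx5_0
    /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
    /^[ \t]*State:/ { state[ca " port " port] = $2 }    # skips "Physical state:"
    /^[ \t]*Rate:/ { rate[ca " port " port] = $2 }
    END {
      n = 0; bad = 0
      for (k in state) {
        n++
        if (state[k] != "Active" || rate[k] != want) {
          print "FAIL " k ": state=" state[k] " rate=" rate[k]
          bad = 1
        }
      }
      if (n == 0) { print "no ports parsed"; exit 2 }
      if (!bad) print "all ports Active at " want
      exit bad
    }'
}
```

Usage: `ibstat | ib_ports_check 200` on HDR nodes, `ibstat | ib_ports_check 100` where ports are HDR100 splits.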
Gotchas & failure modes
- A40/A10 are not A100. They are GDDR6 render/inference dies with no MIG and (A10) no NVLink; do not plan MIG slicing or NVSwitch collectives on them.
- PCIe vs SXM A100. Only SXM4 A100 on an HGX/DGX baseboard gives the NVSwitch 600 GB/s domain; PCIe A100 has a 2-way bridge only.
- No FP8 / no Confidential Computing. Workloads requiring FP8 (Hopper's hardware Transformer Engine) or a GPU TEE cannot run on Ampere; this is the hard ceiling versus Hopper.
- Support lifecycle. As the oldest generation, Ampere is first to lose new-driver attention. Pin to a still-supported datacenter LTS branch and verify EOL before committing to a long-lived fleet (see driver upgrade runbook).
- Fabric Manager misuse. Enabling Fabric Manager on a non-NVSwitch node (A40/A10/A30/PCIe A100) is wrong; it belongs only on NVSwitch-based HGX/DGX A100.
- A800 export cut. Reduced NVLink bandwidth versus A100; do not assume full A100 collective throughput.
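Several of these gotchas (PCIe vs SXM A100, no NVLink on A10, bridged-only A40) come down to which interconnect GPU pairs actually have. `nvidia-smi topo -m` prints a GPU matrix whose cells read `NV<n>` when a pair is joined by n NVLinks and `PIX`, `PXB`, `PHB`, `NODE`, or `SYS` for PCIe/host paths. The hypothetical `gpu_peer_path` below only reports whether any NVLink cell appears; it cannot tell a full NVSwitch domain from a single bridged pair, so read the matrix itself for that.

```sh
#!/bin/sh
# Hypothetical helper: read `nvidia-smi topo -m` output on stdin and print
# "nvlink" if any cell is an NV<n> token, otherwise "pcie-only".
# Legend lines such as "NV# = ..." do not match because "#" is not a digit.
gpu_peer_path() {
  if grep -Eq '(^|[[:space:]])NV[0-9]+([[:space:]]|$)'; then
    echo "nvlink"
  else
    echo "pcie-only"
  fi
}
```

Usage: `nvidia-smi topo -m | gpu_peer_path`; on an A10 node, or PCIe A100s without bridges, expect `pcie-only`.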
References
- NVIDIA A100 Tensor Core GPU (product page): https://www.nvidia.com/en-us/data-center/a100/
- NVIDIA A30 Tensor Core GPU (product page): https://www.nvidia.com/en-us/data-center/products/a30-gpu/
- NVIDIA A40 GPU (product page): https://www.nvidia.com/en-us/data-center/a40/
- NVIDIA A10 GPU (product page): https://www.nvidia.com/en-us/data-center/products/a10-gpu/
- NVIDIA Ampere architecture (technology overview): https://www.nvidia.com/en-us/technologies/ampere-architecture/
- NVIDIA Ampere architecture whitepaper: https://resources.nvidia.com/en-us-gpu-resources/nvidia-ampere-architecture-whitepaper
- NVIDIA Ampere architecture whitepaper (PDF): https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
- NVIDIA Tensor Core GPU datasheet (A100): https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
- NVIDIA MIG user guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
- NVIDIA datacenter driver documentation: https://docs.nvidia.com/datacenter/tesla/drivers/index.html
- NVIDIA datacenter driver installation guide (Ubuntu): https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html
- NVIDIA CUDA installation guide for Linux: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
- NVIDIA Fabric Manager user guide: https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
- NVIDIA Fabric Manager apt packaging (package names, version-match requirement): https://github.com/NVIDIA/apt-packaging-fabric-manager
- NVIDIA driver persistence / nvidia-persistenced: https://docs.nvidia.com/deploy/driver-persistence/index.html
- NVIDIA MLNX_OFED to DOCA-OFED transition guide: https://docs.nvidia.com/doca/sdk/mlnx_ofed-to-doca-ofed-transition-guide/index.html
- NVIDIA ConnectX-6 InfiniBand adapter (product page): https://www.nvidia.com/en-us/networking/infiniband-adapters/
- NVIDIA Quantum HDR InfiniBand switches (QM8700): https://www.nvidia.com/en-us/networking/infiniband-switching/
Related: GPU generations · Hopper platform · Blackwell platform · GPU software stack · Networking fabric · Security & multi-tenancy · Kubernetes for GPUs · Ansible bring-up · Glossary