
A10)¶

Scope: the Ampere datacenter generation (GA100/GA10x dies, TSMC 7nm / Samsung 8nm, 2020), covering the A100, A30, A40, and A10, their interconnect and MIG behaviour, and the operational, driver, support-lifecycle, and networking differences versus the newer Hopper and Blackwell platforms. Ampere is the oldest generation in this knowledge base; treat driver-branch and support-lifecycle notes as the load-bearing operational facts.

Figures verified against NVIDIA primary sources as of June 2026. Ampere parts differ sharply by die: A100/A30 are compute (HBM, MIG); A40/A10 are render/inference (GDDR6, no MIG). Confirm the exact board against its datasheet.

What it is¶

Ampere is the generation that introduced TF32, third-generation Tensor Cores, and Multi-Instance GPU (MIG) to NVIDIA datacenter parts, with third-generation NVLink at 600 GB/s on the A100. It does not support FP8 (that arrives with Hopper) and has no Confidential Computing. The line splits cleanly: the A100 (GA100, HBM2 on the 40 GB part and HBM2e on the 80 GB part, MIG, NVLink) is the training flagship; A30 is its smaller MIG-capable sibling; A40 and A10 (GA102, GDDR6, no MIG, no NVSwitch; the A40 takes a 2-way NVLink Bridge, the A10 has no NVLink at all) target rendering, virtual workstations, and inference. A800 is the reduced-NVLink China-export A100.

flowchart TB
  AMP["Ampere (2020) — TF32, 3rd-gen Tensor, MIG, no FP8, no Conf. Computing"]
  AMP --> GA100["GA100 die: HBM2/HBM2e, NVLink 3, MIG"]
  AMP --> GA10x["GA10x die: GDDR6, no NVSwitch (NVLink Bridge on A40 only), no MIG"]
  GA100 --> A100["A100: 40 GB HBM2 / 80 GB HBM2e, MIG 7, 400 W SXM"]
  GA100 --> A30["A30: 24 GB HBM2, MIG 4"]
  GA100 --> A800["A800: export-cut NVLink"]
  GA10x --> A40["A40: 48 GB GDDR6, render/vWS"]
  GA10x --> A10["A10: 24 GB GDDR6, inference/vWS"]
  A100 --> SXM["SXM4 400 W: NVLink 3 + NVSwitch (HGX/DGX A100)"]
  A100 --> PCIE["PCIe 300 W: NVLink Bridge only"]

Lineup & specifications¶

Part	Die	Memory	Bandwidth	NVLink	MIG	Power	Role
A100 SXM4	GA100	40 GB HBM2 / 80 GB HBM2e	~1.6 / ~2.0 TB/s	3rd-gen, 600 GB/s	7 instances	400 W	Training / HPC flagship
A100 PCIe	GA100	40 GB HBM2 / 80 GB HBM2e	~1.6 / ~1.94 TB/s	NVLink Bridge (2-way)	7 instances	250 / 300 W	Training, no NVSwitch
A30	GA100	24 GB HBM2	~933 GB/s	NVLink Bridge	4 instances	165 W	Mainstream compute / inference
A40	GA102	48 GB GDDR6 (ECC)	~696 GB/s	NVLink Bridge (2-way)	none	300 W	Render / virtual workstation
A10	GA102	24 GB GDDR6	~600 GB/s	none	none	150 W	Inference / VDI / vWS
A800	GA100	40 GB HBM2 / 80 GB HBM2e	as A100	reduced (e.g. 400 GB/s)	7 instances	300-400 W	China-export A100

Compute: third-generation Tensor Cores supporting FP16/BF16/TF32/INT8/INT4 with structural sparsity; FP64 Tensor Core acceleration on the GA100 parts only (A100/A30/A800); no FP8, no FP4. PCIe Gen4. Specs cited from the A100/A30/A40/A10 product pages, the A100 (Tensor Core GPU) datasheet, and the Ampere architecture whitepaper (see References).
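The lineup table can be folded into a small lookup for provisioning scripts. The sketch below is illustrative only: ampere_caps is a hypothetical helper, and the name patterns are assumptions about the strings that nvidia-smi --query-gpu=name reports, so check them against real boards before relying on them. It prints the die, the MIG instance ceiling, and the NVLink form factor (sxm, bridge, or none) taken from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a GPU product name to "<die> <max MIG instances> <nvlink form>".
# ASSUMPTION: the patterns mirror typical `nvidia-smi --query-gpu=name` strings
# (for example "NVIDIA A100-SXM4-80GB", "NVIDIA A100 80GB PCIe", "NVIDIA A30").
ampere_caps() {
  case "$1" in
    *A100-SXM*|*A800-SXM*)                      echo "GA100 7 sxm" ;;
    *A100-PCI*|*"A100 "*|*A800-PCI*|*"A800 "*)  echo "GA100 7 bridge" ;;
    *A30)                                       echo "GA100 4 bridge" ;;
    *A40)                                       echo "GA102 0 bridge" ;;
    *A10|*A10G)                                 echo "GA102 0 none" ;;
    *)                                          echo "unknown 0 unknown"; return 1 ;;
  esac
}

# Example (not run here, needs a working driver):
#   ampere_caps "$(nvidia-smi --query-gpu=name --format=csv,noheader -i 0)"
```

An sxm result only says the module can sit in an NVSwitch domain; whether the node actually has NVSwitches still depends on the baseboard (HGX/DGX A100).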

Operational differences¶

Driver branch & support lifecycle. Datacenter (Tesla) driver only. Ampere is the oldest generation here: it is served by the datacenter LTS branches, and its support window is shorter than Hopper/Blackwell. Track the production/LTS branch end-of-life and plan migrations accordingly; a new datacenter LTS may drop or de-prioritise Ampere before it drops Hopper. Confirm against the driver documentation and the GPU support matrix. See GPU software stack and driver upgrade runbook.
NVLink / NVSwitch. Third-gen NVLink at 600 GB/s on A100. SXM4 A100 joins an NVSwitch domain on HGX/DGX A100 (8-GPU baseboard, 6 NVSwitches on DGX A100); PCIe A100/A30/A40 expose only a 2-way NVLink Bridge and have no NVSwitch domain. A10 has no NVLink at all (PCIe P2P only). Multi-GPU without NVSwitch falls back to PCIe/host (NCCL over PCIe).
Fabric Manager applies only to NVSwitch systems (HGX/DGX A100); not used on PCIe-only or A40/A10 nodes.
MIG, first generation. A100 supports up to 7 instances, A30 up to 4. A40 and A10 do not support MIG (GA10x render dies). This is the key partitioning split inside the generation. See security & multi-tenancy and Kubernetes for GPUs.
No Confidential Computing. Ampere has no GPU TEE; there is no encrypted/attested in-use protection at the GPU. That capability begins with Hopper. For confidential workloads, Ampere is not an option.
GPUDirect RDMA and vGPU are supported (datacenter parts); A40/A10/A30 are common vGPU/virtual-workstation cards. ECC is on by default (HBM2/HBM2e on A100/A30; GDDR6 with ECC on A40/A10).
No FP8. Mixed-precision training/inference on Ampere uses BF16/FP16/TF32; code paths assuming the Transformer Engine or FP8 will not run on Ampere.
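The 7-versus-4 MIG ceiling described above is easy to enforce before any nvidia-smi mig call is made. A minimal sketch, assuming a hypothetical helper named mig_cgi_list that takes a profile ID, a requested instance count, and the board maximum (7 on A100, 4 on A30), and builds the comma-separated list that the -cgi option accepts. Profile IDs vary by SKU, so the ID passed in must come from nvidia-smi mig -lgip on the target board.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the profile list for `nvidia-smi mig -cgi`.
# Arguments: <profile-id> <instance-count> <board-max> (7 on A100, 4 on A30).
mig_cgi_list() {
  local profile="$1" count="$2" max="$3" list="" i
  if [ "$count" -lt 1 ] || [ "$count" -gt "$max" ]; then
    echo "requested $count instances, board allows 1..$max" >&2
    return 1
  fi
  for ((i = 0; i < count; i++)); do
    list="${list:+$list,}$profile"   # append with a comma after the first entry
  done
  echo "$list"
}

# Example (not run here): seven instances of profile 19 on GPU 0
#   sudo nvidia-smi mig -i 0 -cgi "$(mig_cgi_list 19 7 7)" -C
```

Refusing the request up front keeps an A30 node from being handed an A100-sized layout by shared automation.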

Install & setup¶

Reference template only, unexecuted and not hardware-tested; pin versions against the driver install guide, the driver and MIG matrices, and the DOCA-OFED guide. Ampere is still carried by current datacenter branches (as of June 2026, R570 and R580 validate Ampere) but, as the oldest generation here, is first to lose new-driver attention; pin to a still-supported branch and verify its EOL before committing a long-lived fleet. Datacenter (Tesla) driver only, never GeForce. Hold the driver, Fabric Manager, and libnvidia-nscq to the same version. Ordered checklist:

# 0. Confirm the board and kernel before touching drivers
nvidia-smi -L 2>/dev/null || echo "no driver yet"   # A100/A30 (HBM2/HBM2e) vs A40/A10 (GDDR6) differ sharply
uname -r                                             # kernel must be in the driver/OFED matrix

# 1. Add NVIDIA's CUDA network repo (cuda-keyring). Replace $distro, e.g. ubuntu2204.
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

# 2. Datacenter driver — pin the branch. Open kernel modules (recommended, Turing+):
sudo apt-get install -y nvidia-open-570            # or nvidia-open-580; proprietary: cuda-drivers-570
#    DKMS note: open modules ship via nvidia-dkms-<branch>-open and rebuild on kernel bumps;
#    a kernel update can silently fail to rebuild the module — re-run `sudo dkms autoinstall`
#    and reboot, then re-check `nvidia-smi` before scheduling work.

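# 3. CUDA toolkit (match to the driver branch: R570 pairs with CUDA 12.8, R580 with CUDA 13.0;
#    a newer 12.x toolkit on R570 relies on CUDA minor-version compatibility)
sudo apt-get install -y cuda-toolkit-12-8          # with nvidia-open-580 use cuda-toolkit-13-0; `cuda-toolkit` takes the repo default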

# 4. Fabric Manager + NSCQ — HGX/DGX A100 (NVSwitch) ONLY; version == driver branch.
#    A40 / A10 / A30 / PCIe A100 have NO NVSwitch -> do NOT install or enable Fabric Manager.
sudo apt-get install -y nvidia-fabricmanager-570 libnvidia-nscq-570   # branch must match step 2

# 5. ConnectX-6 (HDR) host stack: DOCA-OFED is the 1-to-1 MLNX_OFED replacement.
#    Download the doca-host repo .deb matching your distro from NVIDIA, then:
sudo dpkg -i doca-host_<ver>-<distro>_amd64.deb    # verify <ver> against DOCA release notes
sudo apt-get update
sudo apt-get install -y doca-ofed                  # full OFED stack (drivers + libs + tools)
sudo /etc/init.d/openibd restart                   # reload mlx5; on off-matrix kernels reinstall with --add-kernel-support

# 6. Persistence daemon — recommended on dedicated compute nodes (avoids cold-init latency)
sudo systemctl enable --now nvidia-persistenced

# 7. Bring up + verify (NVSwitch nodes only)
sudo systemctl enable --now nvidia-fabricmanager   # HGX/DGX A100 ONLY; FM/driver mismatch -> service fails
systemctl status nvidia-fabricmanager
nvidia-smi nvlink --status                         # expect NVLink 3 links on A100 SXM; none on A10
ibstat                                             # ConnectX-6 ports: expect State: Active, Rate: 200 (HDR)

# 8. MIG on A100 (up to 7) — A30 up to 4; A40/A10 do NOT support MIG
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C   # 7x 1g.5gb on 40GB; profile IDs vary by SKU

# Verify the running driver is within a still-supported branch before fleet rollout
nvidia-smi --query-gpu=driver_version --format=csv,noheader

On A40 / A10 / A30 / PCIe A100 nodes there is no NVSwitch: skip step 4 and the Fabric Manager lines in step 7 entirely. See Ansible bring-up for fleet automation, GPU software stack for the version matrix, and datacentre readiness for power/airflow (Ampere is air-cooled at 150-400 W, far easier than Blackwell).
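On NVSwitch nodes, the same-version rule for the driver and Fabric Manager can be checked mechanically before the service is enabled. A minimal sketch, assuming a hypothetical helper named versions_match and assuming the Fabric Manager package version carries a Debian revision suffix (such as -1) after the driver version; confirm the real format with dpkg-query on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the driver and Fabric Manager versions are identical.
# ASSUMPTION: the package version looks like "<driver-version>-<revision>"; everything
# from the first "-" onward is stripped before comparing.
versions_match() {
  local driver="$1" fm="${2%%-*}"
  [ -n "$driver" ] && [ "$driver" = "$fm" ]
}

# Example (HGX/DGX A100 only, not run here):
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader -i 0)
#   fm=$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-570)
#   versions_match "$drv" "$fm" || echo "driver/FM mismatch: $drv vs $fm" >&2
```

Running this after every driver or kernel update catches the case where apt moved one package and held the other.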

When to use it¶

A100: still-capable training/HPC and FP64 work; chosen today mainly for cost, existing fleets, or FP64 needs rather than peak throughput. No FP8.
A30: mainstream MIG-partitioned inference where A100 capacity is overkill.
A40: professional rendering, virtual workstations, and graphics-plus-compute (no MIG).
A10: dense inference and VDI; low power (150 W), no NVLink, no MIG.
A800: legacy China-export A100 deployments; reduced NVLink limits collective bandwidth.
Prefer Hopper or Blackwell for new builds needing FP8/FP4, Confidential Computing, or higher NVLink bandwidth; keep Ampere where the workload fits its precision/feature envelope and the support lifecycle is acceptable.

Networking¶

Ampere is the HDR InfiniBand era. Generation-specific fabric:

NIC: ConnectX-6, single/dual-port 200 Gb/s HDR InfiniBand (also 200GbE), PCIe Gen4, with GPUDirect RDMA for direct NIC-to-GPU-memory DMA. HDR100 (100 Gb/s) splits are common where the switch port is shared.
Switch: Quantum HDR (QM8700/QM8790), 40x 200 Gb/s HDR ports on QSFP56 (or 80x HDR100 via splitters), 16 Tb/s aggregate. RoCE Ethernet (Spectrum) is the alternative.
Per-port rate: 200 Gb/s line rate per ConnectX-6 HDR port; one step below Hopper's 400 Gb/s NDR (ConnectX-7) and Blackwell's XDR (ConnectX-8). Size inter-node collectives accordingly.
NVLink: third-gen NVLink 3 at 600 GB/s/GPU. On SXM4 A100 the NVLink/NVSwitch domain (8-GPU HGX baseboard, 6 NVSwitches on DGX A100) carries intra-node all-reduce. PCIe A100/A30/A40 expose only a 2-way NVLink Bridge; A10 has no NVLink at all (PCIe P2P only).
Fabric Manager applicability: required ONLY on NVSwitch systems (HGX/DGX A100). A40 / A10 / A30 / PCIe A100 have no NVSwitch and therefore no Fabric Manager.
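To make "size inter-node collectives accordingly" concrete, the figures above can be put on one scale. A rough sketch with two hypothetical helpers: it converts a link rate from Gb/s to GB/s and compares the A100 NVLink figure with one HDR port. The result is an order-of-magnitude guide only, since the 600 GB/s NVLink number is a bidirectional aggregate, the 200 Gb/s port rate is per direction, and protocol overhead is ignored.

```shell
#!/usr/bin/env bash
# Convert a link rate in Gb/s to whole GB/s (8 bits per byte, overhead ignored).
gbps_to_gBps() { echo $(( $1 / 8 )); }

# Ratio of intra-node NVLink bandwidth (GB/s) to one NIC port (Gb/s), rounded down.
nvlink_to_nic_ratio() {
  local nvlink_gBps="$1" nic_gbps="$2"
  echo $(( nvlink_gBps * 8 / nic_gbps ))
}

gbps_to_gBps 200              # one ConnectX-6 HDR port       -> 25
nvlink_to_nic_ratio 600 200   # A100 SXM4 NVLink 3 vs one HDR -> 24
```

A gap of this size is why collectives that stay inside the NVSwitch domain behave so differently from those that cross the HDR fabric.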

For the actual bring-up, validation, and benchmark commands (NVLink/NVSwitch checks, ibstat/ibping, nvbandwidth, nccl-tests all-reduce bus-bandwidth), use the shared keystone runbook (do not duplicate it here): Fabric bring-up, validation & benchmarking.

Ampere-specific pointers:

A40/A10 have no NVSwitch domain; they rely on PCIe P2P + the host NIC. The A40 can add a 2-way NVLink Bridge between one pair of cards; the A10 has no NVLink at all. Validate NCCL PCIe P2P fallback explicitly and confirm GPUDirect RDMA over ConnectX-6; do not expect NVSwitch-class bandwidth.
RoCE on Spectrum Ethernet. If the fabric is RoCE rather than InfiniBand, lossless transport requires DCBX/PFC (and ideally ECN) configured end-to-end on the Spectrum switch and NIC; misconfigured PFC silently tanks collective throughput. Validate with nccl-tests before production.
A30 MIG + single NIC. A30 partitions into 4 MIG slices but has no NVSwitch; inter-node still goes via the single ConnectX-6 HDR port. Rail design matters less, NIC saturation matters more.
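Verify FM before collectives on A100 SXM. On HGX/DGX A100 nvidia-fabricmanager must be active before any CUDA or NCCL job starts; without it the GPUs typically fail CUDA initialisation (reported as "system not yet initialized") rather than falling back cleanly to PCIe. nvidia-smi nvlink --status should show NVLink 3 links up.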

See networking fabric.

Gotchas & failure modes¶

A40/A10 are not A100. They are GDDR6 render/inference dies with no MIG and (A10) no NVLink; do not plan MIG slicing or NVSwitch collectives on them.
PCIe vs SXM A100. Only SXM4 A100 on an HGX/DGX baseboard gives the NVSwitch 600 GB/s domain; PCIe A100 has a 2-way bridge only.
No FP8 / no Confidential Computing. Workloads requiring the Transformer Engine, FP8, or a GPU TEE cannot run on Ampere; this is the hard ceiling versus Hopper.
Support lifecycle. As the oldest generation, Ampere is first to lose new-driver attention. Pin to a still-supported datacenter LTS branch and verify EOL before committing to a long-lived fleet (see driver upgrade runbook).
Fabric Manager misuse. Enabling Fabric Manager on a non-NVSwitch node (A40/A10/A30/PCIe A100) is wrong; it belongs only on HGX/DGX A100.
A800 export cut. Reduced NVLink bandwidth versus A100; do not assume full A100 collective throughput.
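The FP8 ceiling in the list above can be turned into a scheduling guard. A minimal sketch, assuming a hypothetical helper named supports_fp8 and assuming that FP8 Tensor Core paths require compute capability 8.9 or newer; GA100 reports 8.0 and GA102 reports 8.6, so every Ampere part fails the check. The compute_cap query field shown in the usage comment needs a reasonably recent driver.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if a "major.minor" compute capability is >= 8.9.
# ASSUMPTION: 8.9 is the FP8 threshold; Ampere reports 8.0 (GA100) or 8.6 (GA102).
supports_fp8() {
  local major="${1%%.*}" minor="${1#*.}"
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}

# Example (not run here):
#   cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader -i 0)
#   supports_fp8 "$cc" || echo "no FP8 on this GPU (cc $cc): use BF16/FP16/TF32" >&2
```

Failing fast at job admission is cheaper than discovering the missing FP8 path after a container has been pulled and a model loaded.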

References¶

NVIDIA A100 Tensor Core GPU (product page): https://www.nvidia.com/en-us/data-center/a100/
NVIDIA A30 Tensor Core GPU (product page): https://www.nvidia.com/en-us/data-center/products/a30-gpu/
NVIDIA A40 GPU (product page): https://www.nvidia.com/en-us/data-center/a40/
NVIDIA A10 GPU (product page): https://www.nvidia.com/en-us/data-center/products/a10-gpu/
NVIDIA Ampere architecture (technology overview): https://www.nvidia.com/en-us/technologies/ampere-architecture/
NVIDIA Ampere architecture whitepaper: https://resources.nvidia.com/en-us-gpu-resources/nvidia-ampere-architecture-whitepaper
NVIDIA Ampere architecture whitepaper (PDF): https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf
NVIDIA Tensor Core GPU datasheet (A100): https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
NVIDIA MIG user guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
NVIDIA datacenter driver documentation: https://docs.nvidia.com/datacenter/tesla/drivers/index.html
NVIDIA datacenter driver installation guide (Ubuntu): https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html
NVIDIA CUDA installation guide for Linux: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
NVIDIA Fabric Manager user guide: https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html
NVIDIA Fabric Manager apt packaging (package names, version-match requirement): https://github.com/NVIDIA/apt-packaging-fabric-manager
NVIDIA driver persistence / nvidia-persistenced: https://docs.nvidia.com/deploy/driver-persistence/index.html
NVIDIA MLNX_OFED to DOCA-OFED transition guide: https://docs.nvidia.com/doca/sdk/mlnx_ofed-to-doca-ofed-transition-guide/index.html
NVIDIA ConnectX-6 InfiniBand adapter (product page): https://www.nvidia.com/en-us/networking/infiniband-adapters/
NVIDIA Quantum HDR InfiniBand switches (QM8700): https://www.nvidia.com/en-us/networking/infiniband-switching/