Markdown

Datacentre physical readiness¶

Scope: reading datacentre drawings and confirming the facility can actually host the cluster. Power, UPS, cooling, airflow, weight, and the schematics that describe them. This is the most facility-specific layer in this knowledge base.

flowchart LR
  FLOOR["Floor plan"] --> POWER["Power path"]
  FLOOR --> COOLING["Cooling path"]
  POWER --> READY["Rack readiness"]
  COOLING --> READY
  READY --> BOM["BOM validation"]

Overview¶

Blackwell-class density has moved the deployment risk from compute to facility. Memory capacity is no longer the constraint; power delivery and heat rejection are. The skill is to read a floor plan, power schematic, and cooling layout and state, with numbers, whether the hall can take the load.

Core knowledge¶

Drawing types to read¶

Floor plan / whitespace layout: rack positions, aisles, clearances, structural load zones, routes for power and liquid.
Rack elevations: U-by-U layout, weight distribution, top-of-rack switch placement.
Power distribution / one-line diagram: utility feed, transformers, switchgear, UPS, PDU, busway, down to rack whips. Phases and redundancy.
UPS schematic: topology (N, N+1, 2N), runtime, static transfer switches.
Cooling / mechanical layout: CDUs, manifolds, pipework, CRAH/CRAC or rear-door heat exchangers, hot/cold aisle containment, airflow direction.

Power density and the Blackwell reality¶

B300 GPU TDP is 1,400 W. A GB300 NVL72 rack draws roughly 120 to 140 kW (sources cite ~120 kW, Supermicro's range 132 to 140 kW, Microsoft ~136 kW). This is an order of magnitude above a conventional enterprise rack.
Transient behaviour: power can spike to about 1.4x steady-state during gradient synchronisation, at microsecond scale. NVIDIA uses power smoothing (energy storage and burn mechanisms) and multiple power-shelf configurations to absorb synchronous load ramps. Plan feeds and protection for the transient, not just the average.
Harmonics: GB300 racks show meaningful total harmonic distortion under training load. Beyond roughly eight racks, dedicated transformers with 12-pulse rectifiers are typically needed to stay IEEE 519 compliant.

Power & cooling by GPU generation¶

Per-GPU board power has roughly tripled across two generations, and the connector and cooling story changes with it (GPU generations):

GPU	Board power	Cooling viability
A100 SXM (Ampere)	400 W	Air
H100 SXM (Hopper)	700 W (configurable)	Air (dense) to liquid
B200 (Blackwell)	~1,000 W	Liquid in DC density
B300 (Blackwell Ultra)	1,400 W	Liquid mandatory

Air cooling stays viable through Hopper. H100-class racks are still routinely air-cooled; Blackwell-class density (B200 ~1 kW, B300 1.4 kW per GPU) pushes per-rack load to ~120-140 kW for a GB300 NVL72 and makes direct-to-chip liquid mandatory (consistent with the B300 figures already noted above).
Connectors differ by tier. Datacenter SXM modules draw from a baseboard/busbar (no per-card cable). Consumer cards use a single sequential cable: 12VHPWR on Ada (RTX 4090) and 12V-2x6 on Blackwell consumer (RTX 5090, 575 W total graphics power, fed by a Gen5 12V cable or 4x 8-pin adapter). The RTX PRO 6000 Blackwell (up to 600 W) takes a single CEM5 16-pin connector.
Per-rack implication. Air-cooled enterprise racks top out well below a single Blackwell GPU tray. Sizing PDU phase, whip, and connector to the SXM busbar (datacenter) versus a per-card 16-pin (PCIe pro/consumer) is a distinct BOM check (BOM validation); the consumer 12V connectors carry near their rated limit and demand correct seating and gauge.

Cooling¶

At 1,400 W per GPU, liquid cooling is mandatory in all B300 form factors; air cooling is insufficient. GB300 NVL72 ships with integrated liquid cooling; HGX B300 baseboards need an OEM liquid solution.
Direct-to-chip cold plates plus CDUs and rear-door heat exchangers. Rear-door HX programmes claim per-rack capacities well above 100 kW.
Practical rule from the field: size cooling for about 110% of rated TDP to absorb thermal spikes without throttling.

Mechanical and structural¶

A GB300 NVL72 cabinet weighs roughly 1.36 t (about 3,000 lb). It is a 48U rack but occupies a standard 42U floor footprint, so the load lands on a conventional tile area. Confirm floor loading.
Centre of gravity sits higher than standard servers due to dense upper-tray compute. Positive seismic/anchor hardware (bolted to the slab, not standard cage nuts) is advised against micro-vibration and tip risk during full load.

Connectivity to the outside¶

Uplink to customer edge: at least 2x 100 GbE with single-mode (DR1), BGP peering for route handover, in-band and OOB routes announced.

Don't-miss checklist¶

Confirm rack power feed, phase, connector, and redundancy match the BOM PDUs (BOM validation).
Confirm the hall's per-rack cooling capacity meets or exceeds rack TDP plus margin.
Confirm liquid-cooling loop: CDU capacity, flow rate, supply temperature, leak detection.
Confirm floor loading and anchoring for rack weight and centre of gravity.
Confirm UPS topology and runtime against the load.
Check harmonic mitigation for multi-rack deployments.
Confirm cable routes (power and fibre) on the floor plan match the run lengths in the BOM.

Failure modes¶

Hall rated for air cooling or for far lower per-rack kW than a GB300 rack needs.
Feed sized for steady-state, tripping on transient spikes during training.
Floor loading or anchoring inadequate for cabinet weight and high CoG.
Harmonics out of compliance once several racks are populated.
Cable routes on the plan shorter or longer than the procured media supports.

Open questions & validation¶

Build fluency reading a one-line power diagram and a cooling P&ID against real facility drawings.
Learn the per-rack cooling maths well enough to assess a layout live.
Confirm per-rack cooling capacity and harmonic mitigation against the actual hall before populating multiple racks (BOM validation).

References¶

GB300 deployment, power, cooling, weight, harmonics: https://introl.com/blog/why-nvidia-gb300-nvl72-blackwell-ultra-matters
Blackwell Ultra infrastructure requirements (power trajectory, liquid cooling): https://introl.com/blog/nvidia-blackwell-ultra-b300-infrastructure-requirements-2025
Facility planning framing (whitespace, cooling loops, power phases): https://radiant.co/blog/nvidia-blackwell-ultra-b300-gb300-gpus
NVIDIA A100 (400 W SXM): https://www.nvidia.com/en-us/data-center/a100/
NVIDIA H100 (700 W SXM, configurable): https://www.nvidia.com/en-us/data-center/h100/
NVIDIA GeForce RTX 5090 (575 W, Gen5 12V connector): https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/
NVIDIA RTX PRO 6000 Blackwell (up to 600 W): https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/

Related: BOM · Commissioning · Platform · Glossary