Datacentre physical readiness¶
Scope: reading datacentre drawings and confirming the facility can actually host the cluster. Power, UPS, cooling, airflow, weight, and the schematics that describe them. This is the most facility-specific layer in this knowledge base.
flowchart LR
FLOOR["Floor plan"] --> POWER["Power path"]
FLOOR --> COOLING["Cooling path"]
POWER --> READY["Rack readiness"]
COOLING --> READY
READY --> BOM["BOM validation"]
Overview¶
Blackwell-class density has moved the deployment risk from compute to facility. Memory capacity is no longer the constraint; power delivery and heat rejection are. The skill is to read a floor plan, power schematic, and cooling layout and state, with numbers, whether the hall can take the load.
Core knowledge¶
Drawing types to read¶
- Floor plan / whitespace layout: rack positions, aisles, clearances, structural load zones, routes for power and liquid.
- Rack elevations: U-by-U layout, weight distribution, top-of-rack switch placement.
- Power distribution / one-line diagram: utility feed, transformers, switchgear, UPS, PDU, busway, down to rack whips. Phases and redundancy.
- UPS schematic: topology (N, N+1, 2N), runtime, static transfer switches.
- Cooling / mechanical layout: CDUs, manifolds, pipework, CRAH/CRAC or rear-door heat exchangers, hot/cold aisle containment, airflow direction.
Power density and the Blackwell reality¶
- B300 GPU TDP is 1,400 W. A GB300 NVL72 rack draws roughly 120 to 140 kW; the figure varies by source (~120 kW in general coverage, 132 to 140 kW from Supermicro, ~136 kW from Microsoft). This is an order of magnitude above a conventional enterprise rack.
- Transient behaviour: power can spike to about 1.4x steady-state during gradient synchronisation, at microsecond scale. NVIDIA uses power smoothing (energy storage and burn mechanisms) and multiple power-shelf configurations to absorb synchronous load ramps. Plan feeds and protection for the transient, not just the average.
- Harmonics: GB300 racks show meaningful total harmonic distortion under training load. Beyond roughly eight racks, dedicated transformers with 12-pulse rectifiers are typically needed to stay IEEE 519 compliant.
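The steady-state and transient figures above can be turned into a quick feed-sizing check. The sketch below is a rough aid, not a substitute for an electrical design: the 415 V line voltage and 0.95 power factor are illustrative assumptions, the function names are hypothetical, and the 1.4x multiplier is simply the approximate figure quoted above. Note that 72 GPUs at 1,400 W is 100.8 kW of GPU load alone, so the 120 to 140 kW rack figures include CPUs, switches, and conversion losses.

```python
import math

GPU_TDP_W = 1_400        # B300 TDP, from the text
GPUS_PER_RACK = 72       # GB300 NVL72
TRANSIENT_FACTOR = 1.4   # approximate spike over steady-state, from the text


def gpu_only_load_kw(gpus: int = GPUS_PER_RACK, tdp_w: float = GPU_TDP_W) -> float:
    """GPU share of the rack load in kW; CPUs, switches and losses come on top."""
    return gpus * tdp_w / 1000.0


def transient_peak_kw(steady_kw: float, factor: float = TRANSIENT_FACTOR) -> float:
    """Peak load to plan feeds and protection for, given a steady-state figure."""
    return steady_kw * factor


def line_current_a(load_kw: float,
                   line_voltage_v: float = 415.0,   # assumption: site-specific
                   power_factor: float = 0.95) -> float:  # assumption
    """Balanced three-phase line current: I = P / (sqrt(3) * V_LL * PF)."""
    return load_kw * 1000.0 / (math.sqrt(3) * line_voltage_v * power_factor)


if __name__ == "__main__":
    steady = 140.0  # kW, upper end of the quoted rack range
    peak = transient_peak_kw(steady)
    print(f"GPU-only load: {gpu_only_load_kw():.1f} kW")
    print(f"Steady-state current: {line_current_a(steady):.0f} A")
    print(f"Transient-peak current: {line_current_a(peak):.0f} A")
```

Comparing the transient-peak current with the breaker and whip ratings on the one-line diagram shows whether a feed sized for the average would trip under training load. Whether microsecond-scale spikes actually reach the upstream protection depends on the rack's power smoothing, so treat the peak figure as a conservative bound.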
Power & cooling by GPU generation¶
Per-GPU board power has risen from 400 W to 1,400 W (3.5x) between Ampere and Blackwell Ultra, and the connector and cooling story changes with it (GPU generations):
| GPU | Board power | Cooling viability |
|---|---|---|
| A100 SXM (Ampere) | 400 W | Air |
| H100 SXM (Hopper) | 700 W (configurable) | Air (dense) to liquid |
| B200 (Blackwell) | ~1,000 W | Liquid in DC density |
| B300 (Blackwell Ultra) | 1,400 W | Liquid mandatory |
- Air cooling stays viable through Hopper. H100-class racks are still routinely air-cooled; Blackwell-class density (B200 ~1 kW, B300 1.4 kW per GPU) pushes per-rack load to ~120-140 kW for a GB300 NVL72 and makes direct-to-chip liquid mandatory.
- Connectors differ by tier. Datacenter SXM modules draw from a baseboard/busbar (no per-card cable). Consumer cards use a single 16-pin cable: 12VHPWR on Ada (RTX 4090) and 12V-2x6 on Blackwell consumer (RTX 5090, 575 W total graphics power, fed by a single PCIe Gen 5 cable or a 4x 8-pin adapter). The RTX PRO 6000 Blackwell (up to 600 W) takes a single CEM5 16-pin connector.
- Per-rack implication. A conventional air-cooled enterprise rack is budgeted for roughly a tenth of a GB300 NVL72 rack's load, so it cannot host more than a small fraction of one. Sizing PDU phase, whip, and connector to the SXM busbar (datacenter) versus a per-card 16-pin (PCIe pro/consumer) is a distinct BOM check (BOM validation); the consumer 12V connectors run close to their rated limit and demand correct seating and cable gauge.
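The ratios in the table and the connector headroom are simple arithmetic, sketched below. The board-power values come from the table; the 600 W connector rating is the commonly quoted maximum for the 16-pin connector and is used here as an assumption to confirm against the actual cable marking. Function names are hypothetical.

```python
# Board power per GPU in watts, taken from the table above.
BOARD_POWER_W = {
    "A100 SXM": 400,
    "H100 SXM": 700,
    "B200": 1_000,   # approximate, per the table
    "B300": 1_400,
}


def growth_vs(baseline: str, target: str, table=BOARD_POWER_W) -> float:
    """Ratio of target board power to baseline board power."""
    return table[target] / table[baseline]


def connector_utilisation(card_w: float, connector_rating_w: float = 600.0) -> float:
    """Fraction of the connector rating a card draws at full load.

    The 600 W default is an assumption; check the rating on the cable in use.
    """
    return card_w / connector_rating_w


if __name__ == "__main__":
    print(f"A100 -> B300: {growth_vs('A100 SXM', 'B300'):.1f}x")
    print(f"RTX 5090 on a 16-pin connector: {connector_utilisation(575):.0%} of rating")
```

A utilisation this close to the rating is why seating and gauge matter for the consumer connectors, whereas SXM modules avoid the question by drawing from the baseboard.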
Cooling¶
- At 1,400 W per GPU, the sources cited here treat liquid cooling as mandatory across B300 form factors and air cooling as insufficient at rack density. GB300 NVL72 ships with integrated liquid cooling; HGX B300 baseboards need an OEM liquid solution.
- Typical building blocks are direct-to-chip cold plates, CDUs, and rear-door heat exchangers. Rear-door HX programmes claim per-rack capacities well above 100 kW.
- Practical rule from the field: size cooling for about 110% of rated TDP to absorb thermal spikes without throttling.
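The 110% rule and the resulting coolant flow can be estimated from the heat balance Q = m·cp·ΔT. The sketch below assumes plain water (cp of about 4,186 J/kg·K, 1 kg per litre) and a 10 K supply-to-return temperature rise; both are illustrative assumptions rather than vendor figures, and the function names are hypothetical. Glycol mixes have a lower specific heat and need more flow for the same load.

```python
def cooling_design_load_kw(rack_tdp_kw: float, margin: float = 1.10) -> float:
    """Cooling capacity to provision: rated TDP plus the field-rule margin."""
    return rack_tdp_kw * margin


def coolant_flow_lpm(heat_kw: float,
                     delta_t_k: float = 10.0,          # assumption: loop delta-T
                     cp_j_per_kg_k: float = 4186.0,    # assumption: plain water
                     density_kg_per_l: float = 1.0) -> float:  # assumption
    """Coolant flow in litres per minute needed to carry heat_kw at delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_kw * 1000.0 / (cp_j_per_kg_k * delta_t_k)
    return mass_flow_kg_s / density_kg_per_l * 60.0


if __name__ == "__main__":
    design_kw = cooling_design_load_kw(140.0)
    print(f"Design cooling load: {design_kw:.0f} kW")
    print(f"Required flow at 10 K rise: {coolant_flow_lpm(design_kw):.0f} L/min")
```

Compare the result with the CDU's rated flow and capacity at the hall's actual supply temperature; a smaller temperature rise raises the required flow in proportion.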
Mechanical and structural¶
- A GB300 NVL72 cabinet weighs roughly 1.36 t (about 3,000 lb). It is a 48U rack but occupies a standard 42U floor footprint, so the load lands on a conventional tile area. Confirm floor loading.
- The centre of gravity sits higher than in a rack of standard servers because of dense upper-tray compute. Positive seismic anchoring (bolted to the slab, not reliant on standard cage nuts) is advised against micro-vibration and tip risk under full load.
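A floor-loading check reduces to mass over footprint area. In the sketch below the cabinet mass comes from the text, while the 0.6 m by 1.2 m footprint is an assumed placeholder; take the real dimensions from the rack elevation and the rated limit from the structural drawing. Function names are hypothetical.

```python
LB_TO_KG = 0.45359237   # exact definition of the pound in kilograms
G_M_PER_S2 = 9.81       # standard gravity, rounded


def lb_to_kg(mass_lb: float) -> float:
    return mass_lb * LB_TO_KG


def floor_load(mass_kg: float, width_m: float, depth_m: float):
    """Return (kg per square metre, kPa) for a uniformly distributed load."""
    area_m2 = width_m * depth_m
    kg_per_m2 = mass_kg / area_m2
    kpa = mass_kg * G_M_PER_S2 / area_m2 / 1000.0
    return kg_per_m2, kpa


if __name__ == "__main__":
    mass = lb_to_kg(3000)  # about 1,361 kg, matching the figure in the text
    # Footprint is an assumption for illustration, not a vendor dimension.
    kg_m2, kpa = floor_load(mass, width_m=0.6, depth_m=1.2)
    print(f"Uniform load: {kg_m2:.0f} kg/m^2 ({kpa:.1f} kPa)")
```

This is the uniform figure only. Point loads at levelling feet or casters are higher, so the structural load zone on the floor plan should be checked against both.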
Connectivity to the outside¶
- Uplink to customer edge: at least 2x 100 GbE over single-mode fibre (100GBASE-DR, also written DR1), with BGP peering for route handover and both in-band and OOB routes announced.
Don't-miss checklist¶
- Confirm rack power feed, phase, connector, and redundancy match the BOM PDUs (BOM validation).
- Confirm the hall's per-rack cooling capacity meets or exceeds rack TDP plus margin.
- Confirm liquid-cooling loop: CDU capacity, flow rate, supply temperature, leak detection.
- Confirm floor loading and anchoring for rack weight and centre of gravity.
- Confirm UPS topology and runtime against the load.
- Check harmonic mitigation for multi-rack deployments.
- Confirm cable routes (power and fibre) on the floor plan match the run lengths in the BOM.
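The numeric items in the checklist can be captured as a small comparison between what a rack needs and what a rack position offers. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, all hall values are per rack position, and sizing the feed for the full transient peak is a conservative choice rather than a rule from the sources. Harmonics, anchoring, and leak detection still need drawing-level review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RackRequirement:
    steady_kw: float          # rack steady-state draw
    transient_factor: float   # multiplier for training-load spikes
    cooling_margin: float     # e.g. 1.10 for the 110% field rule
    mass_kg: float
    ups_runtime_min: float    # ride-through the load must get


@dataclass
class HallCapability:         # all values per rack position
    feed_kw: float
    cooling_kw: float
    liquid_loop: bool
    floor_limit_kg: float
    ups_usable_kwh: float


def readiness_gaps(rack: RackRequirement, hall: HallCapability) -> List[str]:
    """Return a list of shortfalls; an empty list means no numeric gap found."""
    gaps = []
    peak_kw = rack.steady_kw * rack.transient_factor
    if hall.feed_kw < peak_kw:
        gaps.append(f"feed {hall.feed_kw:.0f} kW < transient peak {peak_kw:.0f} kW")
    need_cooling_kw = rack.steady_kw * rack.cooling_margin
    if hall.cooling_kw < need_cooling_kw:
        gaps.append(f"cooling {hall.cooling_kw:.0f} kW < required {need_cooling_kw:.0f} kW")
    if not hall.liquid_loop:
        gaps.append("no liquid-cooling loop at this position")
    if hall.floor_limit_kg < rack.mass_kg:
        gaps.append(f"floor limit {hall.floor_limit_kg:.0f} kg < rack {rack.mass_kg:.0f} kg")
    need_kwh = rack.steady_kw * rack.ups_runtime_min / 60.0
    if hall.ups_usable_kwh < need_kwh:
        gaps.append(f"UPS energy {hall.ups_usable_kwh:.1f} kWh < required {need_kwh:.1f} kWh")
    return gaps


if __name__ == "__main__":
    rack = RackRequirement(140.0, 1.4, 1.10, 1360.0, ups_runtime_min=5.0)
    legacy_hall = HallCapability(feed_kw=150.0, cooling_kw=100.0, liquid_loop=False,
                                 floor_limit_kg=2000.0, ups_usable_kwh=10.0)
    for gap in readiness_gaps(rack, legacy_hall):
        print("GAP:", gap)
```

Returning the gaps as text keeps the result easy to paste into a site-survey report next to the drawing reference that produced each number.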
Failure modes¶
- Hall rated for air cooling or for far lower per-rack kW than a GB300 rack needs.
- Feed sized for steady-state, tripping on transient spikes during training.
- Floor loading or anchoring inadequate for cabinet weight and high CoG.
- Harmonics out of compliance once several racks are populated.
- Cable routes on the plan that do not match the procured media: runs longer than the media's supported reach, or procured lengths that fall short of the planned route.
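The cable-route failure mode is easy to catch early by comparing planned run lengths with the media in the BOM. The sketch below is illustrative: the route names and lengths are made up, the `Route` structure is hypothetical, and the reach table holds only 100GBASE-DR at its nominal 500 m over single-mode fibre; add the media the BOM actually lists.

```python
from typing import List, NamedTuple

# Nominal reach per media type in metres; extend with the BOM's real media.
REACH_M = {"100GBASE-DR": 500.0}


class Route(NamedTuple):
    name: str
    media: str
    planned_m: float    # length measured along the route on the floor plan
    procured_m: float   # length of the cable or trunk listed in the BOM


def route_problems(routes: List[Route], reach_m=REACH_M) -> List[str]:
    """Flag routes that exceed media reach or outrun the procured length."""
    problems = []
    for r in routes:
        limit = reach_m.get(r.media)
        if limit is None:
            problems.append(f"{r.name}: no reach figure for {r.media}")
        elif r.planned_m > limit:
            problems.append(f"{r.name}: {r.planned_m:.0f} m exceeds {r.media} reach of {limit:.0f} m")
        if r.procured_m < r.planned_m:
            problems.append(f"{r.name}: procured {r.procured_m:.0f} m is shorter than planned {r.planned_m:.0f} m")
    return problems


if __name__ == "__main__":
    # Example routes are invented for illustration.
    routes = [
        Route("uplink-a", "100GBASE-DR", planned_m=80, procured_m=100),
        Route("uplink-b", "100GBASE-DR", planned_m=620, procured_m=650),
        Route("uplink-c", "100GBASE-DR", planned_m=120, procured_m=100),
    ]
    for problem in route_problems(routes):
        print(problem)
```

Measure planned lengths along the actual tray or conduit path, including vertical drops, rather than point to point on the floor plan.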
Open questions & validation¶
- Build fluency reading a one-line power diagram and a cooling P&ID against real facility drawings.
- Learn the per-rack cooling maths well enough to assess a layout live.
- Confirm per-rack cooling capacity and harmonic mitigation against the actual hall before populating multiple racks (BOM validation).
References¶
- GB300 deployment, power, cooling, weight, harmonics: https://introl.com/blog/why-nvidia-gb300-nvl72-blackwell-ultra-matters
- Blackwell Ultra infrastructure requirements (power trajectory, liquid cooling): https://introl.com/blog/nvidia-blackwell-ultra-b300-infrastructure-requirements-2025
- Facility planning framing (whitespace, cooling loops, power phases): https://radiant.co/blog/nvidia-blackwell-ultra-b300-gb300-gpus
- NVIDIA A100 (400 W SXM): https://www.nvidia.com/en-us/data-center/a100/
- NVIDIA H100 (700 W SXM, configurable): https://www.nvidia.com/en-us/data-center/h100/
- NVIDIA GeForce RTX 5090 (575 W, Gen5 12V connector): https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/
- NVIDIA RTX PRO 6000 Blackwell (up to 600 W): https://www.nvidia.com/en-us/products/workstations/professional-desktop-gpus/rtx-pro-6000/
Related: BOM · Commissioning · Platform · Glossary