
Build-vs-rent GPU cost model

Scope: A concrete TCO model to decide owning vs renting GPUs: capex (GPUs, network, facility) plus opex (power, cooling, staff) against a cloud GPU-hour rate, the breakeven utilization, and a runnable Python model with illustrative inputs the reader edits.

What it is

A total-cost-of-ownership (TCO) model that reduces the build-vs-rent decision to one comparison: the amortized $/GPU-hour of owned capacity versus the rented $/GPU-hour. Owned cost is capex spread over a depreciation horizon plus opex (power, cooling, staff, network/storage), divided by the GPU-hours actually delivered. Because the denominator is delivered hours, the model's pivot is sustained utilization, the fraction of wall-clock the fleet does useful work. The output is a breakeven utilization: above it, owning is cheaper per useful hour; below it, renting wins.
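Written out, the comparison reduces to two short formulas. The symbols are introduced here for illustration only: $C_{\text{annual}}$ is the annual owned cost, $T$ the depreciation horizon in years, $N$ the GPU count, $u$ the sustained utilization, $r$ the rented \$/GPU-hour, and $u^{*}$ the breakeven utilization. The sketch assumes rented capacity is billed only for the hours actually used.

$$
C_{\text{annual}} = \frac{\text{capex} - \text{residual}}{T} + \text{opex}_{\text{yr}},
\qquad
c_{\text{owned}}(u) = \frac{C_{\text{annual}}}{N \cdot 8760 \cdot u}
$$

$$
c_{\text{owned}}(u^{*}) = r
\;\Longrightarrow\;
u^{*} = \frac{C_{\text{annual}}}{r \cdot N \cdot 8760}
$$

Owning is cheaper per useful hour when $u > u^{*}$. If $u^{*} > 1$, owning never breaks even at these inputs.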

This is the quantitative form of the build-vs-rent through-line in cloud, neoclouds and cost: utilization is the lever, and an idle owned GPU burns the full amortized rate while delivering nothing.

flowchart LR
  CAPEX["Capex: GPUs + network + facility"] --> AMORT["Amortize over horizon (yrs)"]
  OPEX["Opex/yr: power + cooling + staff + network"] --> ANNUAL["Annual owned cost"]
  AMORT --> ANNUAL
  ANNUAL --> PERHR["Owned $/delivered GPU-hour = annual / (GPUs x 8760 x utilization)"]
  UTIL["Sustained utilization"] --> PERHR
  RENT["Rented $/GPU-hour"] --> CMP{"Owned < Rented ?"}
  PERHR --> CMP
  CMP -->|"yes"| OWN["Build"]
  CMP -->|"no"| LEASE["Rent / neocloud / spot"]

Why it matters

The decision is usually framed as a sticker-price comparison ("an H100 costs $X to buy vs $Y/hr to rent"), which is wrong on both ends. Owning adds power, cooling, networking, storage, staff, and depreciation that often double the GPU sticker into a delivered cost. Renting that looks expensive per hour can be cheaper overall if your fleet sits idle half the time. The only honest comparison is delivered $/GPU-hour at your real utilization, and utilization here means SM-active/MFU, not the misleading nvidia-smi "GPU-util" (observability and monitoring, MFU regression runbook).

Two facts dominate the arithmetic:

Facility overhead is large. GPUs are only a fraction of drawn power: in a typical frontier AI datacenter, total server power is roughly 1.5x GPU power alone, and only ~40% of total facility power goes to the GPUs themselves once cooling and interconnect are counted.¹ Power usage effectiveness (PUE), the ratio of total facility power to IT power, runs ~1.1-1.3 for best-in-class AI facilities and ~1.5-1.6 at the global colo average.² An 8x H100 SXM node draws up to ~10 kW at the wall, not 8 x 700 W = 5.6 kW.³
Depreciation is fast. GPU generations turn over on roughly a yearly cadence (Hopper, Blackwell, Rubin), so a 3-4 year straight-line horizon is already optimistic, not conservative; residual value is uncertain and export-control sensitive (vendor sourcing and procurement).

Get those two wrong and the breakeven moves by tens of points of utilization.
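The two facts above can be checked with a few lines of arithmetic. This is a minimal sketch: the 0.7 kW TDP, 10.2 kW node power, and PUE values are the figures quoted in the text, while the $0.10/kWh tariff, $30,000 per-GPU capex, and 10% residual are hypothetical placeholders, and the function names are illustrative.

```python
"""Worked arithmetic for facility overhead and depreciation speed (illustrative)."""

HOURS_PER_YEAR = 8760  # 24 * 365


def facility_kw_per_gpu(node_kw: float, gpus_per_node: int, pue: float) -> float:
    """Facility power attributable to one GPU: its share of node wall power times PUE."""
    return node_kw / gpus_per_node * pue


def power_usd_per_gpu_year(kw: float, usd_per_kwh: float) -> float:
    """Annual electricity cost of a constant draw of `kw` kilowatts."""
    return kw * HOURS_PER_YEAR * usd_per_kwh


def depreciation_usd_per_gpu_year(capex: float, residual_fraction: float,
                                  horizon_years: float) -> float:
    """Straight-line depreciation of capex net of salvage value."""
    return capex * (1.0 - residual_fraction) / horizon_years


if __name__ == "__main__":
    tariff = 0.10  # hypothetical all-in USD/kWh

    # Naive estimate: GPU TDP only, no server or facility overhead.
    naive = power_usd_per_gpu_year(0.7, tariff)
    print(f"GPU TDP only:      {naive:>8,.0f} USD/GPU-year")

    # Node wall power (10.2 kW across 8 GPUs) scaled by PUE.
    for pue in (1.3, 1.6):
        kw = facility_kw_per_gpu(10.2, 8, pue)
        cost = power_usd_per_gpu_year(kw, tariff)
        print(f"node share, PUE {pue}: {cost:>6,.0f} USD/GPU-year ({kw:.2f} kW)")

    # Same capex, different horizons: shorter horizon means a higher annual charge.
    for years in (3.0, 4.0, 5.0):
        dep = depreciation_usd_per_gpu_year(30_000.0, 0.10, years)
        print(f"horizon {years:.0f} yr:      {dep:>8,.0f} USD/GPU-year depreciation")
```

The per-GPU share of node power is 10.2 / 8 = 1.275 kW, about 1.8x the 0.7 kW TDP, before PUE is applied.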

When it is needed (and when not)

Run this model when:

Committing capex to an owned hall or a multi-year reserved/committed cloud contract.
Sizing a fleet against sustained demand (steady training, always-on inference).
Comparing reserved vs on-demand vs spot consumption models (cloud, neoclouds and cost).

Skip owning (rent instead) when:

Demand is bursty or uncertain: peaks you cannot keep busy mean low utilization and a losing breakeven.
You need a specific generation now and lead time/power are the binding constraints, not budget (datacentre readiness is covered in the source KB).
The workload is fault-tolerant and checkpointed: spot/preemptible undercuts both owning and on-demand, with checkpoint discipline (distributed training recipes, DiLoCo).
You lack confirmed power and cooling to host the capacity; owning idle, unpowered racks is pure loss.
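Whether spot actually undercuts the alternatives depends on how much billed time is lost to preemption. The sketch below assumes a deliberately simple loss model that is not from the source: each preemption wastes, on average, half a checkpoint interval of recomputed work plus a fixed restart time, and all of that time is billed. The rate, preemption frequency, and function name are hypothetical.

```python
"""Effective spot cost per useful GPU-hour under a simple preemption-loss model."""

MINUTES_PER_DAY = 24 * 60


def spot_usd_per_useful_hour(spot_usd_per_gpu_hour: float,
                             preemptions_per_day: float,
                             checkpoint_interval_min: float,
                             restart_min: float) -> float:
    """Spot rate divided by the fraction of billed time that does useful work.

    Assumption: each preemption loses half a checkpoint interval (work since
    the last checkpoint, on average) plus `restart_min` of reload time.
    """
    lost_min = preemptions_per_day * (checkpoint_interval_min / 2.0 + restart_min)
    useful_fraction = 1.0 - lost_min / MINUTES_PER_DAY
    if useful_fraction <= 0.0:
        raise ValueError("preemption losses consume the whole day; no useful work")
    return spot_usd_per_gpu_hour / useful_fraction


if __name__ == "__main__":
    spot_rate = 1.20  # hypothetical spot $/GPU-hour
    for interval in (10.0, 30.0, 120.0):
        eff = spot_usd_per_useful_hour(spot_rate, preemptions_per_day=4,
                                       checkpoint_interval_min=interval,
                                       restart_min=10.0)
        print(f"checkpoint every {interval:>5.0f} min -> ${eff:.2f} per useful GPU-hour")
```

Compare the result, not the list-price spot rate, against the owned and on-demand $/GPU-hour. Shorter checkpoint intervals reduce the loss in this model, but checkpoint write overhead is ignored here.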

The model is a decision aid, not a forecast. Its inputs (utilization, horizon, residual, rate trajectory) carry the uncertainty. Vary them, do not trust a single point estimate.
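One way to vary the inputs is a small grid sweep. The sketch below is self-contained and works per GPU. Its defaults are illustrative assumptions: $30,000 GPU capex, $10,000 network plus facility capex, a 10% residual on GPU capex only, and about $4,640 of opex per GPU-year. The swept horizons and rent rates are hypothetical.

```python
"""Sensitivity sweep of breakeven utilization (all inputs illustrative)."""
from itertools import product

HOURS_PER_YEAR = 8760  # 24 * 365


def breakeven_utilization(horizon_years: float, rent_usd_per_gpu_hour: float,
                          residual_fraction: float = 0.10,
                          gpu_capex: float = 30_000.0,
                          other_capex: float = 10_000.0,
                          opex_per_gpu_year: float = 4_640.0) -> float:
    """Per-GPU breakeven: annual owned cost / cost of renting every hour of the year.

    Residual value is applied to GPU capex only; network and facility capex
    (other_capex) are written off in full. A result above 1.0 means owning
    never breaks even at these inputs.
    """
    depreciation = (gpu_capex * (1.0 - residual_fraction) + other_capex) / horizon_years
    annual_owned = depreciation + opex_per_gpu_year
    return annual_owned / (rent_usd_per_gpu_hour * HOURS_PER_YEAR)


if __name__ == "__main__":
    horizons = (3.0, 4.0, 5.0)   # depreciation horizons, years
    rents = (1.80, 2.40, 3.00)   # hypothetical $/GPU-hour quotes
    print(f"{'horizon':>8} {'rent $/hr':>10} {'breakeven':>10}")
    for years, rent in product(horizons, rents):
        u = breakeven_utilization(years, rent)
        label = f"{u * 100:.1f}%" if u <= 1.0 else "never"
        print(f"{years:>8.0f} {rent:>10.2f} {label:>10}")
```

Reading the grid as a range rather than picking one cell is the point: if the decision flips inside the plausible range of horizon or rent rate, the inputs need tightening before capex is committed.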

How: implement, integrate, maintain

The model below is runnable as-is (python3 build_vs_rent.py, stdlib only). All numbers are illustrative inputs the reader must replace with quotes from vendor sourcing and your power tariff. It computes annual owned cost, delivered $/GPU-hour across a utilization sweep, and the breakeven utilization where owned equals rented.

#!/usr/bin/env python3
"""Build-vs-rent GPU TCO model. Stdlib only. ALL INPUTS ILLUSTRATIVE — edit them."""
from __future__ import annotations

from dataclasses import dataclass

HOURS_PER_YEAR = 8760  # 24 * 365


@dataclass(frozen=True)
class Inputs:
    # --- fleet shape (illustrative) ---
    gpus: int = 256                      # GPU count in the owned fleet
    gpu_watts: float = 700.0             # per-GPU TDP, W (H100 SXM ~700 W)

    # --- capex, USD (illustrative; get real quotes) ---
    gpu_capex_per_gpu: float = 30_000.0  # GPU + its share of server/chassis
    network_capex_per_gpu: float = 6_000.0   # InfiniBand/Ethernet NICs, switches, cables
    facility_capex_per_gpu: float = 4_000.0  # power, cooling, racks, fit-out share
    horizon_years: float = 4.0           # straight-line depreciation horizon
    residual_fraction: float = 0.10      # salvage value as fraction of GPU capex

    # --- opex, USD/year ---
    power_usd_per_kwh: float = 0.10      # all-in electricity tariff
    pue: float = 1.3                     # facility power / IT power (AI-optimized ~1.1-1.3)
    staff_usd_per_year: float = 600_000.0    # ops/platform headcount, fully loaded
    other_opex_per_gpu_year: float = 1_500.0 # storage, software, bandwidth, maintenance

    # --- rent comparison ---
    rent_usd_per_gpu_hour: float = 2.40  # neocloud on-demand H100 SXM (illustrative)


def annual_owned_cost(i: Inputs) -> dict[str, float]:
    capex = (i.gpu_capex_per_gpu + i.network_capex_per_gpu + i.facility_capex_per_gpu) * i.gpus
    residual = i.gpu_capex_per_gpu * i.gpus * i.residual_fraction
    depreciation = (capex - residual) / i.horizon_years

    # IT power -> facility power via PUE. This charges GPU TDP only, at full draw all
    # year; it does not scale with utilization and omits CPU/NIC/fan overhead. To
    # include that overhead, set gpu_watts to the per-GPU share of node wall power.
    it_kw = i.gpus * i.gpu_watts / 1000.0
    facility_kw = it_kw * i.pue
    power = facility_kw * HOURS_PER_YEAR * i.power_usd_per_kwh

    other = i.other_opex_per_gpu_year * i.gpus
    total = depreciation + power + i.staff_usd_per_year + other
    return {
        "depreciation": depreciation,
        "power": power,
        "staff": i.staff_usd_per_year,
        "other": other,
        "total": total,
    }


def owned_per_gpu_hour(i: Inputs, utilization: float) -> float:
    # Raise instead of assert so the check survives `python -O`.
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    delivered_hours = i.gpus * HOURS_PER_YEAR * utilization
    return annual_owned_cost(i)["total"] / delivered_hours


def breakeven_utilization(i: Inputs) -> float | None:
    """Utilization at which owned $/delivered-hour == rented $/hour."""
    annual = annual_owned_cost(i)["total"]
    rented_annual_full = i.rent_usd_per_gpu_hour * i.gpus * HOURS_PER_YEAR
    if rented_annual_full <= 0:
        return None
    u = annual / rented_annual_full
    return u if 0.0 < u <= 1.0 else None  # None => owning never breaks even at any util


def main() -> None:
    i = Inputs()
    cost = annual_owned_cost(i)
    print(f"Fleet: {i.gpus} GPUs, horizon {i.horizon_years} yr, PUE {i.pue}, "
          f"rent ${i.rent_usd_per_gpu_hour:.2f}/GPU-hr (ALL ILLUSTRATIVE)\n")
    print("Annual owned cost breakdown (USD):")
    for k in ("depreciation", "power", "staff", "other", "total"):
        print(f"  {k:>13}: {cost[k]:>14,.0f}")

    print("\nDelivered owned $/GPU-hour vs sustained utilization:")
    print(f"  {'util':>6}  {'owned $/hr':>12}  {'vs rent':>10}")
    for pct in (20, 30, 40, 50, 60, 70, 80, 90):
        u = pct / 100.0
        owned = owned_per_gpu_hour(i, u)
        verdict = "BUILD" if owned < i.rent_usd_per_gpu_hour else "rent"
        print(f"  {pct:>5}%  {owned:>11.2f}  {verdict:>10}")

    be = breakeven_utilization(i)
    if be is None:
        print("\nBreakeven: owning never beats renting within (0, 100%] at these inputs.")
    else:
        print(f"\nBreakeven utilization: {be*100:.1f}%  "
              f"(sustain above this => build is cheaper per useful hour)")


if __name__ == "__main__":
    main()

Implement. Replace every field in Inputs with real figures: GPU + server capex and network capex from vendor sourcing and procurement; facility capex and the power tariff/PUE from your datacenter contract; rent rate from a live neocloud quote (cloud, neoclouds and cost). The most consequential edits are horizon_years and residual_fraction: fast obsolescence shortens the horizon and shrinks the residual, both of which raise the owned rate.

Integrate. Feed the model's utilization input from measured fleet data, not assumptions. Compute it from SM-active/MFU telemetry (telemetry, monitoring and alerting, observability and monitoring), and validate that delivered utilization at bring-up (workload bring-up recipes, smoke tests, fabric bring-up and benchmarking) clears the breakeven before committing. Tie utilization to the goodput targets in your SLO/SLI catalog: a scheduler that keeps GPUs busy (cluster orchestration, Slurm, Kubernetes, Volcano scheduler, Volcano job manifest, Ray) is what moves an owned fleet above breakeven.
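As a sketch of the measured-utilization step, the snippet below computes a time-weighted mean from SM-active samples and compares it with a breakeven value. The sample format (pairs of Unix seconds and an SM-active fraction), the function name, and the 0.66 breakeven are assumptions for illustration. How the samples are exported from your telemetry stack is outside this sketch.

```python
"""Time-weighted sustained utilization from SM-active samples (hypothetical format)."""
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix_seconds, sm_active fraction in [0, 1])


def sustained_utilization(samples: Sequence[Sample]) -> float:
    """Mean SM-active over the sampled window, weighting each sample by the
    time until the next one (left-step integration)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to define a time window")
    ordered = sorted(samples)
    span = ordered[-1][0] - ordered[0][0]
    if span <= 0:
        raise ValueError("samples must cover a non-zero time span")
    busy = 0.0
    for (t0, value), (t1, _) in zip(ordered, ordered[1:]):
        busy += value * (t1 - t0)
    return busy / span


if __name__ == "__main__":
    # One busy hour, one idle hour, one hour at 60% SM-active.
    samples = [(0.0, 0.9), (3600.0, 0.0), (7200.0, 0.6), (10800.0, 0.6)]
    measured = sustained_utilization(samples)
    breakeven = 0.66  # hypothetical output of the cost model
    verdict = "clears" if measured > breakeven else "is below"
    print(f"measured {measured:.0%} {verdict} breakeven {breakeven:.0%}")
```

Time weighting matters when samples are irregular: a plain average over samples would over-count bursts that happen to be sampled densely.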

Maintain. Re-run on every capex refresh, tariff change, or rate move. Watch two regressions that silently push you below breakeven: MFU decay (MFU regression runbook) and idle reserved capacity. For inference fleets, track $/token alongside $/GPU-hour and alert on SLO breaches that strand capacity (inference serving, serving OSS models, inference SLO breach runbook). Exclude hours logged on unhealthy nodes from the delivered GPU-hours so a degraded GPU that still reports as busy does not flatter the math (GPU health gating).
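Two of the maintenance checks are simple enough to sketch: converting $/GPU-hour into $/token, and counting only health-gated GPUs toward delivered hours. The function names, the throughput figure, the GPU identifiers, and the rate are hypothetical. Treating a GPU with no health record as unhealthy is a design assumption of this sketch.

```python
"""Unit-cost helpers for inference fleets and health-gated delivered hours."""
from typing import Mapping


def usd_per_million_tokens(usd_per_gpu_hour: float,
                           tokens_per_sec_per_gpu: float) -> float:
    """Convert a per-GPU-hour cost into cost per one million generated tokens."""
    if tokens_per_sec_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_sec_per_gpu * 3600.0
    return usd_per_gpu_hour / tokens_per_hour * 1_000_000.0


def delivered_gpu_hours(busy_hours_by_gpu: Mapping[str, float],
                        healthy: Mapping[str, bool]) -> float:
    """Sum busy hours, counting only GPUs that passed health gating.

    GPUs missing from `healthy` are treated as unhealthy (conservative).
    """
    return sum(hours for gpu, hours in busy_hours_by_gpu.items()
               if healthy.get(gpu, False))


if __name__ == "__main__":
    cost = usd_per_million_tokens(usd_per_gpu_hour=2.00, tokens_per_sec_per_gpu=2500.0)
    print(f"${cost:.3f} per million tokens")

    busy = {"gpu-0": 20.0, "gpu-1": 22.0, "gpu-2": 24.0}
    health = {"gpu-0": True, "gpu-1": True, "gpu-2": False}  # gpu-2 is degraded
    print(f"delivered GPU-hours: {delivered_gpu_hours(busy, health):.1f} "
          f"(raw busy hours: {sum(busy.values()):.1f})")
```

Dropping the degraded GPU's hours lowers delivered hours and therefore raises the owned $/GPU-hour, which is the honest direction for the error to go.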

The same model ports directly to a spreadsheet: one column of Inputs, the breakeven cell is annual_owned_total / (rent_rate * gpus * 8760), and the utilization sweep is one row of =annual_owned_total / (gpus * 8760 * util).

References

AWS Capacity Blocks for ML: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html
GCP AI Hypercomputer / accelerator-optimized machines: https://cloud.google.com/compute/docs/accelerator-optimized-machines
Azure GPU-accelerated VM sizes: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/overview#gpu-accelerated
NVIDIA DGX H100/H200 system introduction (system power envelope): https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-to-dgxh100.html
NVIDIA DGX SuperPOD H100 data center design (electrical specifications): https://docs.nvidia.com/dgx-superpod/design-guides/dgx-superpod-data-center-design-h100/latest/electrical.html
FinOps Foundation (cloud cost discipline, unit economics): https://www.finops.org/
Uptime Institute / PUE definition and trends: https://uptimeinstitute.com/

¹ Epoch AI, "GPUs account for about 40% of power usage in AI data centers": total server power is ~1.53x GPU power in a typical frontier AI datacenter. https://epoch.ai/data-insights/gpus-power-usage-in-ai-data-centers
² Best-in-class hyperscale/AI-optimized PUE ~1.1-1.3; global colo average ~1.5-1.6. NVIDIA DGX SuperPOD data center design guide (electrical): https://docs.nvidia.com/dgx-superpod/design-guides/dgx-superpod-data-center-design-h100/latest/electrical.html
³ NVIDIA DGX H100 (8x H100 SXM) maximum system power ~10.2 kW at the wall, vs 8 x 700 W = 5.6 kW for the GPUs alone. https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-to-dgxh100.html