GPU confidential computing & device attestation

Scope: running a workload on an NVIDIA GPU so that its code, weights, and data are protected in use from the host, the hypervisor, and the cloud operator. This covers the GPU Trusted Execution Environment (TEE) on Hopper and Blackwell, the confidential VM (CVM) it pairs with, the SPDM/attestation flow that proves the GPU is genuine and unmodified before a secret is released, and how a renter verifies a remote GPU. This is the focused deployable child of Security, Isolation & Multi-tenancy.

What it is

Confidential Computing (CC) protects data in use, while it sits in GPU memory and is computed on. That is the gap storage encryption (at rest) and TLS (in transit) leave open. On NVIDIA datacenter GPUs the protection is a hardware TEE: an encrypted, access-controlled execution environment whose contents are unreadable to everything outside it, including privileged host software.¹

Two pieces make a confidential GPU workload:

A confidential VM (CVM) on the CPU side, either AMD SEV-SNP or Intel TDX, that holds the application and its plaintext data inside a CPU TEE the hypervisor cannot inspect.⁶⁷
The GPU TEE, paired to that CVM. On Hopper (H100/H200), the first NVIDIA GPU with CC, the TEE is per-GPU over the PCIe boundary: the GPU's HBM is placed in a hardware-firewalled compute-protected region, and every byte crossing PCIe between the CVM and the GPU is encrypted and authenticated (AES-GCM) through bounce buffers.³ Blackwell (B-series, GB200/GB300) adds TEE-I/O, extending the TEE across the NVLink/NVSwitch fabric and encrypting the PCIe/NVLink interfaces and HBM directly. This removes Hopper's single-GPU PCIe-bounce bottleneck, so confidential workloads span many GPUs at near-line-rate (see gpu-hopper, nvidia-blackwell-platform).⁴ Ampere (A100) and consumer GPUs have no confidential mode (see the security-multitenancy tier matrix).

The TEE is only half the guarantee. Attestation is the proof that the TEE is real and untampered before any secret enters it. The GPU measures its own firmware/microcode and signs an attestation report with a device identity key rooted in an NVIDIA-provisioned certificate chain. A relying party verifies that report against NVIDIA's reference integrity measurements (RIM), using either the cloud NVIDIA Remote Attestation Service (NRAS) or a local verifier, and releases the workload's keys only on success. The CPU-to-GPU secure session and measurement transport use SPDM (DMTF Security Protocol and Data Model), the same standard that underpins firmware attestation generally.²⁵

The single most important rule follows from that: the gate fails closed. Keys release only when all three conditions hold together: the signature chains to the trusted root, the measurements equal the published RIM, and the nonce is fresh. The block below makes that gate concrete and exercises it adversarially: a truth table covering each single-check failure, a sweep over every bit position showing that one flipped firmware bit is caught, and an equivalence check against a slow reference.

# Runnable on system python3 (numpy). Core algorithm: the fail-closed attestation gate.
# Keys release IFF the report signature chains to the trusted root AND every measured
# component equals the reference integrity measurement (RIM) AND the nonce is fresh.
import numpy as np


def attest_verdict(sig_root_ok: bool,
                   measured: np.ndarray,
                   rim: np.ndarray,
                   nonce_fresh: bool) -> bool:
    """Return True (release keys) only when ALL three checks pass. Fail closed."""
    assert measured.shape == rim.shape and measured.ndim == 1
    measurements_match = bool(np.array_equal(measured, rim))
    return bool(sig_root_ok) and measurements_match and bool(nonce_fresh)


rng = np.random.default_rng(0)
K = 8                                          # number of measured firmware components
rim = rng.integers(0, 256, size=K, dtype=np.uint8)      # published reference (RIM)
good = rim.copy()                              # a genuine, untampered GPU reports the RIM

# 1. Happy path: signature valid, measurements == RIM, fresh nonce -> release.
assert attest_verdict(True, good, rim, True) is True

# 2. Fail-closed truth table: if ANY single check fails, keys must NOT release. This is
#    the structural guarantee the page claims (bind key release to the attestation
#    verdict; never release on a partial pass).
for s, m_ok, n in [(False, True, True),        # bad signature chain
                   (True, False, True),        # measurement mismatch
                   (True, True, False)]:        # stale / replayed nonce
    meas = good if m_ok else (good ^ np.uint8(1))   # flip a bit -> mismatch
    assert attest_verdict(s, meas, rim, n) is False, (s, m_ok, n)

# 3. Adversarial / corruption detection: a single flipped bit anywhere in the firmware
#    measurement (a tampered VBIOS/GSP image) must be caught. Sweep every bit position.
for i in range(K):
    for b in range(8):
        tampered = good.copy()
        tampered[i] ^= np.uint8(1 << b)
        assert attest_verdict(True, tampered, rim, True) is False, (i, b)

# 4. Equivalence to a slow reference: the AND-gate must equal an explicit all()-of-checks.
def slow_verdict(sig_ok, meas, rim, fresh):
    checks = [sig_ok, fresh] + [bool(x == y) for x, y in zip(meas.tolist(), rim.tolist())]
    return all(checks)

for _ in range(2000):
    sig = bool(rng.integers(0, 2))
    fresh = bool(rng.integers(0, 2))
    meas = rim.copy()
    ncorrupt = int(rng.integers(0, K + 1))     # randomly corrupt 0..K components
    if ncorrupt:
        idx = rng.choice(K, size=ncorrupt, replace=False)
        meas[idx] ^= np.uint8(0xFF)
    assert attest_verdict(sig, meas, rim, fresh) == slow_verdict(sig, meas, rim, fresh)

print("attest_verdict OK:", f"components={K}, fail-closed truth table + all-bit sweep + 2000 random cases")

Why use it

The asset is the workload itself. On rented or shared infrastructure (neoclouds, decentralized GPU marketplaces, a colo you do not own), model weights, prompts, and training data are exposed in GPU memory to whoever controls the host. A privileged operator can dump HBM, snapshot a VM, or read the PCIe bus. CC closes that path: it lets you run proprietary weights or regulated data on hardware you do not trust, which is the precondition for confidential inference, confidential fine-tuning, secure multi-party / federated training, and sovereign-AI deployments where the data cannot leave a jurisdiction in the clear.¹

Attestation matters independently of confidentiality. It answers "is this the GPU it claims to be, running untampered firmware?", the question a renter or marketplace must answer before trusting a remote node. Hardware identity rooted in a device certificate is far stronger than a self-reported nvidia-smi string, which a malicious provider can forge. The same primitive underpins supply-chain integrity: signed VBIOS/GSP and measured boot stop a below-the-OS compromise that survives reinstalls (security-multitenancy).

When to use it (and when not)

Use GPU CC when:

Sensitive weights or data run on infrastructure you do not control: rented GPUs, neoclouds, multi-tenant clusters, or any "bring your model to someone else's silicon" arrangement (cloud-neoclouds-cost, gpu-provider-landscape).
Regulatory or contractual mandates require data-in-use protection (health, finance, government, cross-border).
Multiple parties contribute models or data to one computation and none may see the others' inputs (federated/secure-aggregation training).

Reach for attestation alone (without full CC) when you only need to verify a remote GPU's identity and firmware integrity (capability proofs for a marketplace, fleet supply-chain audits) but the workload itself is not secret. This is the boundary with remote GPU verification: attestation proves firmware identity cryptographically, a timing challenge proves delivered capability, and a strong marketplace uses both.

Do not pay for CC when:

The cluster is single-tenant on hardware you own and physically control. The host is already in your trust boundary, and CC adds attestation and operational overhead for no new guarantee.
You are on Ampere or consumer GPUs, which have no TEE. Isolation there stops at MIG/MPS/dedicated-node (security-multitenancy), and a GeForce node is unfit for untrusted multi-tenant work regardless.
The threat you actually face is a noisy neighbour or fault isolation problem, not a confidentiality one. That is MIG, not CC.
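
The rules in this section can be condensed into a small decision function. This is a sketch for illustration only: the `Deployment` fields and the returned labels are hypothetical names, not part of any NVIDIA tool, and hardware support for attestation-only use is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of a deployment; field names are illustrative."""
    host_trusted: bool          # you own and physically control the hardware
    workload_secret: bool       # weights or data must stay hidden from the host
    gpu_has_tee: bool           # Hopper or Blackwell; False for Ampere/consumer
    need_identity_proof: bool   # must prove the remote GPU and firmware are genuine


def recommend(d: Deployment) -> str:
    """Map the section's rules onto one of four illustrative labels."""
    if d.workload_secret and not d.host_trusted:
        # Secrets on an untrusted host need a TEE; without one there is no safe option.
        return "cc+attestation" if d.gpu_has_tee else "unsupported-hardware"
    if d.need_identity_proof and not d.host_trusted:
        # The workload is not secret, so firmware identity alone is enough.
        return "attestation-only"
    # Trusted host, or no confidentiality threat: MIG/MPS/dedicated nodes suffice.
    return "no-cc"


rented_secret = Deployment(host_trusted=False, workload_secret=True,
                           gpu_has_tee=True, need_identity_proof=True)
print(recommend(rented_secret))
```

The ordering of the checks matters: confidentiality on an untrusted host is tested first, so a secret workload never falls through to the weaker attestation-only answer.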

Architecture

The host, hypervisor, and operator sit outside the trusted computing base (TCB). Inside it are two attested components, the CVM and the GPU TEE, joined by an SPDM secure session. On Hopper that session tunnels through AES-GCM bounce buffers over PCIe; on Blackwell TEE-I/O it is native encryption of the PCIe/NVLink interfaces and HBM. Nothing enters the TCB until the GPU's signed attestation report has been checked against the RIM and a key broker has released the workload's keys on that verdict alone.

flowchart LR
  subgraph HOST["Untrusted host / hypervisor / operator"]
    HV["Hypervisor"]
  end
  subgraph TCB["Trusted computing base (attested)"]
    CVM["CVM (SEV-SNP / TDX)<br/>plaintext data + code"]
    GPU["GPU TEE<br/>compute-protected HBM"]
  end
  CVM <-->|"SPDM secure session<br/>AES-GCM bounce buffers (Hopper)<br/>encrypted NVLink/PCIe (Blackwell TEE-I/O)"| GPU
  GPU -->|"signed attestation report"| VER["Verifier: NRAS or local (nvtrust)"]
  VER -->|"measurements match RIM?"| KMS["KMS / key broker"]
  KMS -->|"release keys only if attested"| CVM
  HV -.->|"cannot read TEE contents"| TCB

Two properties carry the whole design, and each is worth making concrete on its own.

Anti-replay. The report is bound to a fresh, single-use nonce the verifier issued. A report captured from a genuine GPU cannot be replayed later, nor reused to satisfy a different challenge, because the verifier consumes each nonce exactly once and rejects anything that does not echo an outstanding one. Freshness and a valid signature are both required.

# Runnable on system python3 (numpy + stdlib). Core idea: nonce freshness defeats replay.
# A recorded ("captured") attestation report from a genuine GPU must NOT be accepted a
# second time, nor accepted by a different challenge, because each challenge binds a fresh
# random nonce and checks that the report echoes exactly that nonce, exactly once.
import hashlib
import numpy as np


class Verifier:
    """Issues single-use nonces and accepts a report only if it echoes an unspent one."""

    def __init__(self, rng: np.random.Generator) -> None:
        self._rng = rng
        self._outstanding: set[bytes] = set()   # issued, not yet consumed

    def issue_nonce(self) -> bytes:
        n = self._rng.integers(0, 256, size=16, dtype=np.uint8).tobytes()  # 128-bit
        self._outstanding.add(n)
        return n

    def accept(self, report_nonce: bytes, sig_ok: bool) -> bool:
        # Fail closed: signature must verify AND the nonce must be one we issued and have
        # not yet consumed. Consuming it makes replay of the same report impossible.
        if not sig_ok or report_nonce not in self._outstanding:
            return False
        self._outstanding.discard(report_nonce)
        return True


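# Symmetric stand-in for the device's private identity key. Real reports carry an
# asymmetric signature checked against the certificate chain; only the key holder can sign.
DEVICE_KEY = b"nvidia-device-identity-key"     # only the genuine GPU holds this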


def sign_report(nonce: bytes, signing_key: bytes) -> bytes:
    """A GPU signs (measurements || nonce) with its device-identity key."""
    return hashlib.sha256(signing_key + b"|MEAS|" + nonce).digest()


def report(nonce: bytes, signing_key: bytes) -> tuple[bytes, bool]:
    """sig_ok is True iff the signature verifies against the trusted DEVICE_KEY root."""
    sig = sign_report(nonce, signing_key)
    verifies = (sig == sign_report(nonce, DEVICE_KEY))   # chains to the identity root?
    return nonce, verifies


def genuine_report(nonce: bytes, signing_key: bytes = DEVICE_KEY) -> tuple[bytes, bool]:
    return report(nonce, signing_key)


rng = np.random.default_rng(1)

# --- Round 1: genuine, fresh challenge -> accepted exactly once. ---
v = Verifier(rng)
n1 = v.issue_nonce()
r1_nonce, r1_sig = genuine_report(n1)
assert v.accept(r1_nonce, r1_sig) is True, "fresh genuine report must be accepted"

# 1. Replay: the SAME captured report presented again must be rejected (nonce consumed).
assert v.accept(r1_nonce, r1_sig) is False, "replayed report must be rejected"

# 2. Cross-challenge replay: a report bound to an OLD nonce cannot satisfy a NEW challenge.
n2 = v.issue_nonce()
assert v.accept(n1, r1_sig) is False, "old-nonce report must not satisfy a new challenge"
r2_nonce, r2_sig = genuine_report(n2)          # a genuinely fresh report for n2 is accepted
assert v.accept(r2_nonce, r2_sig) is True

# 3. Adversarial: an attacker who never saw a valid nonce guesses one. With a 128-bit
#    space the guess is not outstanding, so it is rejected. Model many forgery attempts.
v2 = Verifier(np.random.default_rng(7))
_ = v2.issue_nonce()                            # one outstanding nonce, unknown to attacker
forged_hits = 0
for _ in range(100_000):
    guess = rng.integers(0, 256, size=16, dtype=np.uint8).tobytes()
    if v2.accept(guess, True):                  # even WITH a valid signature flag
        forged_hits += 1
assert forged_hits == 0, f"blind nonce forgery must fail; got {forged_hits} hits"

# 4. Signature is keyed, not a tautology: a report signed with the WRONG key (a host that
#    lacks the device-identity key) does not chain to the root, so sig_ok is False.
_, imposter_sig = report(v.issue_nonce(), b"imposter-key")
assert imposter_sig is False, "report signed without the device key must not verify"

# 5. Equivalence / boundary: freshness alone is not enough; a fresh nonce with a BAD
#    signature must still be rejected (both conditions are load-bearing).
v3 = Verifier(np.random.default_rng(9))
n3 = v3.issue_nonce()
assert v3.accept(n3, False) is False, "fresh nonce but invalid signature must be rejected"

print("nonce_freshness OK:", "replay + cross-challenge + 100k forgeries + bad-sig boundary all rejected")

Authenticated link. Every byte crossing the CPU-to-GPU boundary is encrypted and authenticated with AES-GCM (through bounce buffers on Hopper, natively on Blackwell TEE-I/O). The confidentiality half means a host reading the bus learns nothing; the integrity half means the host cannot silently flip a byte in flight, because the authentication tag detects any tamper. The block below models both halves of that AEAD guarantee with numpy and stdlib, and checks that a single-bit tamper, a forged tag, and a swapped nonce each fail authentication.

# Runnable on system python3 (numpy + stdlib). Core math of the confidential CPU<->GPU
# link: authenticated encryption (AES-GCM in the real bounce buffer / TEE-I/O). We model
# the two guarantees it provides with numpy: (a) CONFIDENTIALITY -- ciphertext leaks
# nothing without the key; (b) INTEGRITY -- any tamper on the wire is DETECTED, so the
# host cannot silently flip a byte crossing PCIe/NVLink.
import hashlib
import hmac
import numpy as np


def keystream(key: bytes, nonce: bytes, n: int) -> np.ndarray:
    """Deterministic pseudo-random keystream (stand-in for the AES-CTR core), as uint8."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + b"|KS|" + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return np.frombuffer(bytes(out[:n]), dtype=np.uint8)


def seal(key: bytes, nonce: bytes, plaintext: np.ndarray) -> tuple[np.ndarray, bytes]:
    """Encrypt-then-MAC: XOR with keystream, then authenticate ciphertext (GCM-shaped)."""
    assert plaintext.dtype == np.uint8 and plaintext.ndim == 1
    ct = np.bitwise_xor(plaintext, keystream(key, nonce, plaintext.size))
    tag = hmac.new(key, nonce + ct.tobytes(), hashlib.sha256).digest()   # auth tag
    return ct, tag


def open_(key: bytes, nonce: bytes, ct: np.ndarray, tag: bytes) -> np.ndarray:
    """Verify tag FIRST, then decrypt. Reject (raise) on any authentication failure."""
    expected = hmac.new(key, nonce + ct.tobytes(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed: ciphertext or nonce was tampered")
    return np.bitwise_xor(ct, keystream(key, nonce, ct.size))


rng = np.random.default_rng(2)
key = rng.integers(0, 256, size=32, dtype=np.uint8).tobytes()     # 256-bit link key
nonce = rng.integers(0, 256, size=12, dtype=np.uint8).tobytes()   # 96-bit GCM nonce
weights = rng.integers(0, 256, size=4096, dtype=np.uint8)         # a page of model weights

ct, tag = seal(key, nonce, weights)

# 1. Round-trip correctness: an untampered page decrypts back to the exact plaintext.
assert np.array_equal(open_(key, nonce, ct, tag), weights)

# 2. Confidentiality: ciphertext must not equal plaintext and must be statistically
#    unrelated to it (a host reading the bus learns nothing without the key).
assert not np.array_equal(ct, weights)
corr = np.corrcoef(ct.astype(float), weights.astype(float))[0, 1]   # near zero
assert abs(corr) < 0.1, corr

# 3. Adversarial / corruption detection (the load-bearing property): flip ANY single bit
#    of ciphertext and authentication MUST fail -- a silent host tamper is impossible.
for i in range(0, ct.size, 137):               # sample bit-flips across the buffer
    bad = ct.copy()
    bad[i] ^= np.uint8(0x01)
    raised = False
    try:
        open_(key, nonce, bad, tag)
    except ValueError:
        raised = True
    assert raised, f"tampered byte {i} was NOT detected"

# 4. Adversarial: tag forgery under a WRONG key must fail (attacker cannot fabricate a
#    valid tag without the link key), and a truncated/zero tag must fail too.
wrong_key = (np.frombuffer(key, np.uint8) ^ np.uint8(0xFF)).tobytes()
_, forged = seal(wrong_key, nonce, weights)
for bogus in (forged, b"\x00" * 32):
    raised = False
    try:
        open_(key, nonce, ct, bogus)
    except ValueError:
        raised = True
    assert raised, "forged/empty tag must be rejected"

# 5. Boundary: replaying a valid (ct, tag) under a DIFFERENT nonce must fail, because the
#    tag binds the nonce. (Nonce REUSE is a separate hazard: it repeats the keystream.)
other_nonce = (np.frombuffer(nonce, np.uint8) ^ np.uint8(0x01)).tobytes()
raised = False
try:
    open_(key, other_nonce, ct, tag)
except ValueError:
    raised = True
assert raised, "nonce-swap must break authentication"

print("authenticated_link OK:", f"round-trip + confidentiality(corr={corr:+.3f}) + bit-flip/forgery/nonce-swap all rejected")
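
The nonce-swap case above shows that the tag binds the nonce; it does not show why a nonce must never repeat under one key. The sketch below illustrates that separate hazard with the same kind of toy SHA-256 counter keystream (a stand-in for AES-CTR, not real GCM): when two messages share a key and nonce, the keystream cancels and an eavesdropper obtains the XOR of the two plaintexts. All names here are illustrative. In real GCM, nonce reuse also weakens the authentication key, which this model does not cover.

```python
import hashlib
import numpy as np


def ks(key: bytes, nonce: bytes, n: int) -> np.ndarray:
    """Toy SHA-256 counter keystream, a stand-in for the AES-CTR core inside GCM."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return np.frombuffer(bytes(out[:n]), dtype=np.uint8)


rng_reuse = np.random.default_rng(3)
link_key = rng_reuse.integers(0, 256, size=32, dtype=np.uint8).tobytes()
reused_nonce = bytes(12)                      # the bug: one nonce used for two messages
p1 = rng_reuse.integers(0, 256, size=64, dtype=np.uint8)
p2 = rng_reuse.integers(0, 256, size=64, dtype=np.uint8)

c1 = p1 ^ ks(link_key, reused_nonce, p1.size)
c2 = p2 ^ ks(link_key, reused_nonce, p2.size)

# The keystream cancels: an eavesdropper learns p1 XOR p2 without the key.
leak = c1 ^ c2
assert np.array_equal(leak, p1 ^ p2)

# If one plaintext is known or guessable, the other falls out directly.
recovered_p2 = leak ^ p1
assert np.array_equal(recovered_p2, p2)

# With distinct nonces the keystreams differ and the XOR no longer cancels.
c2_fresh = p2 ^ ks(link_key, bytes([1]) + bytes(11), p2.size)
assert not np.array_equal(c1 ^ c2_fresh, p1 ^ p2)
```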

How to use it

The snippets in this and the following sections are unexecuted reference templates. CC flags, driver versions, and the attestation SDK move quickly; confirm every command against the current NVIDIA Confidential Computing deployment guide and the nvtrust repo for your driver before relying on it. Pin versions; never assume a flag string.

CC requires the whole chain to be confidential-capable: a CPU with SEV-SNP or TDX, a hypervisor and guest kernel that support launching a CVM, an NVIDIA driver built for CC, and the GPU placed in a confidential mode. The GPU is passed through to the CVM; the host never sees the guest's plaintext.

# Reference template. On the host: confirm the GPU reports confidential-compute
# capability and read its mode. Exact subcommands vary by driver.
nvidia-smi conf-compute --help
nvidia-smi conf-compute -f          # query current CC status; DevTools state may be reported separately on some drivers

The GPU exposes (broadly) three modes: CC-Off (no protection), CC-On (full protection, for production), and a DevTools mode that relaxes some protections so profilers and debuggers work while you bring a workload up. Use DevTools only for development; it is not a confidential state.
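
A workload can enforce the "DevTools is not a confidential state" rule at startup. The sketch below assumes the mode has already been obtained by some driver-specific means, which is deliberately not shown; `CCMode`, `may_load_secrets`, and `check_startup` are hypothetical names, and the string values are labels chosen for this example rather than driver output.

```python
from enum import Enum


class CCMode(Enum):
    """The three modes described above; the values are illustrative labels."""
    OFF = "off"
    ON = "on"
    DEVTOOLS = "devtools"


def may_load_secrets(mode: CCMode) -> bool:
    """Only full CC-On counts as confidential; DevTools and Off do not."""
    return mode is CCMode.ON


def check_startup(mode: CCMode, environment: str) -> None:
    """Refuse to start a production workload unless the GPU is in CC-On."""
    if environment == "production" and mode is not CCMode.ON:
        raise RuntimeError(
            f"GPU is in CC mode {mode.value!r}; production requires 'on'"
        )


check_startup(CCMode.DEVTOOLS, "development")     # allowed: bring-up with dummy data
try:
    check_startup(CCMode.DEVTOOLS, "production")  # refused: not a confidential state
except RuntimeError as err:
    print(err)
```

Keeping `may_load_secrets` independent of the environment means real weights stay out of a DevTools GPU even on a development machine.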

How to integrate it

Confidentiality without attestation is theatre. An attacker who controls the host could present a non-confidential GPU and you would never know. Attestation is the load-bearing integration step, and the flow, end to end, is:

1. The CVM opens an SPDM session to the GPU and requests an attestation report (signed measurements of GPU firmware/microcode plus a nonce for freshness).
2. A verifier checks the report: signature chains to NVIDIA's device-identity root, and measurements match NVIDIA's published RIM for that firmware. Use NRAS (cloud) or run a local verifier from nvtrust for air-gapped/sovereign setups.
3. Only on a passing verdict does a KMS / key broker release the workload's decryption keys into the attested CVM. Keys are bound to the attestation result, so they never land on an unattested or tampered node.

The attest_verdict gate above is exactly step 2's decision; the nonce model above is exactly step 1's anti-replay property. Wire the real client from the nvtrust SDK, and keep the release contract identical to the validated logic: never release secrets unless the verifier passes against fresh measurements.

# Reference template (needs the NVIDIA attestation SDK / nvtrust; not executed here).
# The contract: never release secrets unless attestation passes against fresh measurements.
import os
import secrets

from nv_attestation_sdk import attestation  # package/name per the nvtrust release you pin

# Placeholder, not an SDK name: supply the NRAS endpoint for the release you pin.
nras_url = os.environ["NRAS_URL"]


def fresh_random_nonce() -> str:
    """Hypothetical helper: 32 random bytes, hex-encoded; confirm the type your SDK expects."""
    return secrets.token_hex(32)


client = attestation.Attestation()
client.set_name("workload-x")
client.set_nonce(fresh_random_nonce())                 # anti-replay
client.add_verifier(attestation.Devices.GPU,
                    attestation.Environment.REMOTE,    # or LOCAL for air-gapped
                    nras_url, "")
ok = client.attest()                                   # SPDM report -> RIM check
if not ok:
    raise SystemExit("GPU attestation FAILED, do not release keys")
# release_keys_into_cvm()  # only reached on a genuine, untampered, attested GPU

How to run it in production

Inside the attested CVM the application is unmodified; PyTorch/CUDA see a normal GPU. The performance picture depends on the architecture and the workload's transfer profile:

Hopper: overhead concentrates on CPU-to-GPU transfers, which are encrypted through bounce buffers, and on the loss of cross-GPU peer paths in confidential mode. Large, compute-bound models (where the transfer is a small fraction of runtime) see little overhead; small or transfer-heavy/latency-bound workloads pay more. Stage data to keep the GPU busy and minimize host round-trips.
Blackwell TEE-I/O: encrypts the interfaces directly and extends the TEE across NVLink, so multi-GPU confidential workloads run at near-identical throughput to unencrypted mode.⁴
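
The Hopper behaviour described above can be reasoned about with a back-of-envelope model: if only the transfer share of runtime slows down, the added runtime is that share times the extra cost of encrypted transfers. The sketch assumes compute and transfer do not overlap, and the slowdown factor is a made-up input, not a measured or vendor figure.

```python
def cc_overhead(transfer_fraction: float, transfer_slowdown: float) -> float:
    """Fractional runtime increase when only the transfer share slows down.

    transfer_fraction: share of CC-off runtime spent on CPU<->GPU transfers (0..1).
    transfer_slowdown: hypothetical factor by which encrypted transfers are slower.
    Assumes compute and transfer do not overlap.
    """
    if not 0.0 <= transfer_fraction <= 1.0:
        raise ValueError("transfer_fraction must be in [0, 1]")
    if transfer_slowdown < 1.0:
        raise ValueError("transfer_slowdown must be >= 1")
    return transfer_fraction * (transfer_slowdown - 1.0)


# Illustrative inputs only; the slowdown factor of 3.0 is invented for the example.
for label, frac in [("compute-bound", 0.02), ("mixed", 0.20), ("transfer-bound", 0.60)]:
    print(f"{label:>14}: +{cc_overhead(frac, 3.0):.0%} runtime")
```

The model is linear in the transfer fraction, which is why the same encryption cost is negligible for a large compute-bound model and dominant for a chatty host-to-GPU loop. Measured numbers should replace both inputs.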

Benchmark your workload with CC on versus off; do not assume the headline number. Treat attestation as a runtime gate, not a one-time check: re-attest on every key release and on workload (re)start, and wire attestation failures into observability and monitoring as a security signal.
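
Treating attestation as a runtime gate can look like the sketch below: the broker re-attests on every release, caches no verdict, and counts failures so they can be exported as a security signal. `KeyBroker`, the `attest` callable, and the `metrics` counter are hypothetical stand-ins for a real verifier client and metrics pipeline.

```python
from collections import Counter
from typing import Callable

metrics: Counter = Counter()      # stand-in for a real metrics client


class KeyBroker:
    """Releases a key only after a fresh, passing attestation for that request."""

    def __init__(self, attest: Callable[[], bool], key: bytes) -> None:
        self._attest = attest      # hypothetical callable wrapping the real verifier
        self._key = key

    def release(self, reason: str) -> bytes:
        # Re-attest on every release: no cached verdict, no one-time check.
        try:
            ok = self._attest()
        except Exception:
            ok = False             # a verifier error counts as a failure, not a pass
        if not ok:
            metrics[f"attestation_failed:{reason}"] += 1   # surface as a security signal
            raise PermissionError(f"attestation failed on {reason}; key withheld")
        metrics[f"attestation_passed:{reason}"] += 1
        return self._key


verdicts = iter([True, False])     # passes at start, fails after a restart
broker = KeyBroker(lambda: next(verdicts), key=b"\x00" * 32)
assert broker.release("start") == b"\x00" * 32
try:
    broker.release("restart")
except PermissionError as err:
    print(err)
```

Mapping verifier exceptions to a failed verdict keeps the gate fail-closed when the verifier itself is unreachable.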

How to maintain it

Track which firmware RIM your fleet expects so a driver/VBIOS/GSP update does not silently break attestation. A legitimate upgrade changes the measured values, so stage the firmware and its RIM together and roll the expected-RIM policy at the same time you roll the image. Keep the KMS/key-broker policy ("release only to this measurement") under change control alongside the rest of your secrets story, and version it so a rollback of firmware is matched by a rollback of the accepted RIM. The attest_verdict truth table above is the invariant to preserve across every such change: any single mismatch must still fail closed.
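
One way to keep the accepted-RIM policy versioned and reversible is sketched below. `RimPolicy` and the measurement identifiers (`"fw-A"`, `"fw-B"`) are hypothetical; a real policy would hold the measurement values your verifier reports, and the intermediate "accept both" version is one possible rollout choice rather than a prescribed procedure.

```python
from dataclasses import dataclass, field


@dataclass
class RimPolicy:
    """Versioned allow-list of expected measurements (illustrative, not an NVIDIA format)."""
    history: list[frozenset[str]] = field(default_factory=list)

    def roll_forward(self, accepted: set[str]) -> int:
        """Record a new policy version and return its index."""
        self.history.append(frozenset(accepted))
        return len(self.history) - 1

    def roll_back(self) -> int:
        """Drop the newest version so the policy matches a firmware rollback."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier policy version to roll back to")
        self.history.pop()
        return len(self.history) - 1

    def allows(self, measurement: str) -> bool:
        """Fail closed: with no policy, or an unknown measurement, deny."""
        return bool(self.history) and measurement in self.history[-1]


policy = RimPolicy()
policy.roll_forward({"fw-A"})
policy.roll_forward({"fw-A", "fw-B"})   # staged rollout: both accepted during the upgrade
policy.roll_forward({"fw-B"})           # upgrade complete: retire the old measurement
assert policy.allows("fw-B") and not policy.allows("fw-A")

policy.roll_back()                      # firmware rolled back, so the policy follows
assert policy.allows("fw-A")
```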

How to scale it

Scaling confidential workloads is where the architecture choice bites, because the two generations scale differently:

Hopper confines the TEE to a single GPU across the PCIe boundary, so multi-GPU confidential work loses the cross-GPU peer (NVLink) fast paths and funnels through per-GPU bounce buffers. Scale-out is possible but transfer-bound work pays for it; keep per-GPU problems compute-bound and minimize host round-trips.
Blackwell TEE-I/O extends the TEE across the NVLink/NVSwitch fabric and encrypts the interfaces directly, so a confidential workload spans many GPUs at near-line-rate. That is the qualitative reason to prefer Blackwell when CC and scale-out are both required (security-multitenancy, nvswitch-nvlink).⁴

CC composes with MIG on Hopper and newer, but the supported combinations have evolved across drivers, so verify MIG+CC for your exact driver rather than assuming. When you need both partitioned multi-tenancy and confidentiality, pin the driver and confirm the matrix before committing a fleet layout.

Failure modes

CC enabled, attestation skipped. The workload runs "confidential" but no one verified the GPU is genuine; a host could front an unprotected device. Confidentiality without a passing attestation gate is not a guarantee.
Keys released before verification. Secrets reach the node first and attestation is checked after (or never). Bind key release to the attestation verdict, always. The attest_verdict gate exists to make this impossible when used correctly.
Stale RIM after a firmware update. A legitimate driver/VBIOS/GSP upgrade changes measurements; verification fails fleet-wide until the expected RIM is updated. Stage firmware and RIM together.
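Reused or predictable nonce. A static challenge lets a captured report be replayed (see the replay cases above). Separately, a GCM nonce repeated under the same key reuses the keystream, which leaks the XOR of the two plaintexts and undermines the link's authentication. Issue a fresh single-use nonce every attestation round, and never repeat a GCM nonce under one key.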
DevTools mode left on in production. Protections are relaxed for debugging; the workload is not actually confidential.
Assuming consumer/Ampere GPUs can do CC. They cannot. Plaintext weights sit in GPU memory, readable by the host (security-multitenancy failure modes).
Hopper transfer-bound workload under CC. Bounce-buffer encryption dominates a chatty host-to-GPU loop; profile and restructure I/O, or move to Blackwell TEE-I/O.
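
Most of the failure modes above are configuration mistakes that can be checked before a rollout. The sketch below is a minimal pre-flight lint under that assumption; `Rollout` and all of its fields are hypothetical names, and how each field is populated from a real fleet is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """Hypothetical deployment description; every field name is illustrative."""
    gpu_arch: str                          # e.g. "hopper", "blackwell", "ampere"
    cc_mode: str                           # "on", "off", or "devtools"
    attestation_enabled: bool
    keys_released_after_attestation: bool
    nonce_per_challenge: bool
    expected_rim_matches_firmware: bool


def lint(r: Rollout) -> list[str]:
    """Return one finding per failure mode present; an empty list means none found."""
    findings = []
    if r.gpu_arch not in {"hopper", "blackwell"}:
        findings.append("GPU has no TEE")
    if r.cc_mode == "devtools":
        findings.append("DevTools mode is not confidential")
    elif r.cc_mode != "on":
        findings.append("CC is not enabled")
    if not r.attestation_enabled:
        findings.append("attestation skipped")
    if not r.keys_released_after_attestation:
        findings.append("keys released before verification")
    if not r.nonce_per_challenge:
        findings.append("nonce reused or static")
    if not r.expected_rim_matches_firmware:
        findings.append("stale RIM")
    return findings


risky = Rollout("hopper", "devtools", True, False, True, True)
for finding in lint(risky):
    print("FAIL:", finding)
```

The transfer-bound performance case is left out on purpose: it is a profiling question, not a yes/no configuration check.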

Open questions & validation

End-to-end attestation flow on your stack: SPDM session, report, RIM match, key release, exercised against a deliberately tampered/spoofed node to confirm it fails closed.
Measured CC-on versus CC-off throughput for your real model, on Hopper and (if available) Blackwell, not the datasheet claim.
Local-verifier (air-gapped) attestation parity with NRAS for sovereign deployments.
MIG + CC support matrix on your pinned driver.
Whether a renter actually needs full CC or only attestation; match the mechanism to the threat.

References

NVIDIA Confidential Computing (solution overview, Hopper TEE, deployment): https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/
NVIDIA nvtrust (attestation SDK, local GPU verifier, deployment guides): https://github.com/NVIDIA/nvtrust
NVIDIA H100 (built-in confidential computing): https://www.nvidia.com/en-us/data-center/h100/
NVIDIA Blackwell architecture (first TEE-I/O capable GPU): https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/
DMTF SPDM (Security Protocol and Data Model, secure sessions and measurements): https://www.dmtf.org/standards/SPDM
AMD SEV-SNP (CPU CVM): https://www.amd.com/en/developer/sev.html
Intel TDX (CPU CVM): https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html
