Markdown

Persistence mode¶

Scope: keeping the NVIDIA kernel driver initialized between jobs on a headless GPU node: the nvidia-persistenced daemon (preferred) versus the deprecated nvidia-smi -pm 1 legacy mode, why it matters for job-start latency and benchmark stability, and how to enable it fleet-wide.

All commands and unit files below are reference templates, not hardware-tested. Verify against the NVIDIA driver version on your nodes before fleet rollout.

What it is¶

By default, on a headless Linux box the NVIDIA kernel-mode driver initializes a GPU when the first client (a CUDA process, nvidia-smi, a container) attaches and de-initializes it when the client count reaches zero: "When all GPU clients terminate the driver will then deinitialize the GPU." ¹ Persistence mode is the user-settable property that defeats this: "Persistence Mode is the term for a user-settable driver property that keeps a target GPU initialized even when no clients are connected to it." ²

There are two implementations:

nvidia-persistenced (preferred): a userspace daemon, installed at /usr/bin/nvidia-persistenced, that sleeps with the device files open and "simply mimics an external client of the GPU but does not actually use the GPU for any work," keeping the driver inside the assumptions of its original design. ³ Shipped with the driver since version 319.
Legacy persistence mode: a flag flipped in the kernel-mode driver via nvidia-smi -pm 1 (NVIDIA's pages write this as nvidia-smi -i <target gpu> -pm ENABLED on the legacy page and nvidia-smi -pm 0 to disable on the daemon page; 1/0 are the equivalent numeric forms of ENABLED/DISABLED). NVIDIA: "This solution is near end-of-life and will be eventually deprecated in favor of the Persistence Daemon." ²

Since driver 319, nvidia-smi is daemon-aware: it "use[s] the daemon's RPC interface to set the persistence mode using the daemon if the daemon is running, and will fall back to setting the legacy persistence mode in the kernel-mode driver if the daemon is not running." ³ So once the daemon is up, nvidia-smi -pm 1 is a no-op-or-redundant path, not the control surface. Manage the daemon instead.

Why it's needed (and when)¶

Without persistence, every time the GPU goes idle to zero clients and a new job attaches, the driver re-initializes the device. Two concrete costs:

Start-up latency. "Applications that trigger GPU initialization may incur a short (order of 1-3 second) startup cost per GPU due to ECC scrubbing behavior. If the GPU is already initialized this scrubbing does not take place." ¹ On an 8-GPU node this serializes into a multi-second penalty on the first CUDA call of every job: death by a thousand cold starts for short or frequently-launched workloads.
State reverts to defaults. "If the driver deinitializes a GPU some non-persistent state associated with that GPU will be lost and revert back to defaults the next time the GPU is initialized." ¹ This is why clocks bounce, applied power/clock limits silently reset, and back-to-back benchmark runs are not reproducible. The GPU keeps falling back to its default state between runs.

When you need it:

Every server / headless node. Persistence is off by default on a fresh install; turn it on for any machine that runs jobs without a permanently-attached client (i.e. all of them). See gpu-software-stack.
Benchmarking and acceptance. Cold-start ECC scrub and clock drift make first-run numbers meaningless. Pin persistence on before any fabric bring-up and benchmarking, alongside locked clocks (nvidia-smi-reference).
Latency-sensitive inference. Avoid the 1-3 s/GPU re-init tax on scale-up.

When it does not apply: a GPU that always has at least one resident client (the daemon itself counts) never de-initializes regardless. Note that holding the GPU initialized also keeps a handle open, which can block a bare nvidia-smi --gpu-reset; stop the daemon first when you need a clean reset (see runbook-persistence-mode and the ECC-toggle flow in ecc-support, since ECC changes require a reset).

flowchart LR
  A["Job exits, client count -> 0"] --> B{"Persistence enabled?"}
  B -->|"No"| C["Driver de-initializes GPU"]
  C --> D["Next job: re-init + ECC scrub (1-3 s/GPU), state -> defaults"]
  B -->|"Yes"| E["Driver stays initialized (daemon holds it)"]
  E --> F["Next job: fast first CUDA call, state preserved"]

How it's installed & managed¶

The daemon ships with the driver but the upstream installer "is not installed to run on system startup", so you enable it. Distribution packages (Ubuntu/Debian nvidia-driver-*, the CUDA repo) ship a packaged nvidia-persistenced.service unit; bring it up the standard way:

# Confirm the binary is present (installed by the driver, v319+)
which nvidia-persistenced            # -> /usr/bin/nvidia-persistenced

# Enable + start at boot (packaged unit name: nvidia-persistenced.service)
sudo systemctl enable --now nvidia-persistenced.service

# Verify it is active
systemctl status nvidia-persistenced.service

The upstream daemon defaults to persistence mode enabled for all devices on startup, and UVM persistence mode disabled: "By default, nvidia-persistenced starts with persistence mode enabled for all devices" and "By default, nvidia-persistenced starts with UVM persistence mode disabled for all devices." ⁴ So a plain nvidia-persistenced --user <user> is sufficient to turn persistence on for the whole node, with no per-GPU command needed.

Run it unprivileged via --user; the documented sample invocation is nvidia-persistenced --user foo, which drops super-user privileges after setup. ³ That user "must have write access to the /var/run/nvidia-persistenced directory." ⁴ (Distribution packages commonly create a dedicated nvidia-persistenced system user for this; confirm the user your package uses with systemctl cat nvidia-persistenced.service.)

If you build the unit yourself, NVIDIA ships a sample in init/systemd/nvidia-persistenced.service.template in the nvidia-persistenced repo; __USER__ is the placeholder its install.sh replaces with the run-as user. ⁵ Reference template (not hardware-tested):

# /etc/systemd/system/nvidia-persistenced.service  (sample; from NVIDIA upstream)
[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target

Relevant daemon flags (man page ⁴):

--user <name> / --group <name>: drop to these permissions after setup.
--persistence-mode / --no-persistence-mode: on by default; pass --no-persistence-mode only to start with it off.
--uvm-persistence-mode: off by default; opt in for supported devices (Unified Memory persistence).
-V, --verbose: log state transitions to syslog (useful when debugging the unit).
--nvidia-cfg-path=<path>: point at libnvidia-cfg if it cannot be found.

Fleet enablement. Drive it from your config-management / bring-up automation rather than by hand. The operation is idempotent (enable --now then a state check), so it slots into the same node-bring-up play as the rest of the stack. See Ansible node + fabric bring-up. The fleet check is "is the service active and does every GPU report Persistence-M: On" (next section), asserted across all nodes.

Validated usage & tests¶

The following describe expected output shapes (reference templates, not hardware-tested). Exact values are device- and driver-dependent; do not treat any printed number as a target.

Check the per-GPU persistence flag. nvidia-smi reports it in the header table as the Persistence-M column; with the daemon running it should read On for every GPU:

nvidia-smi --query-gpu=index,name,persistence_mode --format=csv
# Expected: one row per GPU, persistence_mode column reading "Enabled"

The full table form (nvidia-smi with no args) shows Persistence-M near the top-left of each GPU row; expect On once the daemon is up. The detailed query confirms the same field:

nvidia-smi -q | grep -i persistence
# Expected: a "Persistence Mode : Enabled" line per GPU

Confirm the daemon is actually the thing holding the GPU (not just the legacy kernel flag):

systemctl is-active nvidia-persistenced.service     # expect: active
pgrep -a nvidia-persistenced                         # expect: the /usr/bin/nvidia-persistenced --user ... line

Verbose startup notices land in syslog/journal; expect per-GPU "persistence mode" lines and clean startup/shutdown notices:

journalctl -u nvidia-persistenced.service --no-pager | tail
# Expected: startup notice; with -V, state-transition lines per device; no EXEC/permission errors

A practical smoke test for the benefit: time the first lightweight CUDA call with the daemon stopped versus running. Expect a visibly longer wall-time on the first post-idle invocation without persistence (the ECC-scrub/init cost described above) and a near-immediate one with it. Do not record the delta as a fixed number; it is target-dependent and only meaningful relative to itself on the same node.

Failure modes¶

Persistence off (daemon not enabled): slow job starts (1-3 s/GPU ECC-scrub init), clocks bouncing, applied power/clock limits reset between runs, non-reproducible benchmarks. Symptom in nvidia-smi: Persistence-M: Off. Fix and fleet-wide enablement in runbook-persistence-mode.
Unit enabled but fails at boot: historically the packaged unit could be enabled while pointing at a /usr/bin/nvidia-persistenced that a partial/driver-mismatched install did not provide: Failed at step EXEC. Check which nvidia-persistenced and journalctl -u nvidia-persistenced.service; this is usually a driver-install problem, cross-check kernel-modules and install-lifecycle. Runbook: runbook-kernel-gpu-missing.
Permission error: --user set to an account without write access to /var/run/nvidia-persistenced; the daemon exits at startup. Confirm the run-as user and directory ownership.
Reset blocked: the daemon holds an open handle, so a bare nvidia-smi --gpu-reset (or an ECC toggle that needs a reset) fails with the GPU "in use." Stop the daemon, perform the reset, restart it. See runbook-persistence-mode and runbook-ecc-toggle-recovery.

References¶

Driver Persistence — Overview (default de-init behavior, 1-3 s ECC-scrub init cost, state revert): https://docs.nvidia.com/deploy/driver-persistence/overview.html
Persistence Daemon (/usr/bin/nvidia-persistenced, --user, sample init scripts, nvidia-smi RPC/daemon fallback since v319): https://docs.nvidia.com/deploy/driver-persistence/persistence-daemon.html
Persistence Mode (Legacy) (nvidia-smi -pm, near end-of-life): https://docs.nvidia.com/deploy/driver-persistence/persistence-mode-legacy.html
nvidia-persistenced(1) man page (options, defaults, /var/run/nvidia-persistenced): https://manpages.ubuntu.com/manpages/noble/man1/nvidia-persistenced.1.html
NVIDIA/nvidia-persistenced — sample systemd unit template (__USER__ placeholder, install.sh): https://github.com/NVIDIA/nvidia-persistenced/blob/main/init/systemd/nvidia-persistenced.service.template
nvidia-smi reference (driver-included): https://docs.nvidia.com/deploy/nvidia-smi/index.html

NVIDIA Driver Persistence — Overview: "When all GPU clients terminate the driver will then deinitialize the GPU"; "order of 1-3 second startup cost per GPU due to ECC scrubbing behavior. If the GPU is already initialized this scrubbing does not take place"; "some non-persistent state ... will be lost and revert back to defaults the next time the GPU is initialized." https://docs.nvidia.com/deploy/driver-persistence/overview.html ↩↩↩
NVIDIA Driver Persistence — Persistence Mode (Legacy): "Persistence Mode is the term for a user-settable driver property that keeps a target GPU initialized even when no clients are connected to it"; "This solution is near end-of-life and will be eventually deprecated in favor of the Persistence Daemon"; legacy enable form shown as nvidia-smi -i <target gpu> -pm ENABLED. https://docs.nvidia.com/deploy/driver-persistence/persistence-mode-legacy.html ↩↩
NVIDIA Driver Persistence — Persistence Daemon: installed at /usr/bin/nvidia-persistenced (driver v319+); "simply mimics an external client of the GPU but does not actually use the GPU for any work"; sample invocation nvidia-persistenced --user foo; "NVIDIA SMI has been updated in driver version 319 to use the daemon's RPC interface to set the persistence mode using the daemon if the daemon is running, and will fall back to setting the legacy persistence mode in the kernel-mode driver if the daemon is not running." https://docs.nvidia.com/deploy/driver-persistence/persistence-daemon.html ↩↩↩
nvidia-persistenced(1): "By default, nvidia-persistenced starts with persistence mode enabled for all devices"; "By default, nvidia-persistenced starts with UVM persistence mode disabled for all devices"; --user user "must have write access to the /var/run/nvidia-persistenced directory." https://manpages.ubuntu.com/manpages/noble/man1/nvidia-persistenced.1.html ↩↩↩
NVIDIA/nvidia-persistenced, init/systemd/nvidia-persistenced.service.template (Type=forking, ExecStart=/usr/bin/nvidia-persistenced --user __USER__, WantedBy=multi-user.target; __USER__ replaced by install.sh). https://github.com/NVIDIA/nvidia-persistenced/blob/main/init/systemd/nvidia-persistenced.service.template ↩