
k3s (lightweight Kubernetes)

Scope: k3s as a single-binary, CNCF-conformant Kubernetes distribution for edge, small, CI, and dev clusters. This page covers how it offers the same API as full Kubernetes (Kubernetes) at a fraction of the footprint, and how GPUs attach to it.

The snippets on this page are reference templates written against real APIs; pin versions and validate before production use.

What it is

k3s is a fully conformant Kubernetes distribution packaged as a single binary (around 100 MB, version-dependent) with minimal OS dependencies (a reasonably modern kernel with cgroups). It strips alpha and legacy features and the in-tree cloud providers, bundles containerd as the runtime (not Docker), and uses an embedded SQLite datastore by default (eliminating a separate etcd process), with embedded etcd available for HA. Minimum requirements are modest: an agent node runs in 512 MB RAM / 1 core, a server in 2 GB / 2 cores. A simple launcher handles TLS, certificates, and component startup. The API, objects, and kubectl are standard Kubernetes (Kubernetes); workloads and manifests are portable.

It is a CNCF project, originally developed by Rancher (now part of SUSE).

Why use it

Low overhead: single process, SQLite default, small RAM/CPU; runs where full K8s control-plane overhead is not justified.
Edge: built for resource-constrained and remote sites; minimal dependencies, easy to embed.
CI: spin up a real cluster in seconds for integration tests, tear down cleanly.
Dev / homelab: a one-command Kubernetes for local iteration with the same API surface as production.

When to use it (and when not)

Use k3s for edge nodes, single-node or small clusters, CI runners, and developer environments: anywhere the convenience of one binary beats a full multi-component control plane.
Use full Kubernetes (Kubernetes) for a large datacentre control plane: SQLite does not scale to many nodes, and the large-scale GPU stack (KAI/Volcano gang scheduling, Network Operator, DRA at scale) is the home turf of full Kubernetes (Kubernetes for GPUs, the Kubernetes platform). k3s and full K8s share the API, so workloads move between them; the control plane is the differentiator. See the family comparison in orchestration overview.

Architecture

flowchart TB
  subgraph Server["k3s server (single binary)"]
    API["api-server"]
    SCHED["scheduler"]
    CM["controller-manager"]
    DS["embedded SQLite or etcd"]
    CD0["containerd"]
  end
  subgraph A1["k3s agent node"]
    K1["kubelet"]
    CD1["containerd"]
  end
  subgraph A2["k3s agent node (GPU)"]
    K2["kubelet"]
    GPU["nvidia runtime + device plugin"]
  end
  API --- DS
  Server -->|"K3S_URL / K3S_TOKEN"| A1
  Server -->|"K3S_URL / K3S_TOKEN"| A2

How to use it

Install the server with one command, then join agents with the node token:

# Server (control plane + a built-in agent)
curl -sfL https://get.k3s.io | sh -
# kubeconfig is written to /etc/rancher/k3s/k3s.yaml
sudo cat /var/lib/rancher/k3s/server/node-token   # the join token

# Agent node — point K3S_URL at the server, pass the token
curl -sfL https://get.k3s.io | \
  K3S_URL=https://<server>:6443 K3S_TOKEN=<node-token> sh -

kubectl get nodes                                 # both nodes Ready

Pin the version with INSTALL_K3S_VERSION=<pinned> rather than installing the moving latest.
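
A minimal sketch of enforcing that pin in a provisioning script, assuming release tags follow the `vMAJOR.MINOR.PATCH+k3sN` shape. The helper names `is_k3s_release` and `k3s_install_cmd` are hypothetical, and the second function only prints the install command so it can be reviewed before anything is piped to a shell.

```shell
#!/bin/sh
# Hypothetical helpers: validate a version pin, then print (not run) the install command.

# Succeeds only for tags shaped like v1.30.4+k3s1; rejects "latest", channel names, and empty input.
is_k3s_release() {
  printf '%s\n' "${1:-}" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+\+k3s[0-9]+$'
}

# Prints the pinned install command, or fails with a message on stderr.
k3s_install_cmd() {
  version="${1:-}"
  if ! is_k3s_release "$version"; then
    echo "refusing unpinned or malformed version: '$version'" >&2
    return 1
  fi
  printf 'curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=%s sh -\n' "$version"
}
```

Calling `k3s_install_cmd v1.30.4+k3s1` prints the command for that tag; the tag is only an illustration, so check the k3s releases page for the version you actually want.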

How to develop with it

The developer surface is identical to upstream Kubernetes: kubectl apply, Helm, Kustomize, CRDs, operators. k3s adds conveniences: a bundled Traefik ingress, ServiceLB, local-path storage, and a manifests auto-deploy directory (/var/lib/rancher/k3s/server/manifests/) where dropped YAML (including HelmChart CRs) is applied automatically. Disable bundled components you do not want at install time (e.g. --disable traefik). Manifests are portable, so the same objects developed locally deploy unchanged to full K8s.

# k3s ships helm controller: a HelmChart CR installs a chart declaratively
kubectl apply -f - <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: app, namespace: kube-system }
spec:
  chart: myapp
  repo: https://charts.example.com
  version: 1.2.0          # pin
EOF
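
The choice to disable a bundled component can also live in the k3s config file rather than on the install command line. A minimal sketch, assuming the config path `/etc/rancher/k3s/config.yaml` and that config keys mirror the CLI flag names; the function name `write_k3s_config` is hypothetical, and the optional path argument exists so the file can be generated somewhere other than the live location.

```shell
#!/bin/sh
# Hypothetical helper: write a k3s config file that disables the bundled Traefik.
# Keys mirror CLI flags without the leading dashes; repeatable flags become YAML lists.
write_k3s_config() {
  dest="${1:-/etc/rancher/k3s/config.yaml}"
  mkdir -p "$(dirname "$dest")"
  cat > "$dest" <<'EOF'
# Equivalent to passing: --disable traefik
disable:
  - traefik
EOF
}
```

Run it as root before installing or restarting k3s so the setting is read at startup.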

How to scale it

Single server, many agents: the default SQLite datastore supports one server with N agents, fine for edge fleets, but a single control-plane point of failure.
HA with embedded etcd: for control-plane resilience, initialise the first server with --cluster-init (switches the datastore from SQLite to embedded etcd), then join two more servers (an odd number for quorum) behind a fixed registration endpoint / load balancer.

# server-1: start a new HA cluster (embedded etcd)
curl -sfL https://get.k3s.io | sh -s - server --cluster-init
# server-2 / server-3: join as additional control-plane nodes
curl -sfL https://get.k3s.io | \
  K3S_TOKEN=<token> sh -s - server --server https://<server-1>:6443
# agents then point K3S_URL at the load balancer in front of the servers

SQLite cannot back a multi-server cluster; an external SQL datastore (MySQL/Postgres) is the other HA option. For very large clusters, full Kubernetes is the better fit (Kubernetes).
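
The odd-number advice follows from etcd's majority quorum: a cluster of n servers needs floor(n/2) + 1 members available, so it tolerates n minus that many failures. A small sketch of the arithmetic (the function names are illustrative, not part of k3s):

```shell
#!/bin/sh
# Majority quorum for an etcd cluster of n voting members: floor(n/2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while the cluster still has quorum.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small table for 1 to 5 servers.
quorum_table() {
  for n in 1 2 3 4 5; do
    printf '%s servers: quorum %s, tolerates %s failure(s)\n' \
      "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
  done
}
```

Three servers and four servers both tolerate a single failure, which is why an even count adds cost without adding resilience.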

Inference

k3s serves the lightweight / edge inference end of the spectrum: a small model or a single vLLM/Triton replica on an edge GPU box, behind the bundled ServiceLB/Traefik. The same serving runtimes apply (vLLM, Triton/Dynamo-Triton, KServe) since the API is standard (inference serving). At edge scale this is often a single replica rather than the multi-node serving described in disaggregated inference, which belongs on full K8s.
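
A sketch of what a single-replica edge deployment could look like, written as a shell function that prints the manifest. Everything specific here is an assumption rather than something this page prescribes: the `vllm/vllm-openai` image, its `--model` argument and port 8000, the `vllm-edge` names, and the function name itself. The image tag is left as a `<pinned>` placeholder, and the `nvidia` RuntimeClass and device plugin are assumed to be in place.

```shell
#!/bin/sh
# Hypothetical helper: print a single-replica vLLM Deployment plus a
# LoadBalancer Service (served by the bundled ServiceLB on k3s).
# Usage: vllm_edge_manifest <model-id>
vllm_edge_manifest() {
  model="$1"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-edge
spec:
  replicas: 1                      # edge scale: one replica on one GPU
  selector:
    matchLabels: { app: vllm-edge }
  template:
    metadata:
      labels: { app: vllm-edge }
    spec:
      runtimeClassName: nvidia
      containers:
        - name: vllm
          image: vllm/vllm-openai:<pinned>   # placeholder: pin a real tag
          args: ["--model", "${model}"]
          ports:
            - containerPort: 8000  # assumed default OpenAI-compatible port
          resources:
            limits:
              nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-edge
spec:
  type: LoadBalancer
  selector: { app: vllm-edge }
  ports:
    - port: 8000
      targetPort: 8000
EOF
}
```

Pipe the output to `kubectl apply -f -` once the placeholder tag has been replaced with a pinned one.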

Fine-tuning

k3s suits small-scale experimentation: a single-GPU or single-node SFT/LoRA run, or local iteration on a recipe before promoting to a full cluster. Large distributed training and RL (multi-node gang scheduling, KubeRay at scale) belong on full Kubernetes (Kubernetes) or Slurm (Slurm); see methods in fine-tuning and post-training.

Optimised hardware

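k3s auto-detects the NVIDIA container runtime when it is present at startup and adds it to the generated containerd config at /var/lib/rancher/k3s/agent/etc/containerd/config.toml. Recent releases also register an nvidia RuntimeClass; on older releases you may need to create it yourself. The setup steps are: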

Install the NVIDIA Container Toolkit on the GPU node, then (re)start k3s; verify with grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml.
Either request the runtime per-Pod with runtimeClassName: nvidia, or make it the node default via --default-runtime nvidia (CLI or /etc/rancher/k3s/config.yaml).
Advertise GPUs by deploying the NVIDIA device plugin (DaemonSet), or run the full GPU Operator (point it at the k3s containerd socket /run/k3s/containerd/containerd.sock) for driver + toolkit + DCGM + MIG (Kubernetes for GPUs). For passthrough VMs/edge boxes, a single physical GPU passes straight through to the node.
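
A minimal sketch of the per-Pod route, assuming the `nvidia` RuntimeClass exists on the cluster and the device plugin is advertising `nvidia.com/gpu`. The function name and the Pod name `gpu-check` are illustrative, and the CUDA image tag is simply the one used in the cookbook on this page. The function only prints the manifest so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: print a Pod manifest that opts into the nvidia runtime
# per-Pod and requests one GPU from the device plugin.
gpu_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-check
spec:
  restartPolicy: Never
  runtimeClassName: nvidia        # per-Pod alternative to --default-runtime nvidia
  containers:
    - name: smi
      image: nvidia/cuda:13.0.0-base-ubuntu24.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # advertised by the NVIDIA device plugin
EOF
}
```

Apply it with `gpu_pod_manifest | kubectl apply -f -`, then read the result with `kubectl logs gpu-check`.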

NVLink/InfiniBand/GDR concerns are the same as full K8s (performance tuning) but rarely the point at edge scale, which is usually single-GPU.

Cookbook (common use cases)

1. GPU-enabled single-node k3s

# 1) install NVIDIA Container Toolkit on the host (per NVIDIA docs)
# 2) install k3s with nvidia as the default runtime
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION=<pinned> INSTALL_K3S_EXEC="--default-runtime nvidia" sh -
# 3) deploy the NVIDIA device plugin so nvidia.com/gpu is schedulable
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/<pinned>/deployments/static/nvidia-device-plugin.yml
# 4) smoke test
kubectl run smi --rm -it --restart=Never \
  --image=nvidia/cuda:13.0.0-base-ubuntu24.04 \
  --overrides='{"spec":{"containers":[{"name":"smi","image":"nvidia/cuda:13.0.0-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":1}}}]}}'

2. Add an agent (GPU worker) node

# on the existing server, read the token:  /var/lib/rancher/k3s/server/node-token
curl -sfL https://get.k3s.io | \
  K3S_URL=https://<server>:6443 K3S_TOKEN=<node-token> \
  INSTALL_K3S_EXEC="--default-runtime nvidia" sh -
kubectl get nodes -o wide          # new agent appears Ready

3. HA control plane (embedded etcd, 3 servers)

# server-1
curl -sfL https://get.k3s.io | sh -s - server --cluster-init --token <token>
# server-2 and server-3 join the etcd quorum
curl -sfL https://get.k3s.io | \
  sh -s - server --server https://<server-1>:6443 --token <token>
# point agents and kubeconfig at a load balancer fronting all three servers
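
The kubeconfig that k3s writes points at the local server, so the last step means rewriting it before it is used through the load balancer. A sketch, assuming the generated file's `server:` URL is `https://127.0.0.1:6443` (verify against your own file) and that the load balancer's hostname is covered by the server certificate, for example via `--tls-san`. The function name and `lb.example.com` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a k3s kubeconfig with the API endpoint
# swapped from the local default to a load balancer hostname.
# Usage: kubeconfig_for_endpoint <source-kubeconfig> <lb-host>
kubeconfig_for_endpoint() {
  src="$1"
  host="$2"
  # Assumes the server URL is exactly https://127.0.0.1:6443; all other lines pass through unchanged.
  sed "s#https://127\.0\.0\.1:6443#https://${host}:6443#" "$src"
}
```

For example, `kubeconfig_for_endpoint /etc/rancher/k3s/k3s.yaml lb.example.com > lb-kubeconfig.yaml`, run with enough privilege to read the source file.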

Gotchas & failure modes

SQLite at scale: the default datastore is single-server and does not scale to many nodes or high write churn. Switch to embedded etcd (--cluster-init) or an external DB for HA; use full K8s for large clusters.
Not for large DC control planes: k3s is a lightweight distribution, not a tuned large-fleet control plane; sizing a big datacentre on it is a known misuse (orchestration overview).
Bundled add-ons surprise you: Traefik/ServiceLB/local-path are installed by default and may clash with your own ingress/CSI; --disable what you replace.
GPU runtime not detected: if the toolkit was installed after k3s, or k3s was not restarted afterwards, containerd has no nvidia runtime. Restart k3s (or re-run the installer), then grep nvidia in the generated config.
Single-server SPOF: one server with SQLite has no control-plane redundancy; workloads on agents keep running, but no API changes are possible while the server is down.
Version drift: :latest images and unpinned k3s installs drift across edge fleets; pin INSTALL_K3S_VERSION and image tags.
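
The GPU-runtime check above can be scripted for fleet health checks. A sketch, assuming the generated containerd config lives at the default path given in the Optimised hardware section; the function name is hypothetical, and the check is just that `grep` wrapped with a readable result and exit code.

```shell
#!/bin/sh
# Hypothetical helper: report whether k3s registered an nvidia runtime in its
# generated containerd config. An optional argument overrides the default path.
check_nvidia_runtime() {
  cfg="${1:-/var/lib/rancher/k3s/agent/etc/containerd/config.toml}"
  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg (is k3s installed, and are you root?)" >&2
    return 2
  fi
  if grep -q nvidia "$cfg"; then
    echo "nvidia runtime present in $cfg"
  else
    echo "nvidia runtime missing: install the NVIDIA Container Toolkit, then restart k3s" >&2
    return 1
  fi
}
```

Exit status 0 means the runtime was found, 1 means the config exists without it, and 2 means the config could not be read.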

References

k3s docs: https://docs.k3s.io/ · Quick start: https://docs.k3s.io/quick-start
Architecture: https://docs.k3s.io/architecture · Datastore: https://docs.k3s.io/datastore · HA embedded etcd: https://docs.k3s.io/datastore/ha-embedded
Requirements (RAM/CPU): https://docs.k3s.io/installation/requirements · Advanced (NVIDIA runtime): https://docs.k3s.io/advanced
k3s GitHub: https://github.com/k3s-io/k3s
NVIDIA GPU Operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html