k3s (lightweight Kubernetes)
Scope: k3s as a single-binary, CNCF-certified conformant Kubernetes distribution for edge, small, CI, and dev clusters (the same API as full Kubernetes at a fraction of the footprint), and how GPUs attach to it.
The templates here target real APIs; pin versions and validate them before production use.
What it is
k3s is a fully conformant Kubernetes distribution packaged as a single binary (under 100 MB; the exact size is version-dependent) with minimal OS dependencies (a sane kernel and cgroup mounts). It removes alpha, legacy, and non-default features and the in-tree cloud providers. It bundles containerd as the container runtime (not Docker). By default it stores cluster state in embedded SQLite, so no separate etcd process is needed, and embedded etcd is available for HA. Minimum requirements are modest: an agent node runs in 512 MB RAM / 1 core, and a server in 2 GB / 2 cores. A simple launcher handles TLS, certificates, and component startup. The API, objects, and kubectl are standard Kubernetes, so workloads and manifests are portable.
It is a CNCF project (originally Rancher/SUSE).
Why use it
- Low overhead: single process, SQLite default, small RAM/CPU; runs where full K8s control-plane overhead is not justified.
- Edge: built for resource-constrained and remote sites; minimal dependencies, easy to embed.
- CI: spin up a real cluster in seconds for integration tests, tear down cleanly.
- Dev / homelab: a one-command Kubernetes for local iteration with the same API surface as production.
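For the CI case, the usual shape is: install, wait until the node is Ready, run tests, uninstall. The following is a minimal POSIX sh sketch that assumes a throwaway runner with root access. wait_for is a hypothetical helper and is not part of k3s. The install and uninstall steps are left as comments so that sourcing the file has no side effects. The world-readable kubeconfig mode is acceptable only on disposable runners.

```sh
#!/bin/sh
# wait_for: hypothetical helper that retries a command until it succeeds.
# Usage: wait_for <attempts> <command> [args...]
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  attempts="$1"
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}

# CI lifecycle (commented out so sourcing this file has no side effects):
#   curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=<pinned> sh -s - --write-kubeconfig-mode 644
#   export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
#   wait_for 60 kubectl wait --for=condition=Ready node --all --timeout=10s
#   ... run integration tests ...
#   /usr/local/bin/k3s-uninstall.sh   # created by the installer on server installs
```

Retrying kubectl wait, rather than calling it once, also covers the window before the node has registered, when there is nothing to wait on yet.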
When to use it (and when not)
- Use k3s for edge nodes, single-node or small clusters, CI runners, and developer environments: anywhere the convenience of one binary beats a full multi-component control plane.
- Use full Kubernetes for a large datacentre control plane. SQLite does not scale to many nodes, and the large-scale GPU platform (KAI/Volcano gang scheduling, the Network Operator, DRA at scale) belongs to full Kubernetes (see Kubernetes for GPUs and the Kubernetes platform). k3s and full K8s share the API, so workloads move between them; the control plane is the differentiator. See the orchestration overview for the family comparison.
Architecture
```mermaid
flowchart TB
  subgraph Server["k3s server (single binary)"]
    API["api-server"]
    SCHED["scheduler"]
    CM["controller-manager"]
    DS["embedded SQLite or etcd"]
    CD0["containerd"]
  end
  subgraph A1["k3s agent node"]
    K1["kubelet"]
    CD1["containerd"]
  end
  subgraph A2["k3s agent node (GPU)"]
    K2["kubelet"]
    GPU["nvidia runtime + device plugin"]
  end
  API --- DS
  Server -->|"K3S_URL / K3S_TOKEN"| A1
  Server -->|"K3S_URL / K3S_TOKEN"| A2
```
How to use it
Install the server with one command, then join agents with the node token:
```sh
# Server (control plane + a built-in agent)
curl -sfL https://get.k3s.io | sh -
# kubeconfig is written to /etc/rancher/k3s/k3s.yaml (root-readable by default)
sudo cat /var/lib/rancher/k3s/server/node-token   # the join token
# Agent node: point K3S_URL at the server, pass the token
curl -sfL https://get.k3s.io | \
  K3S_URL=https://<server>:6443 K3S_TOKEN=<node-token> sh -
# Back on the server
sudo k3s kubectl get nodes   # both nodes Ready
```
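The generated kubeconfig is root-only and points at the loopback address. To use it from a workstation, copy it and rewrite the server URL. The following is a minimal sketch that assumes the file contains k3s's default server: https://127.0.0.1:6443 entry and that you can run sudo on the server over ssh without a password prompt. k3s_kubeconfig_for is a hypothetical helper name.

```sh
#!/bin/sh
# k3s_kubeconfig_for: hypothetical filter that rewrites the API endpoint in a
# k3s kubeconfig read from stdin (k3s writes the loopback address by default).
# Usage: k3s_kubeconfig_for <server-host> < k3s.yaml > config
k3s_kubeconfig_for() {
  sed "s#https://127\.0\.0\.1:6443#https://$1:6443#"
}

# Example, run on a workstation (assumes passwordless sudo on the server):
#   mkdir -p ~/.kube
#   ssh <server> sudo cat /etc/rancher/k3s/k3s.yaml | k3s_kubeconfig_for <server> > ~/.kube/config
#   chmod 600 ~/.kube/config   # the file holds cluster-admin credentials
```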
Pin the version with INSTALL_K3S_VERSION=<pinned> on servers and agents alike, rather than taking whatever the installer's default (stable) channel currently points at.
How to develop with it
The developer surface is identical to upstream Kubernetes: kubectl apply, Helm, Kustomize, CRDs, and operators. k3s adds several conveniences: a bundled Traefik ingress controller, ServiceLB, the local-path storage provisioner, and an auto-deploy manifests directory (/var/lib/rancher/k3s/server/manifests/) whose YAML files, including HelmChart CRs, are applied automatically. Disable bundled components you do not want at install time (e.g. --disable traefik). Manifests are portable, so objects developed locally deploy unchanged to full K8s.
```sh
# k3s ships a Helm controller: a HelmChart CR installs a chart declaratively
kubectl apply -f - <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata: { name: app, namespace: kube-system }
spec:
  chart: myapp
  repo: https://charts.example.com
  version: 1.2.0   # pin
EOF
```
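The "disable at install time" advice can also be kept out of long command lines: k3s reads /etc/rancher/k3s/config.yaml at startup, and its keys mirror the CLI flags. The following is a minimal sketch; render_k3s_disable is a hypothetical helper that only prints the YAML fragment.

```sh
#!/bin/sh
# render_k3s_disable: hypothetical helper that prints a k3s config.yaml
# fragment disabling the named packaged components.
render_k3s_disable() {
  if [ "$#" -eq 0 ]; then
    return 0            # nothing to disable: print nothing
  fi
  echo "disable:"
  for component in "$@"; do
    echo "  - ${component}"
  done
}

# Example: write the file before running the installer so the first start
# already skips Traefik and ServiceLB (this overwrites any existing config.yaml).
#   sudo mkdir -p /etc/rancher/k3s
#   render_k3s_disable traefik servicelb | sudo tee /etc/rancher/k3s/config.yaml
#   curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=<pinned> sh -
```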
How to scale it
- Single server, many agents: the default SQLite datastore supports one server with N agents, fine for edge fleets, but a single control-plane point of failure.
- HA with embedded etcd: for control-plane resilience, initialise the first server with --cluster-init, which switches the datastore from SQLite to embedded etcd. Then join two more servers (an odd number, for etcd quorum) behind a fixed registration address such as a load balancer or DNS name, and add that address to the server certificates with --tls-san.
```sh
# server-1: start a new HA cluster (embedded etcd) with a shared token
curl -sfL https://get.k3s.io | \
  K3S_TOKEN=<token> sh -s - server --cluster-init --tls-san <lb-address>
# server-2 / server-3: join as additional control-plane nodes
curl -sfL https://get.k3s.io | \
  K3S_TOKEN=<token> sh -s - server --server https://<server-1>:6443 --tls-san <lb-address>
# agents then point K3S_URL at the load balancer in front of the servers
```
SQLite cannot back a multi-server cluster; an external datastore (MySQL, PostgreSQL, or etcd) is the other HA option. For very large clusters, full Kubernetes is the better fit (see Kubernetes).
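The "odd number for quorum" advice follows from etcd's majority rule: a write commits only when floor(n/2) + 1 members agree. The following sketch computes quorum size and tolerated failures for small clusters; quorum and tolerated_failures are illustrative helper names.

```sh
#!/bin/sh
# Majority quorum for an n-member etcd cluster: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}
# Members that can fail while a majority survives: n - quorum(n).
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}
# Print a small table for 1 to 5 servers.
for n in 1 2 3 4 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerates=$(tolerated_failures "$n")"
done
```

For 3 servers this prints quorum=2 tolerates=1. A 4th server raises quorum to 3 but still tolerates only one failure, which is why clusters grow from 3 to 5 rather than to 4.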
Inference
k3s serves the lightweight / edge end of the inference spectrum: a small model or a single vLLM/Triton replica on an edge GPU box, behind the bundled ServiceLB/Traefik. The same serving runtimes apply (vLLM, Triton/Dynamo-Triton, KServe) because the API is standard (see inference serving). At edge scale this is usually a single replica rather than the multi-node, disaggregated serving described in disaggregated inference, which belongs on full K8s.
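A single-replica edge deployment can be templated in plain shell and piped to kubectl. The following is a minimal sketch that makes several assumptions: the vllm/vllm-openai image accepts --model as an argument and listens on port 8000 (check both against your pinned tag), and the nvidia RuntimeClass exists on the cluster. render_vllm_deployment is a hypothetical helper that only prints YAML.

```sh
#!/bin/sh
# render_vllm_deployment: hypothetical helper printing a GPU Deployment plus a
# LoadBalancer Service for one vLLM replica.
# Usage: render_vllm_deployment <name> <model> <image-tag>
render_vllm_deployment() {
  name="$1"; model="$2"; tag="$3"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      runtimeClassName: nvidia   # omit if nvidia is already the node default runtime
      containers:
        - name: vllm
          image: vllm/vllm-openai:${tag}
          args: ["--model", "${model}"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: LoadBalancer   # served by the bundled ServiceLB on k3s
  selector:
    app: ${name}
  ports:
    - port: 8000
      targetPort: 8000
EOF
}

# Example: render_vllm_deployment chat <model-id> <pinned> | kubectl apply -f -
```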
Fine-tuning
k3s suits small-scale experimentation: a single-GPU or single-node SFT/LoRA run, or local iteration on a recipe before promoting it to a full cluster. Large distributed training and RL (multi-node gang scheduling, KubeRay at scale) belong on full Kubernetes or Slurm; see fine-tuning and post-training for methods.
Optimised hardware
If the NVIDIA container runtime is present when k3s starts, k3s auto-detects it and adds it to the generated containerd config at /var/lib/rancher/k3s/agent/etc/containerd/config.toml. Recent releases also ship a built-in nvidia RuntimeClass and the --default-runtime flag. On older releases you create the RuntimeClass (handler: nvidia) yourself.
- Install the NVIDIA Container Toolkit on the GPU node, then (re)start k3s. Verify with grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml.
- Either request the runtime per Pod with runtimeClassName: nvidia, or make it the node default with --default-runtime nvidia (as a CLI flag or as default-runtime in /etc/rancher/k3s/config.yaml).
- Advertise GPUs by deploying the NVIDIA device plugin (a DaemonSet; k3s does not install it). Alternatively, run the full GPU Operator pointed at the k3s containerd socket /run/k3s/containerd/containerd.sock for driver + toolkit + DCGM + MIG (see Kubernetes for GPUs).
- On passthrough VMs and edge boxes, the physical GPU is passed straight through to the VM or host that runs the k3s node, so the same steps apply there.
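The verification step in the first bullet can be wrapped so it gives a clear pass/fail for scripts and fleet checks. The following is a minimal sketch that reuses the same grep nvidia test on the generated config. k3s_has_nvidia_runtime is a hypothetical helper, and you should run it as root because the config lives under /var/lib/rancher.

```sh
#!/bin/sh
# k3s_has_nvidia_runtime: hypothetical check that k3s registered an nvidia
# runtime in its generated containerd config.
# Returns 0 if present, 1 if absent, 2 if the file cannot be read.
k3s_has_nvidia_runtime() {
  cfg="${1:-/var/lib/rancher/k3s/agent/etc/containerd/config.toml}"
  if [ ! -r "$cfg" ]; then
    echo "missing or unreadable: $cfg" >&2
    return 2
  fi
  if grep -q nvidia "$cfg"; then
    echo "nvidia runtime present in $cfg"
    return 0
  fi
  echo "no nvidia runtime in $cfg: install the toolkit, then restart k3s" >&2
  return 1
}

# Example (GPU node, as root; the service is k3s-agent on agent nodes):
#   k3s_has_nvidia_runtime || systemctl restart k3s
```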
NVLink/InfiniBand/GDR concerns are the same as on full K8s (see performance tuning) but rarely matter at edge scale, which is usually single-GPU.
Cookbook (common use cases)
1. GPU-enabled single-node k3s
```sh
# 1) install NVIDIA Container Toolkit on the host (per NVIDIA docs)
# 2) install k3s with nvidia as the default runtime
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION=<pinned> INSTALL_K3S_EXEC="--default-runtime nvidia" sh -
# 3) deploy the NVIDIA device plugin so nvidia.com/gpu is schedulable
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/<pinned>/deployments/static/nvidia-device-plugin.yml
# 4) smoke test; --overrides replaces the container list, so re-set stdin/tty for -it
kubectl run smi --rm -it --restart=Never \
  --image=nvidia/cuda:13.0.0-base-ubuntu24.04 \
  --overrides='{"spec":{"containers":[{"name":"smi","image":"nvidia/cuda:13.0.0-base-ubuntu24.04","command":["nvidia-smi"],"stdin":true,"tty":true,"resources":{"limits":{"nvidia.com/gpu":1}}}]}}'
```
2. Add an agent (GPU worker) node
```sh
# on the existing server, read the token: /var/lib/rancher/k3s/server/node-token
curl -sfL https://get.k3s.io | \
  K3S_URL=https://<server>:6443 K3S_TOKEN=<node-token> \
  INSTALL_K3S_EXEC="--default-runtime nvidia" sh -
kubectl get nodes -o wide   # new agent appears Ready
```
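To confirm that the new agent actually advertises GPUs (rather than just reaching Ready), list each node's allocatable nvidia.com/gpu count. The following is a minimal sketch that assumes kubectl's jsonpath output with a backslash-escaped resource name. gpu_nodes is a hypothetical filter that keeps only nodes reporting a count above zero.

```sh
#!/bin/sh
# gpu_nodes: hypothetical filter over "<node>\t<allocatable-gpus>" lines;
# prints the names of nodes with at least one allocatable GPU.
gpu_nodes() {
  awk -F '\t' '$2 != "" && ($2 + 0) > 0 { print $1 }'
}

# Example (on the server):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' | gpu_nodes
```

An empty result after the device plugin is running usually points back at the runtime: the plugin cannot see GPUs if containerd is not using the nvidia runtime.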
3. HA control plane (embedded etcd, 3 servers)
```sh
# server-1
curl -sfL https://get.k3s.io | sh -s - server --cluster-init --token <token>
# server-2 and server-3 join the etcd quorum
curl -sfL https://get.k3s.io | \
  sh -s - server --server https://<server-1>:6443 --token <token>
# point agents and kubeconfig at a load balancer fronting all three servers
```
Gotchas & failure modes
- SQLite at scale: the default datastore is single-server and does not scale to many nodes or high write churn. Switch to embedded etcd (--cluster-init) or an external datastore for HA; use full K8s for large clusters.
- Not for large datacentre control planes: k3s is a lightweight distribution, not a control plane tuned for large fleets. Sizing a big datacentre on it is a known misuse (see the orchestration overview).
- Bundled add-ons surprise you: Traefik, ServiceLB, and local-path are installed by default and may clash with your own ingress, load balancer, or CSI driver. Use --disable for whatever you replace.
- GPU runtime not detected: if the toolkit was installed after k3s started, or k3s was not restarted afterwards, containerd has no nvidia runtime. Restart k3s (or re-run the installer), then grep nvidia in the generated config.
- Single-server SPOF: one server with SQLite has no control-plane redundancy. During an outage, existing workloads keep running on the agents, but no API changes (deploys, scaling, rescheduling) are possible.
- Version drift: :latest images and unpinned k3s installs drift across edge fleets. Pin INSTALL_K3S_VERSION and image tags.
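For the version-drift item, each node can be checked against the fleet's pin. The following is a minimal sketch that assumes the first line of k3s --version has the form "k3s version v1.xx.y+k3sN (<commit>)". k3s_version_from and check_k3s_pin are hypothetical helpers.

```sh
#!/bin/sh
# k3s_version_from: hypothetical parser; prints the version field from the
# first line of `k3s --version` output read on stdin.
k3s_version_from() {
  awk 'NR == 1 && $1 == "k3s" && $2 == "version" { print $3 }'
}

# check_k3s_pin: hypothetical check; compares stdin's version with the pin.
# Usage: k3s --version | check_k3s_pin <pinned>
check_k3s_pin() {
  pinned="$1"
  actual="$(k3s_version_from)"
  if [ "$actual" = "$pinned" ]; then
    echo "ok: $actual"
    return 0
  fi
  echo "drift: running '${actual:-unknown}', pinned '$pinned'" >&2
  return 1
}
```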
References
- k3s docs: https://docs.k3s.io/ · Quick start: https://docs.k3s.io/quick-start
- Architecture: https://docs.k3s.io/architecture · Datastore: https://docs.k3s.io/datastore · HA embedded etcd: https://docs.k3s.io/datastore/ha-embedded
- Requirements (RAM/CPU): https://docs.k3s.io/installation/requirements · Advanced (NVIDIA runtime): https://docs.k3s.io/advanced
- k3s GitHub: https://github.com/k3s-io/k3s
- NVIDIA GPU Operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
Related: K8s GPU · Orchestration · Kubernetes · K8s Networking over WireGuard · Platform Split-Plane Architecture · Glossary