
Changelog

Scope: what is new in this knowledge base, newest first. Each entry lists the pages added or substantially updated on a date, so you can see where the KB grew without hunting through the nav. Dates track the git history of the docs/ tree; each page also shows its own last-updated date in the footer.

How this page is maintained: when you ship new pages or a substantial update, add an entry at the top under a new date heading. Link every new page, group related pages, and record only real, shipped changes. Keep the older entries as they are.

2026-07-02

Data curation and model merging, an evaluation harness and experiment tracking, and more serving pages.

Post-training knowledge distilled from Thinking Machines' tinker-cookbook: Tinker (training-as-a-service) joins the RL-library pages, alongside generalized companion pages on chat rendering and token loss masking (the renderer layer, TrainOnWhat policies, round-trip parsing) and on LoRA hyperparameter scaling rules (the 10x LR rule, capacity-driven rank selection).
Security evaluation: evaluating AI agents on cybersecurity tasks (CyberGym vulnerability reproduction, CTI-RealM detection engineering, the Inspect Evals framework, AgentGym). The RLSD page now also cross-links its sibling, the SDPG self-distilled policy-gradient page.
Serving: the GLM-5.2 vLLM cookbook, multi-LoRA / adapter serving, and model weight loading in inference engines.
Data and post-training: model merging (SLERP, TIES, DARE), synthetic data generation, and training-data curation and decontamination.
Training optimizers: Muon and distributed Muon (DMuon), a matrix-orthogonalization optimizer brought to near-AdamW overhead in sharded distributed training.
Evaluation and MLOps: the LLM evaluation harness and eval gate, plus experiment tracking and model registry.
Performance: software performance engineering for FMware, meeting throughput and latency SLOs across the cognitive architecture, communication, tuning, and deployment of FM-powered software.
Agents and local serving: running local coding agents (open-weight coders via Ollama/vLLM behind a coding harness), and self-improving harnesses, now enriched with the 2026 harness papers (Self-Harness, AutoHarness, Meta-Harness, LLM-as-Code, code-as-harness survey).
Architecture: looped and recurrent-depth transformers, weight-tied iterative latent depth as a scaling axis orthogonal to model size (LoopWM, Universal Transformers, Adaptive Computation Time).
RL systems: rollout redundancy in RL (prompt deduplication and cascade attention), delta weight sync (sparse, bit-identical trainer-to-rollout weight synchronization), and RLSD (reinforcement learning with self-distillation).
Evaluation and RL depth: LLM benchmarks (anatomy and metrics), RL scaling laws, and GRPO variants and training tricks.
Quality: the remaining legacy pages were humanized, and CI now gates every page on pristine prose, coherent structure, and changelog freshness.
Feeds: the knowledge base now publishes RSS and JSON feeds of new and updated pages, so you can subscribe to changes instead of polling this page (links under References).

2026-07-01

Inference request routing and reinforcement-learning post-training, plus vLLM serving cookbooks.

Inference routing: LLM request routing (Mixture-of-Models) and the vLLM semantic router.
RL post-training and evaluation: on-policy distillation, RLVR (reinforcement learning with verifiable rewards), autonomous experimentation loops, evaluation integrity and anti-gaming, and learning-curve extrapolation and early stopping.
vLLM serving cookbooks: DeepSeek-V3.2-Exp, MiniMax-M2, and small models on consumer GPUs.
Cluster platform: dynamic and fractional GPU sharing.

2026-06-29

The agentic-systems section landed, alongside GPU-platform services and more RL post-training.

Agentic systems: start at the agentic systems index. Core pages include the agent loop, harness architecture, orchestration control plane, planning and reasoning, tools and function calling, evaluation, and observability, with a security set covering the threat model, sandboxing and isolation, identity and access, and prompt-injection defense.
GPU platform services: confidential computing, split-plane architecture, the operator for GPU orchestration, remote GPU verification, and container-image provenance.
RL post-training: agentic RL, async RL systems, PPO, reward-model training, rejection sampling and best-of-N, and reward design.
Using the KB itself: use as an agent skill.

2026-06-24 to 2026-06-28

Initial knowledge base: the foundational pages across GPU hardware and commissioning, the cluster platform, distributed training, inference serving, RL post-training, observability, and SRE and MLOps, plus the runnable recipes. Reach these through the section tabs, the Start here guide, and the recipes and manifests index.

References

Per-page update dates appear in each page footer, taken from the page's localised git revision date.
The authoritative record of changes is the docs/ git history of this knowledge base.
Subscribe to updates: RSS at https://ai-infrastructure.net/rss.xml and JSON Feed at https://ai-infrastructure.net/feed.json. Feeds limited to newly created pages are also published as rss-created.xml and feed-created.json. The feeds cover every page.
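
For readers who would rather script against the JSON feed than use a feed reader, here is a minimal Python sketch that lists items published or modified since a cutoff date. It assumes the feed follows the JSON Feed layout (a top-level `items` array whose entries carry `url`, `title`, and RFC 3339 `date_published` / `date_modified` strings); whether this knowledge base's feed fills in every one of those fields is an assumption, not something this page states. The function names and `SAMPLE_FEED` with its `example.invalid` URLs are hypothetical and exist only for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical sample in the JSON Feed layout; NOT real data from this site.
SAMPLE_FEED = {
    "version": "https://jsonfeed.org/version/1.1",
    "title": "Example feed",
    "items": [
        {"id": "a", "url": "https://example.invalid/a/", "title": "Page A",
         "date_published": "2026-06-29T00:00:00Z"},
        {"id": "b", "url": "https://example.invalid/b/", "title": "Page B",
         "date_published": "2026-06-30T00:00:00Z",
         "date_modified": "2026-07-02T09:30:00+00:00"},
        {"id": "c", "url": "https://example.invalid/c/", "title": "Page C",
         "date_published": "2026-07-01T12:00:00Z"},
        {"id": "d", "url": "https://example.invalid/d/", "title": "Page D"},
    ],
}


def fetch_feed(url: str) -> dict:
    """Download and parse a JSON feed (performs a network request)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def items_since(feed: dict, cutoff: datetime) -> list:
    """Return items published or modified at or after `cutoff`, newest first."""
    recent = []
    for item in feed.get("items", []):
        # Prefer the modification date; fall back to the publication date.
        stamp = item.get("date_modified") or item.get("date_published")
        if stamp is None:
            continue  # Items without any date cannot be compared; skip them.
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # Assumption: naive means UTC.
        if when >= cutoff:
            recent.append({"title": item.get("title", ""),
                           "url": item.get("url", ""),
                           "when": when})
    recent.sort(key=lambda entry: entry["when"], reverse=True)
    return recent


if __name__ == "__main__":
    cutoff = datetime(2026, 7, 1, tzinfo=timezone.utc)
    for entry in items_since(SAMPLE_FEED, cutoff):
        print(entry["when"].date(), entry["title"], entry["url"])
```

To run it against the live feed, pass the result of `fetch_feed("https://ai-infrastructure.net/feed.json")` to `items_since` in place of `SAMPLE_FEED`.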