Changelog
Scope: what is new in this knowledge base, newest first. Each entry lists the pages added or substantially updated on a date, so you can see where the KB grew without hunting through the nav. Dates track the git history of the docs/ tree; each page also shows its own last-updated date in the footer.
How this page is maintained: when you ship new pages or a substantial update, add an entry at the top under a new date heading. Link every new page, group related pages, and record only real, shipped changes. Keep the older entries as they are.
2026-07-02
Data curation and model merging, an evaluation harness and experiment tracking, and more serving.
- Post-training knowledge distilled from Thinking Machines' tinker-cookbook: Tinker (training-as-a-service) joins the RL-library pages, alongside two generalized companion pages: chat rendering and token loss masking (the renderer layer, TrainOnWhat policies, round-trip parsing) and LoRA hyperparameter scaling rules (the 10x LR rule, capacity-driven rank selection).
- Security evaluation: evaluating AI agents on cybersecurity tasks (CyberGym vulnerability reproduction, CTI-RealM detection engineering, the Inspect Evals framework, AgentGym). RLSD now cross-links its sibling page on SDPG (self-distilled policy gradient).
- Serving: the GLM-5.2 vLLM cookbook, multi-LoRA / adapter serving, and model weight loading in inference engines.
- Data and post-training: model merging (SLERP, TIES, DARE), synthetic data generation, and training-data curation and decontamination.
- Training optimizers: Muon and distributed Muon (DMuon), a matrix-orthogonalization optimizer brought to near-AdamW overhead in sharded distributed training.
- Evaluation and MLOps: the LLM evaluation harness and eval gate, plus experiment tracking and model registry.
- Performance: software performance engineering for FMware, meeting throughput and latency SLOs across the cognitive architecture, communication, tuning, and deployment of FM-powered software.
- Agents and local serving: running local coding agents (open-weight coders via Ollama/vLLM behind a coding harness), and self-improving harnesses, now enriched with the 2026 harness papers (Self-Harness, AutoHarness, Meta-Harness, LLM-as-Code, code-as-harness survey).
- Architecture: looped and recurrent-depth transformers, weight-tied iterative latent depth as a scaling axis orthogonal to model size (LoopWM, Universal Transformers, Adaptive Computation Time).
- RL systems: rollout redundancy in RL (prompt deduplication and cascade attention), delta weight sync (sparse, bit-identical trainer-to-rollout weight synchronization), and RLSD (reinforcement learning with self-distillation).
- Evaluation and RL depth: LLM benchmarks (anatomy and metrics), RL scaling laws, and GRPO variants and training tricks.
- Quality: the remaining legacy pages were humanized, and CI now gates every page on pristine prose, coherent structure, and changelog freshness.
- Feeds: the knowledge base now publishes RSS and JSON feeds of new and updated pages, so you can subscribe to changes instead of polling this page (links under References).
2026-07-01
Inference request routing and reinforcement-learning post-training, plus large-model serving cookbooks.
- Inference routing: LLM request routing (Mixture-of-Models) and the vLLM semantic router.
- RL post-training and evaluation: on-policy distillation, RLVR (reinforcement learning with verifiable rewards), autonomous experimentation loops, evaluation integrity and anti-gaming, and learning-curve extrapolation and early stopping.
- vLLM serving cookbooks: DeepSeek-V3.2-Exp, MiniMax-M2, and small models on consumer GPUs.
- Cluster platform: dynamic and fractional GPU sharing.
2026-06-29
The agentic-systems section landed, alongside GPU-platform services and more RL post-training.
- Agentic systems: start at the agentic systems index. Core pages include the agent loop, harness architecture, orchestration control plane, planning and reasoning, tools and function calling, evaluation, and observability, with a security set covering the threat model, sandboxing and isolation, identity and access, and prompt-injection defense.
- GPU platform services: confidential computing, split-plane architecture, the operator for GPU orchestration, remote GPU verification, and container-image provenance.
- RL post-training: agentic RL, async RL systems, PPO, reward-model training, rejection sampling and best-of-N, and reward design.
- Using the KB itself: use as an agent skill.
2026-06-24 to 2026-06-28
Initial knowledge base: the foundational pages across GPU hardware and commissioning, the cluster platform, distributed training, inference serving, RL post-training, observability, and SRE and MLOps, plus the runnable recipes. Reach these through the section tabs, the Start here guide, and the recipes and manifests index.
References
- Per-page update dates appear in each page footer (git revision date localised).
- The authoritative record of changes is the docs/ git history of this knowledge base.
- Subscribe to updates: RSS at https://ai-infrastructure.net/rss.xml and JSON Feed at https://ai-infrastructure.net/feed.json. Feeds of newly created pages are also published as rss-created.xml and feed-created.json. Every page is included.
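For readers who would rather script against the JSON feed than use a feed reader, the following is a minimal sketch of filtering feed items by date. It assumes the feed follows the public JSON Feed 1.1 layout (a top-level items list whose entries carry title, url, and RFC 3339 date_published or date_modified strings with an explicit UTC offset); the field names come from that specification and have not been checked against this site's feed. The function name pages_changed_since and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def pages_changed_since(feed: dict, since: datetime) -> list[tuple[str, str]]:
    """Return (title, url) pairs for feed items dated after `since`."""
    changed = []
    for item in feed.get("items", []):
        # Assumption: items use the JSON Feed date fields; prefer the
        # modification date and fall back to the publication date.
        stamp = item.get("date_modified") or item.get("date_published")
        if stamp is None:
            continue
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so rewrite it as an explicit UTC offset first.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when > since:
            changed.append((item.get("title", ""), item.get("url", "")))
    return changed


# Hypothetical sample shaped like a JSON Feed 1.1 document; not real feed contents.
sample_feed = {
    "version": "https://jsonfeed.org/version/1.1",
    "title": "Example feed",
    "items": [
        {"id": "a", "url": "https://example.com/a/", "title": "Page A",
         "date_modified": "2026-07-02T09:00:00Z"},
        {"id": "b", "url": "https://example.com/b/", "title": "Page B",
         "date_published": "2026-06-29T12:00:00+00:00"},
    ],
}

cutoff = datetime(2026, 7, 1, tzinfo=timezone.utc)
recent = pages_changed_since(sample_feed, cutoff)
```

To run this against the live feed, the dictionary could be loaded with json.load over urllib.request.urlopen on the feed URL. The cutoff must be timezone-aware, because comparing it with the offset-bearing feed timestamps would otherwise raise a TypeError.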
Related: Start here · Recipes index · Agentic systems index · Glossary