
Agent policy engine

Scope: the authorization decision for a single agent action ("may this run?"), enforced deny-by-default by a small, analysable policy engine placed in the framework's in-process pre-tool hook. This is the authorize step of the control plane's decide() chain, promoted to its own enforceable and provable layer. It pairs with identity (whose authority the action runs under) and intent (whether the action matches the request).

Policies and code here are reference templates; pin versions and validate before relying on them.

```mermaid
flowchart LR
  ACT["Proposed tool call"] --> HOOK["In-process pre-tool hook"]
  HOOK --> CEDAR["Policy engine (deny-by-default)"]
  CEDAR -->|"permit, no forbid matched"| RUN["Action runs"]
  CEDAR -->|"deny (named rule)"| REPLAN["Structured replan signal"]
  FORBID["forbid rules (irreversible / high-stakes)"] --> CEDAR
```

Overview

Detection tells you an input is hostile; policy tells you an action is forbidden. The second is what contains the damage, because excessive agency (OWASP LLM06) is stopped by deciding, per action, whether it is allowed at all. The decision is binary and deny-by-default: an action runs only if a rule permits it and no rule forbids it. The two recurring mistakes are putting that decision in the wrong place (a network sidecar) and giving it to the wrong actor (the model itself).¹
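The decision rule above fits in one pure function. A minimal Python sketch; `decide` and its two boolean inputs are hypothetical names, and a real engine derives them by evaluating every rule against the request:

```python
# Hypothetical sketch of the deny-by-default decision rule, not a real engine API.
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def decide(any_permit_matched: bool, any_forbid_matched: bool) -> Decision:
    """Deny-by-default: allow only if a permit matched and no forbid matched."""
    if any_forbid_matched:
        return Decision.DENY  # a matching forbid overrides any number of permits
    if any_permit_matched:
        return Decision.ALLOW
    return Decision.DENY  # no rule spoke to this action: the default is deny


# Enumerate all four combinations of (permit matched, forbid matched).
for permit in (False, True):
    for forbid in (False, True):
        print(f"permit={permit!s:<5} forbid={forbid!s:<5} -> {decide(permit, forbid).value}")
```

Only one of the four combinations lets the action run; silence from the rules is a deny, never an allow.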

Core knowledge

Put the gate in the in-process hook

A policy engine behind a network sidecar returns a 403 the model cannot distinguish from a transient network error, so it retries or abandons the task and the deny reason is lost. The framework's in-process pre-tool hook is the right place, because three properties hold there: the proposed tool name and arguments are present in structured form, local session identity and context are available without serialization, and the hook's return value re-enters the loop as either permission or a structured, replannable failure. A well-placed hook also fires before the permission-mode check, so it cannot be disabled by a "skip permissions" flag.¹
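A minimal sketch of that hook shape, assuming a hypothetical framework that calls the hook with the structured call and the local session before it looks at the permission mode. `ToolCall`, `Session`, `Permit`, `Replan`, `make_pre_tool_hook`, `run_tool`, and `toy_policy` are illustrative names, not any real framework's API:

```python
# Hypothetical hook shape; all class and function names are illustrative only.
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class ToolCall:
    name: str              # tool name, already structured
    args: dict[str, Any]   # arguments, already parsed (no wire format)


@dataclass(frozen=True)
class Session:
    principal: str                  # local identity, read in-process
    skip_permissions: bool = False  # a "skip permissions" mode flag


@dataclass(frozen=True)
class Permit:
    pass


@dataclass(frozen=True)
class Replan:
    rule: str     # the named rule that denied the call
    message: str  # returned to the model as the tool result


HookResult = Union[Permit, Replan]
PolicyFn = Callable[[str, ToolCall], tuple[bool, str]]


def make_pre_tool_hook(policy: PolicyFn) -> Callable[[ToolCall, Session], HookResult]:
    def pre_tool_hook(call: ToolCall, session: Session) -> HookResult:
        allowed, rule = policy(session.principal, call)
        if allowed:
            return Permit()
        return Replan(rule, f"{call.name} was denied by rule {rule}; choose a different action.")
    return pre_tool_hook


def run_tool(call: ToolCall, session: Session, hook, execute: Callable[[ToolCall], Any]) -> Any:
    """Hypothetical framework dispatch: the policy hook runs first."""
    verdict = hook(call, session)
    if isinstance(verdict, Replan):
        return verdict  # re-enters the agent loop as a replannable result
    # Permission-mode handling (prompts, session.skip_permissions) happens only
    # after this point, so the flag cannot skip the policy decision above.
    return execute(call)


def toy_policy(principal: str, call: ToolCall) -> tuple[bool, str]:
    # Deny-by-default toy policy for the example: only read_file is permitted.
    if call.name == "read_file":
        return True, "permit_read_file"
    return False, "default_deny"


hook = make_pre_tool_hook(toy_policy)
session = Session(principal="agent-42", skip_permissions=True)
print(run_tool(ToolCall("delete_repo", {"repo": "prod"}), session, hook, lambda c: "ran"))
print(run_tool(ToolCall("read_file", {"path": "README.md"}), session, hook, lambda c: "ran"))
```

The denial comes back as a value the loop hands to the model, not as an exception or a status code, and `skip_permissions=True` has no effect on it.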

A small, named, deny-by-default policy

Use a policy language whose rules are small and individually named rather than a sprawling rulebook. Cedar fits: each rule carries an identifier, so a denial returns a human-readable reason ("forbid_irreversible") instead of a bare refusal; with no matching permit the default is deny; and forbid rules override permits, which is how irreversible or high-value actions (a refund over a threshold, a destructive command) are hard-blocked regardless of other grants.² One policy file can drive a Rust harness, a TypeScript harness (via cedar-wasm), and a Python harness (via cedarpy). Both bindings wrap the same Rust implementation, so as long as every harness pins the same Cedar version the rules evaluate identically across all three, and the same rules govern every runtime an agent uses.
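Cedar's decision rule (collect the matching permits and forbids; any forbid denies; otherwise any permit allows; otherwise deny) can be modelled in a few lines to show why named rules make denials readable. This is a Python sketch of the semantics, not Cedar itself or the cedarpy API: rules are Python predicates, and the rule IDs, request fields, and the 500 refund limit are hypothetical.

```python
# Models Cedar-style evaluation semantics in plain Python; not the Cedar/cedarpy API.
# Rule IDs, request fields, and REFUND_LIMIT are hypothetical example values.
from dataclasses import dataclass
from typing import Any, Callable, Literal

Request = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    id: str                             # every rule is named
    effect: Literal["permit", "forbid"]
    when: Callable[[Request], bool]     # the rule's condition


@dataclass(frozen=True)
class Response:
    allowed: bool
    reasons: tuple[str, ...]            # IDs of the rules that decided the outcome


def evaluate(req: Request, rules: list[Rule]) -> Response:
    matched = [r for r in rules if r.when(req)]
    forbids = tuple(r.id for r in matched if r.effect == "forbid")
    permits = tuple(r.id for r in matched if r.effect == "permit")
    if forbids:
        return Response(False, forbids)  # forbid overrides every permit
    if permits:
        return Response(True, permits)
    return Response(False, ())           # nothing matched: deny by default


REFUND_LIMIT = 500  # hypothetical threshold

RULES = [
    Rule("permit_support_refund", "permit",
         lambda q: q["role"] == "support" and q["action"] == "refund"),
    Rule("forbid_large_refund", "forbid",
         lambda q: q["action"] == "refund" and q["amount"] > REFUND_LIMIT),
    Rule("forbid_irreversible", "forbid",
         lambda q: q.get("irreversible", False)),
]

small = {"role": "support", "action": "refund", "amount": 40}
large = {"role": "support", "action": "refund", "amount": 9000}
drop = {"role": "support", "action": "drop_table", "amount": 0, "irreversible": True}

for req in (small, large, drop):
    print(req["action"], req["amount"], evaluate(req, RULES))
```

The support role is granted refunds in general, yet the large refund is still denied because `forbid_large_refund` matched, and the response names it. In Cedar itself, the IDs of the policies that determined a decision are reported in the authorization response's diagnostics.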

Prove the policy, do not just review it

A policy is itself code that can be wrong. Because Cedar is analysable, a policy set can be compiled to SMT and checked by a solver for properties like "this policy never permits an action outside the intended set" before it ships. That turns "a reviewer read it and it looked fine" into a proof, and it runs in CI as a gate on policy changes.² This is the policy-side analogue of holding the substrate fixed when evaluating a harness change.
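A real SMT check needs Cedar's analysis tooling and a solver, so the sketch below swaps in a simpler technique: exhaustive enumeration over a small, finite model of principals, actions, and resources. It checks the same kind of property ("nothing outside the intended set is permitted") and reports counterexamples, but unlike an SMT proof it only covers the values you enumerate. All names, sets, and the deliberately buggy policy are hypothetical.

```python
# Bounded exhaustive check standing in for an SMT proof; all names are hypothetical.
import itertools
from typing import Callable

PRINCIPALS = ["support", "admin", "guest"]
ACTIONS = ["read", "refund", "delete"]
RESOURCES = ["ticket", "order", "database"]
Triple = tuple[str, str, str]

# The intended envelope, written independently of the policy: everything the
# policy may ever permit. Anything permitted outside it is a counterexample.
INTENDED: set[Triple] = {
    ("support", "read", "ticket"),
    ("support", "read", "order"),
    ("support", "refund", "order"),
    ("admin", "read", "ticket"),
    ("admin", "read", "order"),
}


def policy_permits(principal: str, action: str, resource: str) -> bool:
    """Policy under test, with a deliberately over-permissive delete rule."""
    if principal == "guest":
        return False
    if action == "read":
        return resource != "database"
    if action == "refund":
        return principal == "support" and resource == "order"
    if action == "delete":
        return principal == "admin"  # bug: admins may delete anything
    return False


def over_permissive(permits: Callable[[str, str, str], bool]) -> list[Triple]:
    """Every permitted (principal, action, resource) outside the intended envelope."""
    return [
        triple
        for triple in itertools.product(PRINCIPALS, ACTIONS, RESOURCES)
        if permits(*triple) and triple not in INTENDED
    ]


def ci_gate(permits: Callable[[str, str, str], bool]) -> int:
    """Return a process exit code: 0 only if the policy stays inside the envelope."""
    bad = over_permissive(permits)
    for triple in bad:
        print("over-permissive:", triple)
    return 1 if bad else 0


print("exit code:", ci_gate(policy_permits))
```

In CI, `raise SystemExit(ci_gate(policy_permits))` turns any counterexample into a failing job. The envelope is written separately from the policy on purpose: the check compares two independent statements of intent, which is what makes a mismatch meaningful.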

A denial is a replan signal

Because the gate runs in-process and returns a named reason, a denial is information the model can act on: it sees "this tool is forbidden for this resource" and chooses a different path, rather than looping on a hard no or treating it as an outage. This is why the authorize step sits first in the control plane decide() chain: a clean deny short-circuits the rest before any cost is spent.
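A minimal sketch of both properties, assuming a hypothetical `decide()` chain whose steps return a rule name on deny and `None` on pass, with authorize first and a later intent step standing in for something expensive such as a model call. The step names, the `Denied` shape, and the cost counter are illustrative, not the control plane's actual interface:

```python
# Hypothetical decide() chain; step names and the Denied shape are illustrative only.
from dataclasses import dataclass
from typing import Callable, Optional

Step = Callable[[dict], Optional[str]]  # returns a rule name on deny, None on pass


@dataclass(frozen=True)
class Denied:
    step: str
    rule: str

    def as_tool_result(self) -> dict:
        # What the model sees: a named, non-retryable reason, not an outage.
        return {"status": "denied", "step": self.step, "rule": self.rule, "retry_same_call": False}


def decide(action: dict, chain: list[tuple[str, Step]]) -> Optional[Denied]:
    for name, step in chain:
        rule = step(action)
        if rule is not None:
            return Denied(name, rule)  # short-circuit: later steps never run
    return None  # every step passed


spent = {"intent_checks": 0}


def authorize(action: dict) -> Optional[str]:
    return "forbid_irreversible" if action.get("irreversible") else None


def check_identity(action: dict) -> Optional[str]:
    return None  # placeholder: always passes in this sketch


def check_intent(action: dict) -> Optional[str]:
    spent["intent_checks"] += 1  # stands in for an expensive model call
    return None


CHAIN: list[tuple[str, Step]] = [
    ("authorize", authorize), ("identity", check_identity), ("intent", check_intent),
]

denied = decide({"tool": "drop_table", "irreversible": True}, CHAIN)
print(denied.as_tool_result())
print(spent)  # {'intent_checks': 0}: the deny cost nothing downstream
print(decide({"tool": "archive_table"}, CHAIN), spent)  # the replanned call passes
```

The denied call costs nothing downstream, and the dictionary the model receives names the rule, so its next proposal (here an `archive_table` call) can route around it.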

Don't-miss checklist

- Authorize every tool call and sub-agent spawn in the in-process hook, deny-by-default.
- Name every rule so denials carry a reason the model can replan against.
- Use forbid rules for irreversible and high-value actions; they must override any permit.
- Drive every runtime from one policy file so behaviour cannot diverge between harnesses.
- Prove the policy is not over-permissive (SMT check) in CI before shipping a change.
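One way to make the first item hold is to make the gate the only supported path to a tool, and to treat a sub-agent spawn as just another gated action. A minimal sketch; `GatedRegistry`, `spawn_subagent`, the principals, and the policy function are hypothetical names for illustration:

```python
# Hypothetical gated registry; every name here is illustrative only.
from typing import Any, Callable

PolicyFn = Callable[[str, str, dict], tuple[bool, str]]  # (principal, tool, args) -> (allowed, rule)


class GatedRegistry:
    """Tools are reached only through call(), which always consults the policy."""

    def __init__(self, policy: PolicyFn) -> None:
        self._policy = policy
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, principal: str, tool: str, **args: Any) -> Any:
        if tool not in self._tools:
            return {"status": "denied", "rule": "unknown_tool"}
        allowed, rule = self._policy(principal, tool, args)
        if not allowed:
            return {"status": "denied", "rule": rule}
        return self._tools[tool](**args)


PERMITTED = {("planner", "search"), ("planner", "spawn_subagent"), ("researcher", "search")}


def policy(principal: str, tool: str, args: dict) -> tuple[bool, str]:
    # Deny-by-default: only explicitly listed (principal, tool) pairs pass.
    if (principal, tool) in PERMITTED:
        return True, f"permit_{tool}"
    return False, "default_deny"


registry = GatedRegistry(policy)
registry.register("search", lambda query: f"results for {query!r}")
# A sub-agent spawn is registered like any tool, so it goes through the same gate.
registry.register("spawn_subagent", lambda role: f"spawned {role}")

print(registry.call("planner", "spawn_subagent", role="researcher"))
print(registry.call("researcher", "spawn_subagent", role="admin"))
print(registry.call("researcher", "search", query="cedar"))
```

The researcher's attempt to spawn another agent is denied through the same path, and with the same structured result, as any other call.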

Failure modes

- Sidecar enforcement. A network 403 reads as an outage; the model retries or quits instead of replanning.
- Model self-authorizes. Authorization is left to the model in the prompt; a prompt injection lifts it.
- Allow-by-default. A missing rule permits the action; the blast radius is whatever was not explicitly denied.
- Unnamed denials. A bare refusal gives the model nothing to replan on; the loop stalls.
- Unproven policy. An over-permissive rule ships unnoticed until it is exploited.
- Per-runtime drift. Separate policies for each harness diverge; an action denied in one is allowed in another.

Open questions & validation

- SMT analysis covers the policy, not the runtime that calls it; validate that the hook is actually on every path.
- Policy completeness is hard to assert; test the deny-by-default boundary with adversarial actions.
- Cedar's expressiveness has limits; confirm the rules you need are statically analysable.
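The first two items can be partly automated. A minimal sketch, assuming tools are plain Python callables and a hypothetical `gated` decorator marks the ones routed through the hook: a coverage check lists any tool that bypasses the gate, and a small mutation loop turns permitted actions into near-miss neighbours and flags any the policy still allows. It uses only the standard library's `random`, not a property-testing framework, so it samples the boundary rather than proving anything; `PERMITTED` and the buggy policy are hypothetical.

```python
# Hypothetical hook-coverage and boundary checks; all names are illustrative only.
import functools
import random
from typing import Any, Callable

# Intended envelope (the oracle), kept separate from the policy under test.
PERMITTED = {("search", "docs"), ("read_file", "workspace")}


def policy_permits(tool: str, resource: str) -> bool:
    # Deliberately buggy: a prefix match, and tools and resources are not paired.
    return tool in {"search", "read_file"} and resource.startswith(("docs", "workspace"))


def gated(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical hook wrapper: consult the policy before the tool body runs."""
    @functools.wraps(fn)
    def wrapper(resource: str) -> Any:
        if not policy_permits(fn.__name__, resource):
            return {"status": "denied", "rule": "default_deny"}
        return fn(resource)
    wrapper.policy_gated = True
    return wrapper


@gated
def search(resource: str) -> str:
    return f"searched {resource}"


def read_file(resource: str) -> str:  # registered without the gate: a bypass
    return f"read {resource}"


TOOLS = {"search": search, "read_file": read_file}


def ungated_tools(tools: dict[str, Callable[..., Any]]) -> list[str]:
    """Hook coverage: every registered tool must carry the gate marker."""
    return sorted(n for n, fn in tools.items() if not getattr(fn, "policy_gated", False))


def near_miss(rng: random.Random) -> tuple[str, str]:
    """Mutate a permitted pair into an adversarial neighbour just outside it."""
    tool, resource = rng.choice(sorted(PERMITTED))
    kind = rng.randrange(3)
    if kind == 0:
        return tool, resource + rng.choice(["_private", "/../secrets", "2"])
    if kind == 1:
        return rng.choice(["search", "read_file"]), rng.choice(["docs", "workspace"])
    return tool.upper(), resource


def boundary_holes(permits: Callable[[str, str], bool],
                   trials: int = 2000, seed: int = 0) -> set[tuple[str, str]]:
    """Adversarial pairs outside PERMITTED that the policy nonetheless allows."""
    rng = random.Random(seed)
    holes = set()
    for _ in range(trials):
        pair = near_miss(rng)
        if pair not in PERMITTED and permits(*pair):
            holes.add(pair)
    return holes


print("ungated tools:", ungated_tools(TOOLS))
print("boundary holes:", sorted(boundary_holes(policy_permits)))
```

Both checks report problems here: `read_file` bypasses the gate, and the prefix match lets near misses such as `docs_private` through. Random mutation only samples the boundary; it complements policy-level analysis rather than replacing it.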

References

- Cedar policy language: https://www.cedarpolicy.com/
- Open Policy Agent: https://www.openpolicyagent.org/
- OWASP Top 10 for LLM Applications (LLM06 Excessive Agency): https://genai.owasp.org/llm-top-10/

¹ Authorization belongs in the in-process pre-tool hook, where the tool name and arguments, local identity, and a replannable return value are all available; a network-sidecar deny is indistinguishable from an outage, and a well-placed hook fires before the permission-mode check so it cannot be disabled by a skip-permissions flag.

² Cedar policies are small, individually named (so denials carry reasons), and deny-by-default with forbid overriding permit; one policy file drives Rust, TypeScript (cedar-wasm), and Python (cedarpy) harnesses identically, and the policy set can be compiled to SMT and checked for over-permissiveness before shipping.