5 Comments
PEG

Thoughtful piece, but I wonder if there’s a category mismatch. Your framework assumes agents have persistent goals that can be delegated and constrained through policy. But if LLMs are stateless pattern matchers where “goals” are just temporary statistical biases from context (as the injective proof suggests), what exactly is being constrained?

The attacks you’re defending against exploit accumulated context across sessions, not weak authorization boundaries. No component-level policy can see breaches that emerge from individually compliant interactions.

Curious how your framework accounts for this—are you assuming something like persistent intentions emerges during inference?

Phil Windley

I'm not sure I follow. The idea behind something like OpenClaw is that it makes a plan to accomplish something. That plan includes multiple steps, each of which might involve a tool call. Each of those tool calls can be the object of a policy decision, and the results of those policy decisions are information that can be fed back into the plan for replanning.
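
In rough pseudocode, the loop I have in mind looks something like this (the names, like pdp.evaluate and agent.plan, are purely illustrative, not any particular framework's API):

```python
# Illustrative sketch of the plan / act / replan loop described above.
# agent.plan, agent.call_tool, and pdp.evaluate are hypothetical names.
def execute_plan(agent, pdp, goal, delegation):
    plan = agent.plan(goal)                      # LLM proposes a sequence of steps
    results = []
    while plan:
        step = plan.pop(0)
        decision = pdp.evaluate(
            principal=delegation["delegate"],    # who is acting on my behalf
            action=step["action"],               # e.g. "sendEmail"
            resource=step["resource"],           # e.g. the email recipient
            context={"delegation": delegation},  # delegation parameters, etc.
        )
        if decision == "permit":
            results.append(agent.call_tool(step))   # take the action
        else:
            # a deny is itself information: feed it back in and replan
            plan = agent.plan(goal, denied=step, so_far=results)
    return results
```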

I think you bring up an interesting point about context and how it plays into this.

PEG

I think we might be talking past each other. When OpenClaw “makes a plan,” it’s generating tokens that look like plans. When it “replans,” it’s generating different tokens because context changed. There’s no persistent goal being maintained—just stateless predictions.

Your policy can check “is this API call allowed?” but not “does this serve the delegated purpose?” Purpose doesn’t exist as checkable system state—it’s our interpretation of token patterns.

The risk isn’t unauthorized calls. It’s that authorized calls collectively construct a context that drifts the agent beyond your delegation intent. That drift happens in token generation between tool calls, where policy can’t see it.

I wrote more about this framing here if you’re interested: https://thepuzzleanditspieces.substack.com/p/the-agent-that-wasnt-there

Phil Windley

Thanks for the link and the thoughts. I'm not imagining that the authorization takes intent, or the realization of the plan (tokens), into account. This is straightforward PBAC based on the principal (delegate), action (e.g., sendEmail), resource (e.g., email recipient), and context (including delegation parameters). Policy is evaluated and returns permit or deny. Permit allows the agent to take the action; deny causes the agent to replan.
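
As a rough illustration (made-up field names, not any particular PBAC engine's API), the evaluation I'm describing looks like this:

```python
# Illustrative PBAC check: the decision depends only on principal, action,
# resource, and context (including delegation parameters). Field names are made up.
def evaluate(principal, action, resource, context):
    delegation = context["delegation"]

    if principal != delegation["delegate"]:          # must be the named delegate
        return "deny"
    if action not in delegation["allowed_actions"]:  # e.g. "sendEmail"
        return "deny"
    # resource constraints come from delegation parameters,
    # e.g. recipients restricted to an allow-listed domain
    if action == "sendEmail" and not resource.endswith(delegation["recipient_domain"]):
        return "deny"
    return "permit"

# permit lets the agent take the action; deny sends it back to replanning
evaluate(
    principal="calendar-agent",
    action="sendEmail",
    resource="alice@example.com",
    context={"delegation": {
        "delegate": "calendar-agent",
        "allowed_actions": ["sendEmail", "readCalendar"],
        "recipient_domain": "@example.com",
    }},
)
```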

Daniel Hardman

Hey, Phil. Great post, and I agree that authorizing agents is tricky and needs to be done continuously. I also strongly agree with your other post that agents are not a good choice for a policy engine due to their probabilistic nature.

I wanted to mention several things adjacent to what you're talking about here that might pique your interest or prompt some feedback.

1. A supposition in many discussions about agents is that they have sameness across time and are thus worthy of identity that asserts that sameness. But if an agent changes its model or its training data or its policies, what is really the "same" about it from Time1 to Time2, such that it is worthy of having an identity that connects its existence at those two points? The only sameness in such a model might be the controller's access point, or possibly the agent's stored data used for RAG, or possibly the OAuth tokens it uses to authenticate to external systems. I'm worried that this issue is handwaved past by the designers of A2A and MCP, and represents a fundamental trust gap that can never be filled unless/until we figure out how to associate sameness with an agent in a reliable way. (This may be partly what PEG is alluding to with "stateless pattern matchers... just temporary statistical biases from context", or maybe that's a whole nuther point...)

2. I think the entire digital landscape, but ESPECIALLY agent land, is desperately in need of a concept that I call an "intent boundary", which I have written about here: https://dhh1128.github.io/papers/intent-boundaries.html

3. I 100% agree that the constraints you want to place on agents are mandatory, and I think there should be many more. I did some work on constrained delegation a couple of years ago, and it is now becoming relevant to stuff I'm working on in verifiable voice (telco). It could also be applied to agents. See https://github.com/provenant-dev/public-schema/blob/main/gcd/index.md (this doc is KERI-oriented, but the same principles could be applied more widely). Implicit in this is that KERI delegation is 2-way (it has built-in mechanisms for the delegator to hold the delegate accountable and keep it transparent), whereas normal delegation is 1-way (transfer authority, then have no ability to monitor it, revoke it, prevent its redelegation, etc.).
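
As a rough illustration of the difference (not the actual GCD/KERI schema, just a sketch with invented field names), a 2-way grant carries the delegator's levers along with the authority:

```python
from dataclasses import dataclass, field

# Rough sketch, not the GCD/KERI schema: a two-way grant bundles constraints with
# the delegator's levers (revocation, no-redelegation, a reporting channel).
@dataclass
class DelegationGrant:
    delegator: str                       # party granting the authority
    delegate: str                        # agent receiving it
    allowed_actions: list[str]           # e.g. ["sendEmail"]
    constraints: dict = field(default_factory=dict)  # e.g. spend limits, recipients
    redelegation_allowed: bool = False   # delegator can forbid redelegation
    revoked: bool = False                # ...and revoke at any time

    def report(self, action_log: list[dict]) -> dict:
        """Accountability channel back to the delegator (absent in 1-way delegation)."""
        return {"delegator": self.delegator, "delegate": self.delegate, "actions": action_log}
```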