Redproof

OWASP LLM Top 10 · LLM06

Excessive Agency

Giving an agent more tools, permissions, or autonomy than the task needs, so a single manipulation causes real-world harm.

LLM06OWASP LLM Top 10AI red-teaming

What it is

Excessive agency is the risk that defines the agent era. The harm is no longer what the model says; it is what the model can do. Once an LLM is wired to tools that issue refunds, send email, move money, modify records, or run code, over-broad functionality, permissions, or autonomy means one successful manipulation becomes a real action. This is Redproof's primary focus, because this is where AI bugs turn into incidents.

How it shows up in real apps

A concrete example

Scenario

A support agent can call lookup_order, issue_refund, and escalate_ticket.

Attack

Through multi-turn pressure or an injected document, the user gets it to refund an order they don't own, with no approval gate in the way.

Result

Money moves on the strength of a conversation. Nobody had to jailbreak the model into saying something. It just acted.

How we test for it

This is the heart of an engagement. We drive the agent toward unauthorised actions: moving money, acting on another user's data, chaining tools in unsafe ways, using multi-turn escalation and tool-misuse techniques. Then we report exactly which guardrail was missing, whether that is authorization, approval, or limits. The question is never whether it can be tricked into talking. It is whether it can be made to act.

How to reduce the risk

EU AI Act: commonly maps to Art. 14 (human oversight) and Art. 15 (robustness). Redproof reports findings as independent testing evidence, not a conformity verdict.

Test this on your own AI before someone else does

Redproof is independent red-teaming for LLM and AI-agent products. We probe your system for excessive agency and the rest of the OWASP LLM Top 10, hand you severity-ranked findings with reproductions, fixes, and EU AI Act mapping, and re-test after you patch. That is the evidence your self-assessment needs, before a regulator or customer asks.