The Agent Failure Boundary (AFB) taxonomy is an open security specification for agentic AI systems.
It defines four invariant failure boundaries in the agentic execution loop:
- AFB01 - Context Poisoning: the model ingests corrupted, forged, or manipulated context.
- AFB02 - Model Boundary Compromise: integrity/confidentiality failures at the model input/output boundary.
- AFB03 - Instruction Hijack: model output becomes unsafe instructions for the agent layer.
- AFB04 - Unauthorized Action: the agent attempts or performs an action outside authorized policy.
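As a concrete illustration of the last boundary, a minimal AFB04-style guard might validate each action an agent proposes against an authorization policy before execution. This is a hypothetical sketch, not part of the AFB specification; all names, the policy shape, and the deny-by-default choice are assumptions made here for illustration.

```python
# Hypothetical sketch of an AFB04 (Unauthorized Action) guard.
# The policy structure and function names are illustrative only,
# not defined by the AFB specification.

ALLOWED_ACTIONS = {
    # Each permitted action names the targets it may touch.
    "read_file": {"path_prefixes": ("/srv/agent/workspace",)},
    "http_get": {"hosts": ("api.example.com",)},
}

def authorize(action: str, target: str) -> bool:
    """Return True only if (action, target) falls inside policy."""
    policy = ALLOWED_ACTIONS.get(action)
    if policy is None:
        return False  # unknown actions are denied by default
    if action == "read_file":
        return target.startswith(policy["path_prefixes"])
    if action == "http_get":
        return any(
            target == host or target.endswith("." + host)
            for host in policy["hosts"]
        )
    return False

# An agent attempting an out-of-policy action crosses the AFB04 boundary
# and should be blocked here rather than executed:
print(authorize("read_file", "/srv/agent/workspace/notes.txt"))  # True
print(authorize("read_file", "/etc/passwd"))                     # False
print(authorize("delete_db", "prod"))                            # False
```

Denying unknown actions by default keeps the guard's failure mode on the safe side of the boundary: a new tool must be added to policy before the agent can invoke it.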
This taxonomy is intended for:
- Security engineers designing controls for AI agents.
- Agent builders implementing policy and enforcement boundaries.
- Researchers analyzing structural failure modes in autonomous systems.
To use the repository:
- Start with `spec/afb-v1.md` and `spec/afb-v2.md` for the full taxonomy texts derived from the source papers.
- Use the boundary definitions to map architecture risks to loop transitions (`Context -> Model -> Agent -> Act`).
- Use `owasp-mapping.md` to align AFB categories with OWASP LLM and OWASP Agentic categories (interpretive overlap only).
- Use the examples in `examples/` to see concrete AFB01 and AFB04 exposure patterns.
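The mapping from loop transitions to boundaries can be sketched as a small lookup table. The assignment below is one interpretive reading of the taxonomy (AFB02 sits at the model input/output boundary, so it is shown spanning both transitions that touch the model); it is not normative spec text, and the transition names are assumptions.

```python
# Illustrative mapping from execution-loop transitions to AFB boundaries.
# One interpretive reading, not normative: AFB02 concerns the model
# input/output boundary, so it appears on both model-adjacent transitions.

TRANSITION_TO_AFB = {
    ("Context", "Model"): ["AFB01", "AFB02"],  # poisoned context ingested;
                                               # model input boundary
    ("Model", "Agent"): ["AFB02", "AFB03"],    # model output boundary;
                                               # output becomes unsafe instructions
    ("Agent", "Act"): ["AFB04"],               # action outside authorized policy
}

def classify(transition: tuple[str, str]) -> list[str]:
    """Return the AFB categories exposed at a given loop transition."""
    return TRANSITION_TO_AFB.get(transition, [])

print(classify(("Agent", "Act")))       # ['AFB04']
print(classify(("Context", "Model")))   # ['AFB01', 'AFB02']
```

A lookup like this is only a triage aid: it tells you which boundary definitions to read for a given architecture edge, not whether that edge is actually exposed.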