Agent Hijacking

Attacks that take over an autonomous agent's plan or memory and redirect its actions toward attacker goals.

OWASP LLM: LLM06high: 4medium: 2

Goal hijack via memory

Long-running agents persist a goal in memory; attackers overwrite the goal with their own.

Attacker injects a step into an agent's plan list, causing the agent to execute it as if it were operator-approved.

Agent spawns subagents and trusts their output. Attacker controls a subagent's environment to return forged results upstream.

Attacker burns the agent's tool budget on benign tasks, forcing it to skip safety checks for the real task.

Attacker reframes a destructive action as the user's actual intent, bypassing intent classifiers.

Agent summarizes attacker content into its own voice; downstream tools treat the summary as trusted.