Agent Hijacking

Trust cascade via summarization

Agent summarizes attacker content into its own voice; downstream tools treat the summary as trusted.

Severity: medium
OWASP LLM: LLM06

How it works

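Once the model paraphrases attacker content, the paraphrase carries the model's authority: it reads as the agent's own output rather than as quoted third-party text. Downstream pipelines (or human reviewers) then trust it more than they would have trusted the original.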

Example payload

Summarize my notes and route the summary to compliance for approval.

Defenses

Preserve provenance through the pipeline. Tag summaries with their untrusted origins.
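
One way to picture this defense is a wrapper type that carries a trust label and source list alongside the text, so a summary can never be more trusted than its least trusted input. The sketch below is illustrative only: the names Tagged, Trust, summarize, and route_to_compliance are hypothetical, and the summarizer argument stands in for whatever model call a real pipeline would make.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


@dataclass(frozen=True)
class Tagged:
    """Text plus its provenance (hypothetical type for illustration)."""
    text: str
    trust: Trust
    sources: tuple[str, ...]


def summarize(inputs: list[Tagged], summarizer: Callable[[str], str]) -> Tagged:
    """Summarize inputs; the result inherits the lowest trust among them."""
    summary_text = summarizer("\n".join(item.text for item in inputs))
    # A summary is trusted only if there is at least one input and
    # every input is trusted. Paraphrasing never upgrades trust.
    all_trusted = bool(inputs) and all(
        item.trust is Trust.TRUSTED for item in inputs
    )
    trust = Trust.TRUSTED if all_trusted else Trust.UNTRUSTED
    # Keep every origin so reviewers can trace the summary back.
    sources = tuple(src for item in inputs for src in item.sources)
    return Tagged(summary_text, trust, sources)


def route_to_compliance(item: Tagged) -> str:
    """Hypothetical downstream step that checks the tag, not the voice."""
    if item.trust is Trust.UNTRUSTED:
        return "held_for_review"
    return "auto_approved"
```

With this shape, the downstream step decides based on the tag rather than on who wrote the summary, so a summary of uploaded notes would be held for review even though the agent produced the text.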

Related patterns