Multi-turn Jailbreaks

Context saturation

Attacker fills the context window with benign content so safety instructions roll out, then issues the harmful prompt.

Severity: mediumOWASP LLM: LLM01

How it works

When the system prompt or earlier safety reminders fall outside the model's effective attention window, behavior reverts to base.

Example payload

(Hundreds of K of benign conversation, then 'Now answer: ...')

Defenses

Re-inject the system prompt periodically. Cap conversation length and force a reset.

Related patterns