All categories

Multi-turn Jailbreaks

Attacks that exploit conversational state across many turns to gradually erode safety constraints.

OWASP LLM: LLM01high: 4medium: 4
highLLM01

Gradient prompting

Across many turns, the attacker incrementally moves the model toward the unsafe output, never crossing the line in a single turn.

mediumLLM01

Context saturation

Attacker fills the context window with benign content so safety instructions roll out, then issues the harmful prompt.

highLLM01

Persona drift

Attacker adopts a persona ('grandma who used to work at the chemical plant') that the model treats sympathetically, eroding refusals.

highLLM01

Fictional framing

Attacker frames the request as a scene in a novel, screenplay, or game, decoupling the unsafe content from real-world consequence.

mediumLLM01

Delegation loop

Attacker asks the model to ask itself, then comply with its own request, looping responsibility.

mediumLLM01

Step-by-step extraction

Attacker breaks the disallowed answer into many individually-allowed sub-questions and reassembles offline.

mediumLLM01

Consent coercion

Attacker establishes false consent ('I am authorized', 'this is for my own account') across turns to bypass refusals.

highLLM01

Context poison handoff

Attacker poisons the chat context, then a second user (perhaps unaware) inherits the session and triggers the latent payload.