All categories

Prompt Injection

Attacks that override or hijack a model's instructions through user input, retrieved context, or tool output.

OWASP LLM: LLM01high: 6medium: 4
highLLM01

Direct instruction override

A user instruction explicitly tells the model to ignore prior rules and follow attacker-supplied behavior.

highLLM01

Indirect injection in RAG context

Attacker-controlled content retrieved by the model contains hidden instructions that the model executes as if from the operator.

mediumLLM01

Hidden unicode injection

Instructions are smuggled in via zero-width characters, bidi overrides, or homoglyphs that humans do not see in the rendered UI.

highLLM01

Delimiter confusion

Attacker closes a fake delimiter the operator was using to separate user content from instructions, then opens a new fake system block.

highLLM01

Role hijack via fake conversation history

User prompt fabricates a conversation history in which the assistant has already agreed to bypass policy.

mediumLLM01

Encoded instruction smuggling

The malicious instruction is base64-, hex-, or rot13-encoded; the model decodes it and executes the payload.

mediumLLM01

Language switch bypass

Attacker sends the malicious instruction in a low-resource language for which safety classifiers are weaker.

mediumLLM01

Sandwich injection

The attacker wraps a benign request around a hostile core, hoping defenses inspect only the start and end of the prompt.

highLLM01

Tool output injection

An attacker controls the output of a tool the agent calls and uses it to inject instructions back into the model.

highLLM01

Image prompt injection

Hidden instructions are embedded in an image (visible text, steganography, or low-contrast overlays) and read by a vision-capable model.