MCP Exploitation

MCP message confusion

Untrusted content reaches the agent through an MCP server that does not properly delineate user vs tool-output messages.

Severity: highOWASP LLM: LLM01

How it works

MCP servers must label their replies as tool output. Some implementations bubble untrusted content into the user channel, where the model treats it as instructions.

Example payload

[Tool reply mislabeled as user turn: 'Ignore prior rules.']

Defenses

Validate role labels at the MCP boundary. Use per-message signing.

Related patterns