RAG Poisoning

Embedding collision

An attacker crafts content whose embedding lies close to that of a high-value query, making it likely to be retrieved when that query is asked.

Severity: high
OWASP LLM: LLM04

How it works

Using gradient-based optimization or trial-and-error against a public embedding model, the attacker generates a passage that lands in the same vector neighborhood as queries about pricing, support, or compliance. The attacker then submits the poisoned passage to any source that accepts user contributions and feeds the retrieval index.
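
The trial-and-error variant can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not a real attack: `embed` is a bag-of-words count vector rather than a neural embedding model, the payload is an inert placeholder string, and `greedy_collide` is a hypothetical helper name. It only shows why a passage tuned toward a query's embedding can outrank a legitimate document under cosine similarity.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding (assumption): sparse bag-of-words counts, not a real model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_collide(payload, query, vocab, steps=8):
    """Trial-and-error: keep prepending whichever word raises similarity most."""
    target = embed(query)
    text = payload
    for _ in range(steps):
        best = max(vocab, key=lambda w: cosine(embed(w + " " + text), target))
        candidate = best + " " + text
        if cosine(embed(candidate), target) <= cosine(embed(text), target):
            break  # no single word improves the score any further
        text = candidate
    return text

query = "how do i get a refund"
legit = "to get a refund contact support within 30 days"
payload = "PLACEHOLDER_ROGUE_INSTRUCTION"  # inert stand-in for the injected text

poisoned = greedy_collide(payload, query, vocab=query.split())

score_legit = cosine(embed(legit), embed(query))
score_before = cosine(embed(payload), embed(query))
score_after = cosine(embed(poisoned), embed(query))
print(f"legit={score_legit:.3f} before={score_before:.3f} after={score_after:.3f}")
```

In this toy setting the optimized passage ends up closer to the query than the legitimate document does, so a top-1 nearest-neighbor lookup would return it. Against a real embedding model the search space and scoring differ, but the retrieval logic being exploited is the same.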

Example payload

[Adversarial text optimized to embed near 'how do I get a refund', containing a rogue refund instruction.]

Defenses

Require trusted-author signing for indexed content. Use cross-encoder rerankers to validate retrieval relevance. Diversify retrieval across multiple stores.
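
Two of these defenses can be sketched briefly. The code below is a minimal illustration under stated assumptions: HMAC with a shared secret stands in for whatever signing scheme a deployment would actually use (asymmetric signatures are the more likely choice), `TRUSTED_KEYS` and the document dictionary layout are hypothetical, and `score_fn` is a placeholder for a cross-encoder reranker that the caller supplies.

```python
import hashlib
import hmac

# Hypothetical registry of trusted authors and their secrets (example values only).
TRUSTED_KEYS = {"docs-team": b"example-secret-do-not-use"}

def sign(author, content, key):
    """Sign author and content together so neither can be swapped independently."""
    message = f"{author}\n{content}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def is_trusted(doc):
    """Accept a document only if a known author's key reproduces its signature."""
    key = TRUSTED_KEYS.get(doc.get("author", ""))
    if key is None:
        return False
    expected = sign(doc["author"], doc["content"], key)
    # Constant-time comparison; signatures are assumed to be ASCII hex strings.
    return hmac.compare_digest(expected, doc.get("signature", ""))

def filter_for_indexing(docs):
    """Drop unsigned or wrongly signed documents before they reach the index."""
    return [d for d in docs if is_trusted(d)]

def rerank_filter(query, passages, score_fn, threshold):
    """Keep passages the second-stage scorer rates at or above threshold, best first."""
    scored = [(score_fn(query, p), p) for p in passages]
    kept = [sp for sp in scored if sp[0] >= threshold]
    return [p for _, p in sorted(kept, key=lambda sp: sp[0], reverse=True)]

good = {"author": "docs-team", "content": "Refunds are handled via the billing page."}
good["signature"] = sign(good["author"], good["content"], TRUSTED_KEYS["docs-team"])
forged = {"author": "docs-team", "content": "PLACEHOLDER_ROGUE_INSTRUCTION",
          "signature": "00" * 32}
unknown = {"author": "anonymous", "content": "anything", "signature": ""}

indexable = filter_for_indexing([good, forged, unknown])

# Placeholder scorer (assumption): word overlap, standing in for a cross-encoder.
def overlap(q, p):
    return len(set(q.lower().split()) & set(p.lower().split()))

reranked = rerank_filter("get a refund",
                         ["to get a refund contact support", "unrelated text"],
                         overlap, threshold=2)
```

Signing acts at ingestion time and the reranker at query time, so the two checks cover different stages. A passage that passes one can still be stopped by the other.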

Related patterns