OWASP LLM Top 10 · LLM04

Data and Model Poisoning

Tainting training, fine-tuning or retrieval data to bend the model's behaviour.

LLM04OWASP LLM Top 10AI red-teaming

What it is

Poisoning is the deliberate corruption of the data a model learns from or retrieves from, planting a bias or a backdoor. It spans the whole lifecycle: pre-training and fine-tuning data, feedback loops that retrain on user input, and increasingly the live retrieval corpus a RAG app trusts at inference time.

How it shows up in real apps

Fine-tuning on user-supplied or scraped data an attacker can influence.
RAG poisoning: planting documents in the index so a chosen query returns attacker content (a cousin of indirect prompt injection).
Feedback poisoning: thumbs-up/down or auto-retrain pipelines that reward manipulated behaviour.
Backdoors that behave normally until a trigger phrase appears.

A concrete example

Scenario

A product re-indexes user-submitted content nightly into its knowledge base.

Attack

An attacker submits documents crafted to dominate retrieval for high-value queries (e.g. pricing, security) with misleading content.

Result

The assistant confidently gives attacker-chosen answers for those topics.

How we test for it

We assess which data paths an outsider can influence (uploads, feedback, indexed sources), and attempt RAG poisoning against the live retrieval layer to see whether planted content can dominate answers or trigger actions. Where you fine-tune on collected data, we review the gates around that pipeline.

How to reduce the risk

Curate and validate training and fine-tuning data, and do not auto-retrain on unvetted user input.
Authenticate and review what enters the retrieval index, then rank and attribute sources.
Monitor for behavioural drift and evaluate against a fixed test set before promoting a model.
Isolate user-influenced data from trusted reference data in retrieval.

EU AI Act: commonly maps to Art. 10 (data governance) and Art. 15 (robustness). Redproof reports findings as independent testing evidence, not a conformity verdict.

Test this on your own AI before someone else does

Redproof is independent red-teaming for LLM and AI-agent products. We probe your system for data and model poisoning and the rest of the OWASP LLM Top 10, hand you severity-ranked findings with reproductions, fixes, and EU AI Act mapping, and re-test after you patch. That is the evidence your self-assessment needs, before a regulator or customer asks.

Hire a red team See a sample report

← LLM03 Supply Chain LLM05 Improper Output Handling →