The lethal trifecta for AI agents

Simon Willison

Jun 17

Plus reviews of two new papers about prompt injection, and Anthropic's tips on building multi-agent LLM systems

2 Comments

sunshines and rains

Jul 25

Prompt injection prevention/mitigation for the big players is one thing. What could a lesser organisation that trains/tunes its own models do? It is unlikely to have the skills or data to do the same.

I can't imagine something like anti-virus / anti-bias (for a given definition of bias) for LLM-training data.

Is there a playbook, or a set of well-established patterns, for detecting/mitigating things like obfuscated payloads, SEO hacks, and the hundreds of other data-dirtying techniques that exist?

I'm thinking of state-level attackers incrementally distributing/injecting polluted data sources with a view to affecting LLMs trained specifically for government purposes. I'm thinking about other countries that might not have access to US three-letter-agency-level brains.

T Stands For

Jun 17

Super insightful article—I thought you brought a lot of clarity to this topic. One small suggestion: maybe use "extraction" or "data heisting" instead of "exfiltration." In AI security contexts, "exfiltration" often refers to model theft, which could confuse newcomers about what's actually being targeted during these prompt injection attacks.

Simon Willison’s Newsletter
