2 Comments
User's avatar
sunshines and rains's avatar

Prompt injection prevention/mitigation for the big players is one thing. What could a lesser organisations that trains/tunes their own models do? They are unlikely to have the same skills or data to do the same.

I can't imagine something like anti-virus / anti-bias (for a given definition of bias) for LLM-training data.

Is there a playbook, well established patterns for detecting / mitigating things like obfuscated payloads, SEO-hacks, and the hundreds of other data-dirtying techhniques that exist?

I'm thinking of state-level attackers incrementally distributing/injecting polluted data sources with a view to affecting LLM models trained specifically for gov purposes. I'm thinking about other countries that might not have access to US three-letter-agency level brains.

Expand full comment
T Stands For's avatar

Super insightful article—I thought you brought a lot of clarity to this topic. One small suggestion: maybe use "extraction" or "data heisting" instead of "exfiltration." In AI security contexts, "exfiltration" often refers to model theft, which could confuse newcomers about what's actually being targeted during these prompt injection attacks.

Expand full comment