2 Comments

I think your Dual LLM example could still be abused via prompt injection: the output of the quarantined LLM is returned directly to the user. So by sending you a targeted email I could change the summaries of emails that you read and/or act on. Seems dangerous?

Author reply:

That's entirely true: the activities of the quarantined LLM remain susceptible to prompt injection, so things like summaries could be corrupted by an attack.

I don't have a fix for that. I've pretty much decided I'll have to live with it if I want to have an AI-driven system that can summarize text for me.

You can reduce the amount of damage an attack can do if you design for it: for example, summarize each email as a separate task, rather than bundling multiple emails into a single task where the content of just one malicious message could corrupt the summaries of all the others.
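To make that concrete, here's a minimal sketch of the per-email isolation idea. The `quarantined_summarize` helper is a hypothetical name I'm using for the quarantined LLM call, not anything from the original post:

```python
def quarantined_summarize(text: str) -> str:
    """Placeholder for a quarantined-LLM call whose output may be
    attacker-influenced. Swap in a real model call here."""
    return text[:100]  # stub: pretend the first 100 characters are the summary


def summarize_inbox(emails: list[str]) -> list[str]:
    # One quarantined task per email: a prompt injection in one
    # message can only corrupt its own summary, not the whole batch.
    return [quarantined_summarize(email) for email in emails]


def summarize_inbox_risky(emails: list[str]) -> str:
    # Anti-pattern: bundling every email into a single task lets one
    # malicious message influence the summaries of all the others.
    return quarantined_summarize("\n\n---\n\n".join(emails))
```

The point of the first version is blast-radius containment: the summaries are still untrusted, but an attacker who controls one email gains no leverage over how the rest of the inbox is presented.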
