2 Comments

I think your Dual LLM example could still be abused with prompt injection: the output of the quarantined LLM is returned directly to the user. So by sending you a targeted email I could change the summary of emails that you read and/or act on. Seems dangerous?

Expand full comment