Discussion about this post

User's avatar
Brian P O'Rourke's avatar

I think your Dual LLM example could still be abused with prompt injection: the output of the quarantined LLM is returned directly to the user. So by sending you a targeted email I could change the summary of emails that you read and/or act on. Seems dangerous?

Expand full comment
1 more comment...

No posts