⚠️ This post links to an external website. ⚠️
Two recent papers on LLM security, Agents Rule of Two and The Attacker Moves Second, both deal with prompt injection. Agents Rule of Two proposes that, within a single session, an AI agent should combine no more than two of three properties: processing untrustworthy inputs, accessing sensitive data, and changing state. It is a practical framework for limiting risk, and it starts from the premise that current defenses against prompt injection remain inadequate and unreliable. The Attacker Moves Second supports that premise: it shows that twelve defenses against prompt injection can be defeated by adaptive attacks, with attack success rates above 90%. Together, the two publications argue that LLM-powered applications should not rely on those defenses alone and need security built into the design of the system itself.
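To make the Rule of Two concrete, here is a minimal sketch of how a session's configuration could be checked against it. This is an illustration only, not code from either paper: the `SessionCapabilities` class, its field names, and `satisfies_rule_of_two` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCapabilities:
    """Hypothetical description of what an agent may do in one session."""

    untrusted_input: bool   # processes untrustworthy inputs
    sensitive_access: bool  # can access sensitive data
    changes_state: bool     # can change state


def satisfies_rule_of_two(caps: SessionCapabilities) -> bool:
    """Return True if at most two of the three properties are enabled."""
    enabled = sum(
        [caps.untrusted_input, caps.sensitive_access, caps.changes_state]
    )
    return enabled <= 2


# An agent that reads untrusted web pages and private data but cannot act.
read_only = SessionCapabilities(
    untrusted_input=True, sensitive_access=True, changes_state=False
)

# An agent with all three properties at once, which the rule disallows.
unrestricted = SessionCapabilities(
    untrusted_input=True, sensitive_access=True, changes_state=True
)

print(satisfies_rule_of_two(read_only))     # True
print(satisfies_rule_of_two(unrestricted))  # False
```

A check like this only counts declared capabilities. Deciding which inputs are untrustworthy and which data is sensitive is the harder, application-specific part.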
continue reading on simonwillison.net
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email me at my personal email. To get new posts, subscribe using the RSS feed.