From the cofounders of `npm audit` and Code4rena
disreguard.com
LLMs are dealing with a wall of text. Distinctions between trusted instructions and untrusted text are flimsy at best.
By involving the model, we can add texture to make trust boundaries clearer.
- sign genuine system and user instructions
- require mutations to system prompts to be signed
- give the model a tool to verify instructions are genuine
- gate critical tool usage on the model calling verify()
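A minimal sketch of the idea, assuming an HMAC-based scheme; `signInstruction`, `verifyInstruction`, `runCriticalTool`, and the `INSTRUCTION_SIGNING_KEY` variable are illustrative names, not disreguard's actual API:

```typescript
// Sketch: HMAC-sign genuine instructions, give the model a verify tool,
// and gate critical tools on that check. All names are illustrative.
import { createHmac, timingSafeEqual } from "node:crypto";

const KEY = process.env.INSTRUCTION_SIGNING_KEY ?? "dev-only-key";

// Sign a genuine system or user instruction before it enters the context.
export function signInstruction(text: string): { text: string; sig: string } {
  return { text, sig: createHmac("sha256", KEY).update(text).digest("hex") };
}

// Exposed to the model as a tool: true only for signed, untampered instructions.
export function verifyInstruction(text: string, sig: string): boolean {
  const expected = createHmac("sha256", KEY).update(text).digest();
  const given = Buffer.from(sig, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// A critical tool refuses to run unless the requesting instruction verifies.
export function runCriticalTool(instruction: { text: string; sig: string }): void {
  if (!verifyInstruction(instruction.text, instruction.sig)) {
    throw new Error("unsigned or tampered instruction; refusing");
  }
  // ...perform the sensitive action here...
}
```

The signing key never appears in the model's context; only instruction text and its tag pass through, so injected text can't forge a signature that verifies.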
And if we give the model such a tool, can we make sure it actually uses it?
Yes, and yes.
disreguard.com/blog/posts/s...
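One way the gating could be enforced, sketched under stated assumptions: the harness keeps critical tools locked until the model has called verify() successfully in the current turn. The `tools` object, `sendEmail`, `onTurnStart`, and the `./signing` import are hypothetical, not the implementation described in the linked post.

```typescript
// Hypothetical harness-side gating, building on verifyInstruction() above:
// critical tools stay locked until verify() has succeeded this turn.
import { verifyInstruction } from "./signing"; // the sketch above (assumed path)

let verifiedThisTurn = false;

export const tools = {
  // The verify tool the model must call before doing anything sensitive.
  verify(text: string, sig: string): boolean {
    const ok = verifyInstruction(text, sig);
    verifiedThisTurn ||= ok;
    return ok;
  },
  // Example critical tool: hard-fails unless the gate is open.
  sendEmail(to: string, body: string): void {
    if (!verifiedThisTurn) {
      throw new Error("call verify() on the requesting instruction first");
    }
    // ...actually send the email...
  },
};

// The agent loop resets the gate at the start of each model turn.
export function onTurnStart(): void {
  verifiedThisTurn = false;
}
```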
@disreguard.com is a security research lab focused on ergonomic tools and methods for the rigorous defense-in-depth that agents need.
disreguard.com/blog/posts/p...