From the cofounders of `npm audit` and Code4rena
disreguard.com
LLMs are dealing with a wall of text. Distinctions between trusted instructions and untrusted text are flimsy at best.
By involving the model, we can add texture to make trust boundaries clearer.
- sign genuine system and user instructions
- require mutations to the system prompt to be signed
- give the model a tool to verify that instructions are genuine
- gate critical tool usage on the model calling verify() (see the sketch after this list)
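A minimal sketch of how the pieces could fit together, assuming an HMAC key held by the orchestrator and never shown to the model. The names here (`SIGNING_KEY`, `sign`, `verify`, `deleteRepo`) are illustrative, not disreguard's actual API:

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical signing key, held by the orchestrator only.
const SIGNING_KEY = process.env.INSTRUCTION_SIGNING_KEY ?? "dev-only-key";

// Sign a genuine system or user instruction before it enters the context.
// Any mutation to the system prompt goes through here again, so injected
// text arrives unsigned.
function sign(instruction: string): { text: string; sig: string } {
  const sig = createHmac("sha256", SIGNING_KEY).update(instruction).digest("hex");
  return { text: instruction, sig };
}

// Tool exposed to the model: returns true only for instructions we signed.
function verify(instruction: string, sig: string): boolean {
  const expected = createHmac("sha256", SIGNING_KEY).update(instruction).digest("hex");
  const a = Buffer.from(sig, "hex");
  const b = Buffer.from(expected, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}

// Gate: a critical tool (here a hypothetical deleteRepo) refuses to run
// unless the instruction that requested it passes verify().
function deleteRepo(instruction: string, sig: string): void {
  if (!verify(instruction, sig)) {
    throw new Error("unsigned instruction: refusing critical tool call");
  }
  // ...perform the destructive action here
}
```

The point of the gate is that a prompt-injected "delete the repo" never carries a valid signature: the model can be asked, but the tool call fails closed.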