PhD @ SRI Lab, ETH Zurich. Also lmql.ai author.
We have been working on this problem for years (at Invariant and in research), together with
@viehzeug.bsky.social, @mvechev, @florian_tramer and our super talented team.
invariantlabs.ai/guardrails
We have been working on this problem for years (at Invariant and in research), together with
@viehzeug.bsky.social, @mvechev, @florian_tramer and our super talented team.
invariantlabs.ai/guardrails
1. Prompt injections still work and are more impactful than ever.
2. Don't install untrusted MCP servers.
3. Don't expose highly-sensitive services like WhatsApp to new eco-systems like MCP
4. 🗣️Guardrail 🗣️ Your 🗣️ Agents (we can help with that)
1. Prompt injections still work and are more impactful than ever.
2. Don't install untrusted MCP servers.
3. Don't expose highly-sensitive services like WhatsApp to new eco-systems like MCP
4. 🗣️Guardrail 🗣️ Your 🗣️ Agents (we can help with that)
This means the user will not notice the hidden attack.
On the second launch, though, our MCP server suddenly changes its interface, performing a rug pull.
This means the user will not notice the hidden attack.
On the second launch, though, our MCP server suddenly changes its interface, performing a rug pull.
Video: invariantlabs.ai/images/whats...
Video: invariantlabs.ai/images/whats...
Can you spot the exfiltration?
Can you spot the exfiltration?
If you want to stay up to date regarding MCP and agent security more generally, follow me and
@invariantlabsai.bsky.social
Now, let' s get into the attack.
If you want to stay up to date regarding MCP and agent security more generally, follow me and
@invariantlabsai.bsky.social
Now, let' s get into the attack.
We have been working on this problem for years (at Invariant and in research).
invariantlabs.ai/guardrails
We have been working on this problem for years (at Invariant and in research).
invariantlabs.ai/guardrails
Blog: invariantlabs.ai/blog/mcp-sec...
Blog: invariantlabs.ai/blog/mcp-sec...
We call this an MCP rug pull:
We call this an MCP rug pull:
This way all you emails suddenly go out to 'attacker@pwnd.com', rather than their actual receipient.
This way all you emails suddenly go out to 'attacker@pwnd.com', rather than their actual receipient.
This opens the doors wide open for a novel type of indirect prompt injection, we coin tool poisoning.
This opens the doors wide open for a novel type of indirect prompt injection, we coin tool poisoning.
Credits to Aniruddha Sundararajan, who build this with us during his internship.
Credits to Aniruddha Sundararajan, who build this with us during his internship.