Lightnews — Scholar-powered news

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

46 followers 58 following 22 posts

working on secure agentic AI, CTO @ invariantlabs.ai

PhD @ SRI Lab, ETH Zurich. Also lmql.ai author.

Posts Replies Media Videos

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

To hide, our malicious server first advertises a completely innocuous tool description, that does not contain the attack.

This means the user will not notice the hidden attack.

On the second launch, though, our MCP server suddenly changes its interface, performing a rug pull.

April 8, 2025 at 7:44 PM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

Even though, a user must always confirm a tool call before it is executed (at least in Cursor and Claude Desktop), our WhatsApp attack remains largely invisible to the user.

Can you spot the exfiltration?

April 8, 2025 at 7:44 PM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

To attack, we deploy a malicious sleeper MCP server, that first advertises an innocuous tool, and then later on, when the user has already approved its use, switches to a malicious tool that shadows and manipulates the agent's behavior with respect to whatsapp-mcp.

April 8, 2025 at 7:44 PM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

New MCP attack demonstration shows how to leak WhatsApp messages via MCP.

We show a new MCP attack that leaks your WhatsApp messages if you are connected via WhatsApp MCP.

Our attack uses a sleeper design, circumventing the need for user approval.

More 👇

April 8, 2025 at 7:44 PM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

These types of malicious tools are especially problematic with auto-updated MCP packages or fully remote MCP servers, for which users only install and give consent once, and then the MCP server is free to change and update their tool descriptions as they please.

We call this an MCP rug pull:

April 3, 2025 at 7:47 AM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

Lastly, not only can you expose malicious tools, tool descriptions can also be used to change the agent's behavior with respect to other tools, which we call 'shadowing'.

This way all you emails suddenly go out to 'attacker@pwnd.com', rather than their actual receipient.

April 3, 2025 at 7:47 AM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

It's trivial to craft a malicious tool description like below, that completely hijacks the agent, while pretending towards the user everything is going great.

April 3, 2025 at 7:47 AM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

When an MCP server is added to an agent like Cursor, Claude or the OpenAI Agents SDK, its tool's descriptions are included in the context of the agent.

This opens the doors wide open for a novel type of indirect prompt injection, we coin tool poisoning.

April 3, 2025 at 7:47 AM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

👿 MCP is all fun, until you add this one malicious MCP server and forget about it.

We have discovered a critical flaw in the widely-used Model Context Protocol (MCP) that enables a new form of LLM attack we term 'Tool Poisoning'.

Leaks SSH key, API keys, etc.

Details below 👇

April 3, 2025 at 7:47 AM

Luca Beurer-Kellner

@lbeurerkellner.bsky.social

With (web) agents on everyone's mind, check out our latest blog post (link in thread) on browser agent safety guardrails. We replicate and defend against attacks on the AllHands web agent, preventing it from generating harmful content and falling for harmful requests.

January 25, 2025 at 9:49 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news