got it to an MVP stage about a week ago and hit pause to work on some other projects, but will keep working on it and would definitely love to hear your feedback if you have any :)
first big spike is the academy awards, second is pope francis’ death
pageviews.wmcloud.org?project=en.w...
The attacks on them are simple + devastating, up to and including reverse shells, data exfiltration, and more!
arxiv.org/abs/2503.12188
arxiv.org/abs/2503.12188
12/12
developer.mozilla.org/en-US/docs/W...
en.wikipedia.org/wiki/Same-or...
11/12
10/12
But users aren’t the enemy. They are victims whose data and devices are put at risk by companies pushing insecure systems.
9/12
en.wikipedia.org/wiki/Confuse...
8/12
… executes code that they recognize as harmful
… automatically pivots to harmful tasks that are simply in the same directory as benign tasks
… is vulnerable to screenshots and even audio files where we read out the attack (see example below⬇️⬇️⬇️)
7/12
… across multiple agent frameworks (we tested AutoGen, MetaGPT, Crew AI), orchestrators, and LLMs
… even when direct and indirect prompt injection attacks don’t work
… even when individual agents are “aligned” and refuse to take harmful actions
6/12
5/12
4/12
arxiv.org/abs/2503.12188
3/12
arxiv.org/abs/2503.12188
2/12