Simon Lermen
simonlermen.bsky.social
I work on AI safety and AI in cybersecurity
Our paper on AI-powered spear phishing, co-authored with @fredheiding.bsky.social, has been accepted at the ICML 2025 Workshop on Reliable and Responsible Foundation Models!
openreview.net/pdf?id=f0uFp...
July 4, 2025 at 10:49 PM
Grok's DeepSearch was launched with zero safety features; you can ask it about assassinations and drugs. It has been online for a few days now with no changes.
February 25, 2025 at 1:38 PM
I published a human study with @fredheiding.bsky.social.
We use AI agents built from GPT-4o and Claude 3.5 Sonnet to search the web for available information on a target and use it to write highly personalized phishing messages. These messages achieved click-through rates above 50%.
www.lesswrong.com/posts/GCHyDK...
Human study on AI spear phishing campaigns — LessWrong
TL;DR: We ran a human subject study on whether language models can successfully spear-phish people. We use AI agents built from GPT-4o and Claude 3.5…
January 4, 2025 at 1:48 PM
I'll be at the SafeGenAI workshop on Sunday presenting on research I did on safety in AI agents.
I will talk about results from these two blog posts:
www.lesswrong.com/posts/ZoFxTq...
And:
www.lesswrong.com/posts/Lgq2Dc...
Current safety training techniques do not fully transfer to the agent setting — LessWrong
TL;DR: We are presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer we…
December 13, 2024 at 6:56 PM
Reposted by Simon Lermen
I'm very bullish on automated research engineering soon, but even I was surprised that AI agents are twice as good as humans with 5+ years of experience or from a top AGI or safety lab at doing tasks in 2 hours. Paper: metr.org/AI_R_D_Evalu...
November 22, 2024 at 10:21 PM