Stanislav Fort
@stanislavfort.bsky.social
AI + security | Stanford PhD in AI & Cambridge physics | techno-optimism + alignment + progress + growth | 🇺🇸🇨🇿
Pinned
✨ Super excited to share our paper **Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness** arxiv.org/abs/2408.05446

Inspired by biology, we 1) get adversarial robustness + interpretability for free, 2) turn classifiers into generators & 3) design attacks on GPT-4
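
A minimal sketch of the multi-scale aggregation idea, purely illustrative: the backbone, scales, and plain logit averaging below are placeholder assumptions, not the paper's exact architecture.

```python
# Illustrative sketch of multi-scale aggregation: run the same backbone on
# several resolutions of the input and average the resulting logits.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights=None)  # any classifier backbone (placeholder)
backbone.eval()

def multi_scale_logits(x, scales=(1.0, 0.5, 0.25)):
    """x: (B, 3, H, W) image batch; returns logits averaged over scales."""
    logits = []
    for s in scales:
        xs = F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
        # Resize back up so the backbone always sees the same input size.
        xs = F.interpolate(xs, size=x.shape[-2:], mode="bilinear", align_corners=False)
        logits.append(backbone(xs))
    return torch.stack(logits).mean(dim=0)

with torch.no_grad():
    preds = multi_scale_logits(torch.rand(2, 3, 224, 224)).argmax(dim=-1)
```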
This level of ignorance is surprising but unfortunately genuinely dangerous, because it gives readers the pleasant but ultimately false impression that AI just isn't that good. One doesn't have to rely on academic experts here; simply trying LLMs out makes it clear that they are *extremely* useful.

Quoted post:
"LLMs used as synthetic text extruding machines have no legitimate use cases and --- for all the reasons discussed in the stochastic parrots paper --- are prone to harmful outputs to boot."

>>
June 21, 2025 at 11:03 AM
Presenting *Ensemble Everything Everywhere* at NeurIPS AdvML'24 workshop today! 🔥

Come by today at 10:40-12:00 in East Ballroom C to ask me about:
1) 🏰 bio-inspired naturally robust models
2) 🎓 Interpretability & robustness
3) 🖼️ building a generator for free
4) 😵‍💫 attacking GPT-4, Claude & Gemini
December 14, 2024 at 4:04 PM
I discovered a fatal flaw in a paper by @floriantramer.bsky.social et al. claiming to break our Ensemble Everything Everywhere defense. Due to a coding error, they used attacks with a perturbation budget 20x larger than the standard 8/255. They confirmed this, but the paper is already out & quoted on OpenReview. What should we do now?
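
For context, 8/255 is the standard L∞ per-pixel perturbation budget in this literature (images scaled to [0, 1]). A small sketch of the budget check; the numbers other than 8/255 are illustrative:

```python
# Sanity check on attack strength under the standard L-inf budget.
import numpy as np

EPS_STANDARD = 8 / 255          # ~0.031, the usual per-pixel budget
eps_used = 20 * EPS_STANDARD    # ~0.63, i.e. the reported 20x overshoot

def check_budget(x_clean, x_adv, eps=EPS_STANDARD):
    """Return the actual L-inf perturbation and whether it respects eps."""
    linf = float(np.max(np.abs(x_adv - x_clean)))
    return linf, linf <= eps + 1e-6

# Example: a perturbation drawn at the 20x level fails the standard check.
x = np.random.rand(32, 32, 3)
x_adv = np.clip(x + np.random.uniform(-eps_used, eps_used, x.shape), 0, 1)
print(check_budget(x, x_adv))   # L-inf norm well above 8/255, so False
```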
December 12, 2024 at 4:29 PM
A decade ago, AlphaGo inspired me to leap from black holes to AI. Bittersweet to close my time at DeepMind, but thrilled to start a new chapter focusing directly on AI and security. Find me at NeurIPS this week or DM me here if you'd like to chat!
December 9, 2024 at 6:49 PM
Reposted by Stanislav Fort
✨ Super excited to share our paper **Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness** arxiv.org/abs/2408.05446

Inspired by biology, we 1) get adversarial robustness + interpretability for free, 2) turn classifiers into generators & 3) design attacks on GPT-4
November 19, 2024 at 6:19 PM
My favorite description of a large language model was accidentally written by Ray Bradbury in 1969, more than half a century ago, and it's eerie how fitting its rendition of an emergent language mind is:

vvvvvvv The poem follows in the replies vvvvvv
November 15, 2024 at 9:59 AM
We "rickrolled" GPT-4o by a specially crafted image of Stephen Hawking 😵‍💫! This is AFAIK the first case of successful transferrable image attacks on frontier models
(www.youtube.com/watch?v=mf_E...)

📝Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

Paper below 👇
Rickrolling the OpenAI GPT-4o by a specially modified photo of Stephen Hawking
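
The paper's actual attack construction is not reproduced here; as a rough illustration of what a transferable image attack means, here is a hedged sketch: optimize a small L∞ perturbation against an ensemble of open surrogate models, then try the resulting image on the black-box target. The surrogate choices and target class below are placeholder assumptions.

```python
# Generic sketch of a transfer attack: PGD on an ensemble of open surrogate
# classifiers under an L-inf budget. NOT the paper's exact procedure.
import torch
import torch.nn.functional as F
import torchvision.models as models

surrogates = [m(weights=None).eval() for m in (models.resnet18, models.vgg11)]
eps, alpha, steps = 8 / 255, 2 / 255, 40
target = torch.tensor([0])               # hypothetical target class

def pgd_transfer(x):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = sum(F.cross_entropy(m(x + delta), target) for m in surrogates)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()          # targeted: descend the loss
            delta.clamp_(-eps, eps)                     # stay within the L-inf budget
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)  # keep a valid image
        delta.grad.zero_()
    return (x + delta).detach()

x_adv = pgd_transfer(torch.rand(1, 3, 224, 224))  # then evaluate on the black-box model
```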
November 11, 2024 at 12:26 PM
There is a popular piece by @washingtonpost.com claiming that GPT-4 consumes 0.14 kWh per 100 words. At $0.15/kWh this implies ~$150/1M tokens *for electricity alone*, which is roughly 10x what OpenAI charges *in total*. The WaPo estimate is therefore almost certainly off by a large factor and should be corrected.
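
The back-of-the-envelope arithmetic behind that claim, with the words-per-token ratio as a rough assumption:

```python
# Sanity check of the WaPo figure. The ~0.75 words/token ratio is a rough
# rule of thumb, not an exact constant.
kwh_per_100_words = 0.14        # WaPo's claimed figure
usd_per_kwh = 0.15              # typical US retail electricity price
words_per_token = 0.75          # rough assumption

words_per_million_tokens = 1_000_000 * words_per_token        # 750,000 words
kwh = kwh_per_100_words * words_per_million_tokens / 100      # 1,050 kWh
cost = kwh * usd_per_kwh                                      # ~$157 per 1M tokens
print(f"~${cost:.0f} per 1M tokens for electricity alone")
```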
October 2, 2024 at 1:03 PM
I have written up my argument that solving adversarial attacks in computer vision is a baby version of general AI alignment. I think that the *shape* of the problem is very similar & that we *have* to be able to solve it before tackling the A(G)I case.

Blog post: www.lesswrong.com/posts/oPnFzf...
September 4, 2024 at 5:25 AM