Igor Shilov (➡️ ICML 🇨🇦)
@igorshilov.bsky.social
Anthropic AI Safety Fellow

PhD student at Imperial College London.
ML, interpretability, privacy, and stuff
🏳️‍🌈

https://igorshilov.com/
Arrived in beautiful Vancouver!
More conferences with mountain views please!

Ping me if you want to chat about privacy and security of LLMs!
July 16, 2025 at 1:43 PM
The best part? You can collect per-sample losses for free during training by simply changing the loss reduction:
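A minimal sketch of the idea, assuming a standard PyTorch training loop (the model, data, and hyperparameters here are toy stand-ins, not from the paper): setting reduction="none" gives one loss value per sample at no extra cost, and logging it each epoch builds per-sample loss traces.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
X = torch.randn(256, 32)            # toy data standing in for a real dataset
y = torch.randint(0, 10, (256,))
model = nn.Linear(32, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

num_epochs, batch_size = 5, 64
loss_traces = torch.zeros(num_epochs, len(X))  # epoch x sample

for epoch in range(num_epochs):
    perm = torch.randperm(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        logits = model(X[idx])
        # reduction="none" keeps one loss per sample instead of a batch mean
        per_sample = F.cross_entropy(logits, y[idx], reduction="none")
        loss_traces[epoch, idx] = per_sample.detach()  # log before reducing
        opt.zero_grad()
        per_sample.mean().backward()  # same gradient as reduction="mean"
        opt.step()

# loss_traces[:, i] is now the training loss trajectory of sample i.
```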
June 24, 2025 at 3:17 PM
Our proposed loss trace aggregation methods achieve 92% Precision@k=1% in identifying samples vulnerable to the LiRA attack on CIFAR-10 (positives at FPR=0.001). Prior computationally efficient vulnerability detection methods (loss, gradient norm) perform barely better than random on the same task.
June 24, 2025 at 3:17 PM
🐸 Check out these CIFAR-10 frog examples:

Easy-to-fit outliers: Loss drops late but reaches near zero → most vulnerable

Hard-to-fit outliers: Loss drops slowly, stays relatively high → somewhat vulnerable

Average samples: Loss drops quickly and stays low → least vulnerable
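To make the distinction concrete, here is a toy heuristic (not the aggregation method from the paper) that separates those three trace shapes by two features: the epoch at which the loss first drops below half its starting value, and the final loss. The example trajectories are made up for illustration.

```python
import torch

def trace_features(trace: torch.Tensor, drop_frac: float = 0.5):
    """trace: (num_epochs,) loss trajectory of one sample.
    Returns (drop_epoch, final_loss): when the loss first falls below
    drop_frac of its initial value, and where it ends up."""
    threshold = drop_frac * trace[0].item()
    below = (trace < threshold).nonzero()
    drop_epoch = below[0].item() if len(below) else len(trace)  # never dropped
    return drop_epoch, trace[-1].item()

# Easy-to-fit outlier: drops late, reaches near zero -> most vulnerable
easy_outlier = torch.tensor([2.3, 2.2, 2.1, 1.9, 0.4, 0.05])
# Hard-to-fit outlier: drops slowly, stays relatively high -> somewhat vulnerable
hard_outlier = torch.tensor([2.3, 2.1, 1.9, 1.7, 1.5, 1.3])
# Average sample: drops quickly and stays low -> least vulnerable
average = torch.tensor([2.3, 0.9, 0.3, 0.1, 0.08, 0.05])

for name, tr in [("easy outlier", easy_outlier),
                 ("hard outlier", hard_outlier),
                 ("average", average)]:
    print(name, trace_features(tr))
```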
June 24, 2025 at 3:17 PM
The line-up for the evening:

- Graham Cormode (University of Warwick/Meta AI)
- Lukas Wutschitz (M365 Research, Microsoft)
- Jamie Hayes (Google DeepMind)
- Ilia Shumailov (Google DeepMind)
December 17, 2024 at 10:26 AM
Wow so we actually got to a point where Anthropic sponsors exhibitions at Tate Modern
November 20, 2024 at 11:06 AM
Low-stakes conspiracy of the day: the protestor throwing glitter at Starmer was a personal favour from Starmer himself. Because as a serious politician you don’t get to wear glitter in public anymore, and sometimes nothing hits quite like it
October 13, 2023 at 9:28 AM