Yves-Alexandre de Montjoye
banner
yvesalexandre.bsky.social
Yves-Alexandre de Montjoye
@yvesalexandre.bsky.social
Professor of Applied Mathematics and CS at Imperial College London (🇬🇧). MIT PhD. I'm working on automated privacy attacks, LLM memorization, and AI Safety. Road cyclist 🚴 and former EU Special Adviser (🇪🇺).
➡️ Read the full paper here: arxiv.org/abs/2505.15738

This is work with my amazing students 🧑‍🎓 at Imperial College London: Xiaoxue Yang, Bozhidar Stevanoski and Matthieu Meeus
Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
Large language models (LLMs) are rapidly deployed in real-world applications ranging from chatbots to agentic systems. Alignment is one of the main approaches used to defend against attacks such as pr...
arxiv.org
June 20, 2025 at 10:51 AM
To properly defend LLM agents against prompt injection, we need 1️⃣ better defenses which are robust against informed adversaries, and 2️⃣ account for these vulnerabilities even in “aligned” LLMs when deploying them as agents.
June 20, 2025 at 10:51 AM
💬 Does this mean the existing alignment-based defenses 🛡️ are not useful? No! But they are likely more brittle than previously believed.
June 20, 2025 at 10:51 AM
More specifically, it uses intermediate training checkpoints as “stepping stones” 👣🪨 to craft attacks against the final aligned model. This is hugely successful with the suffixes found by Checkpoint-GCG, bypassing SOTA defenses such as SecAlign 90%+ of the time 🎯.
June 20, 2025 at 10:51 AM
We propose Checkpoint-GCG, an attack method that assumes an informed adversary with some knowledge of the alignment mechanism 🧭.
June 20, 2025 at 10:51 AM
🤔 How would we know this though? We propose to use informed adversaries – attackers with more knowledge than currently seems “realistic”, to evaluate the robustness of defenses against future, yet-unknown attacks like we do in privacy.
June 20, 2025 at 10:51 AM
With LLMs being integrated into systems everywhere and deployed as agents, we however argue that this is not enough ⚠️. We cannot constantly pen-and-patch, patching LLMs every time a new attack is discovered. We need to ensure our defenses are robust and future-proof 🦾.
June 20, 2025 at 10:51 AM
Recent methods claim near-perfect protection against existing red teaming attacks, including GCG, which automatically finds adversarial suffixes to manipulate model behaviour.
June 20, 2025 at 10:51 AM
🛡️ Today’s defenses against prompt injection typically rely on alignment-based training, teaching LLMs to ignore injected instructions 💉.
June 20, 2025 at 10:51 AM
Sophisticated prompt injection attacks are often done by pairing instructions with adversarial suffixes 💣 that trick models into following the injected instructions.
June 20, 2025 at 10:51 AM
This is known as prompt injection 💉, where malicious actors hide instructions in files or web pages (like invisible white text) that manipulate the LLM’s behaviour.
June 20, 2025 at 10:51 AM
📍 Imperial College London
📅Start: October 2025
⏳Application deadline: June 6th
📩Application steps: cpg.doc.ic.ac.uk/openings/
Openings - Computational Privacy Group, Imperial College London
Openings in the Computational Privacy Group at Imperial College London
cpg.doc.ic.ac.uk
May 20, 2025 at 10:33 AM
This is an exciting opportunity for technically strong and curious candidates who want to do meaningful research that influences both academia and industry. If you’re weighing the next step in your career, we offer a path to impactful, high-quality research with freedom to explore
May 20, 2025 at 10:33 AM
To see more of our work and get to know the team, check here (cpg.doc.ic.ac.uk)!
May 20, 2025 at 10:33 AM
✅Can individuals be re-identified even from aggregated statistics? (arxiv.org/abs/2504.18497)

✅How can we efficiently identify training samples at risk of leaking in ML models? (arxiv.org/abs/2411.05743)
May 20, 2025 at 10:33 AM
✅How can we rigorously measure what LLMs memorize? (arxiv.org/abs/2406.17975)

✅How can we automatically discover privacy vulnerabilities in query-based systems at scale and in practice? (arxiv.org/abs/2409.01992)
May 20, 2025 at 10:33 AM
Happy to share that we are offering one additional fully-funded PhD position starting in Fall 2025! Our research group at Imperial College London works on machine learning and data privacy and security.

Recently, we tackled questions such as:
May 20, 2025 at 10:33 AM
Work with my amazing students and collaborators Zexi Yao, natasakrco.bsky.social, and Georgi Ganev.

🔗 Full paper: arxiv.org/abs/2505.01524
The DCR Delusion: Measuring the Privacy Risk of Synthetic Data
Synthetic data has become an increasingly popular way to share data without revealing sensitive information. Though Membership Inference Attacks (MIAs) are widely considered the gold standard for empi...
arxiv.org
May 9, 2025 at 12:26 PM
What should I do then? Use MIAs. They are the rigorous and comprehensive standard for evaluating the privacy of synthetic data, including making legal anonymity claims, and when comparing models.
May 9, 2025 at 12:21 PM
DCR indeed only appears to catch the most obvious privacy failures, like synthetic datasets that contain large numbers of exact copies from the training data.
May 9, 2025 at 12:21 PM