Lightnews — Scholar-powered news

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

Professor of Applied Mathematics and CS at Imperial College London (🇬🇧). MIT PhD. I'm working on automated privacy attacks, LLM memorization, and AI Safety. Road cyclist 🚴 and former EU Special Adviser (🇪🇺).

Posts Replies Media Videos

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

➡️ Read the full paper here: arxiv.org/abs/2505.15738

This is work with my amazing students 🧑‍🎓 at Imperial College London: Xiaoxue Yang, Bozhidar Stevanoski and Matthieu Meeus

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

Large language models (LLMs) are rapidly deployed in real-world applications ranging from chatbots to agentic systems. Alignment is one of the main approaches used to defend against attacks such as pr...

arxiv.org

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

To properly defend LLM agents against prompt injection, we need 1️⃣ better defenses which are robust against informed adversaries, and 2️⃣ account for these vulnerabilities even in “aligned” LLMs when deploying them as agents.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

💬 Does this mean the existing alignment-based defenses 🛡️ are not useful? No! But they are likely more brittle than previously believed.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

More specifically, it uses intermediate training checkpoints as “stepping stones” 👣🪨 to craft attacks against the final aligned model. This is hugely successful with the suffixes found by Checkpoint-GCG, bypassing SOTA defenses such as SecAlign 90%+ of the time 🎯.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

We propose Checkpoint-GCG, an attack method that assumes an informed adversary with some knowledge of the alignment mechanism 🧭.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

🤔 How would we know this though? We propose to use informed adversaries – attackers with more knowledge than currently seems “realistic”, to evaluate the robustness of defenses against future, yet-unknown attacks like we do in privacy.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

With LLMs being integrated into systems everywhere and deployed as agents, we however argue that this is not enough ⚠️. We cannot constantly pen-and-patch, patching LLMs every time a new attack is discovered. We need to ensure our defenses are robust and future-proof 🦾.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

Recent methods claim near-perfect protection against existing red teaming attacks, including GCG, which automatically finds adversarial suffixes to manipulate model behaviour.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

🛡️ Today’s defenses against prompt injection typically rely on alignment-based training, teaching LLMs to ignore injected instructions 💉.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

Sophisticated prompt injection attacks are often done by pairing instructions with adversarial suffixes 💣 that trick models into following the injected instructions.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

This is known as prompt injection 💉, where malicious actors hide instructions in files or web pages (like invisible white text) that manipulate the LLM’s behaviour.

June 20, 2025 at 10:51 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

📍 Imperial College London
📅Start: October 2025
⏳Application deadline: June 6th
📩Application steps: cpg.doc.ic.ac.uk/openings/

Openings - Computational Privacy Group, Imperial College London

Openings in the Computational Privacy Group at Imperial College London

cpg.doc.ic.ac.uk

May 20, 2025 at 10:33 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

This is an exciting opportunity for technically strong and curious candidates who want to do meaningful research that influences both academia and industry. If you’re weighing the next step in your career, we offer a path to impactful, high-quality research with freedom to explore

May 20, 2025 at 10:33 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

To see more of our work and get to know the team, check here (cpg.doc.ic.ac.uk)!

May 20, 2025 at 10:33 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

✅Can individuals be re-identified even from aggregated statistics? (arxiv.org/abs/2504.18497)

✅How can we efficiently identify training samples at risk of leaking in ML models? (arxiv.org/abs/2411.05743)

May 20, 2025 at 10:33 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

✅How can we rigorously measure what LLMs memorize? (arxiv.org/abs/2406.17975)

✅How can we automatically discover privacy vulnerabilities in query-based systems at scale and in practice? (arxiv.org/abs/2409.01992)

May 20, 2025 at 10:33 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

Happy to share that we are offering one additional fully-funded PhD position starting in Fall 2025! Our research group at Imperial College London works on machine learning and data privacy and security.

Recently, we tackled questions such as:

May 20, 2025 at 10:33 AM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

Work with my amazing students and collaborators Zexi Yao, natasakrco.bsky.social, and Georgi Ganev.

🔗 Full paper: arxiv.org/abs/2505.01524

The DCR Delusion: Measuring the Privacy Risk of Synthetic Data

Synthetic data has become an increasingly popular way to share data without revealing sensitive information. Though Membership Inference Attacks (MIAs) are widely considered the gold standard for empi...

arxiv.org

May 9, 2025 at 12:26 PM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

What should I do then? Use MIAs. They are the rigorous and comprehensive standard for evaluating the privacy of synthetic data, including making legal anonymity claims, and when comparing models.

May 9, 2025 at 12:21 PM

Yves-Alexandre de Montjoye

@yvesalexandre.bsky.social

DCR indeed only appears to catch the most obvious privacy failures, like synthetic datasets that contain large numbers of exact copies from the training data.

May 9, 2025 at 12:21 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news