Jan Dubiński
@jandubinski.bsky.social
PhD student in Machine Learning @Warsaw University of Technology and @IDEAS NCBR
Thanks for working on this together, Antoni Kowalczuk, Franziska Boenisch, and Adam Dziedzic!

@cvprconference.bsky.social

#AI #GenerativeAI #privacy
June 13, 2025 at 7:11 PM
CDI confidently identifies training data with as few as 70 suspect samples!

Please check out the paper for more:
📜 https://arxiv.org/abs/2411.12858
June 13, 2025 at 7:11 PM
Instead, we propose CDI, a method that empowers data owners to check whether their data was used to train a DM. CDI relies on selectively combining diverse membership signals from multiple samples and statistical testing.
June 13, 2025 at 7:11 PM
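For intuition, here is a minimal sketch of dataset inference by statistical testing, assuming per-sample membership scores are already computed. The aggregation and Welch's t-test below are illustrative stand-ins, not the exact CDI procedure from the paper.

```python
# Illustrative sketch of dataset inference via statistical testing.
# NOT the exact CDI procedure: we assume each sample already has a scalar
# membership score and compare the suspect set against a known
# non-member (validation) set with a one-sided test.
import numpy as np
from scipy import stats

def dataset_inference(suspect_scores: np.ndarray,
                      nonmember_scores: np.ndarray,
                      alpha: float = 0.01) -> bool:
    """Return True if the suspect samples look like training members.

    suspect_scores:   membership scores for the data owner's samples
    nonmember_scores: the same scores computed on data known to be unseen
    """
    # One-sided Welch's t-test: do suspect samples score significantly
    # higher than known non-members?
    t_stat, p_value = stats.ttest_ind(
        suspect_scores, nonmember_scores,
        equal_var=False, alternative="greater",
    )
    return p_value < alpha

# Example with synthetic scores: 70 suspect samples vs. 70 non-members.
rng = np.random.default_rng(0)
suspect = rng.normal(loc=0.5, scale=1.0, size=70)    # shifted => member-like
nonmember = rng.normal(loc=0.0, scale=1.0, size=70)
print(dataset_inference(suspect, nonmember))
```

The point of pooling is that individually weak per-sample signals, aggregated over even ~70 samples, can yield a statistically confident verdict.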
Unfortunately, state-of-the-art Membership Inference Attacks struggle to identify training data in large DMs - often performing close to random guessing (True Positive Rate = 1% at False Positive Rate = 1%), e.g. on DMs trained on ImageNet.
June 13, 2025 at 7:11 PM
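For reference, this is how the TPR at 1% FPR number above is typically computed from per-sample attack scores. The scores below are synthetic, just to show why a weak attack lands near 1%, i.e. random guessing.

```python
# Minimal sketch of the TPR @ FPR = 1% metric used to judge MIAs.
# Scores here are synthetic; in a real evaluation they come from running
# an attack on known members and known non-members.
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    # Threshold chosen so that only `target_fpr` of non-members exceed it.
    threshold = np.quantile(nonmember_scores, 1.0 - target_fpr)
    return float(np.mean(np.asarray(member_scores) > threshold))

rng = np.random.default_rng(0)
members = rng.normal(0.05, 1.0, size=10_000)     # barely separable => weak attack
nonmembers = rng.normal(0.00, 1.0, size=10_000)
print(f"TPR @ 1% FPR: {tpr_at_fpr(members, nonmembers):.3f}")  # close to 0.01
```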
DMs benefit from large and diverse datasets for training - often sourced without the data owners' consent.

This raises a key question: was your data used? Membership Inference Attacks aim to find out by determining whether a specific data point was part of a model’s training set.
June 13, 2025 at 7:11 PM
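A minimal sketch of the simplest such attack on a diffusion model: score each candidate by its denoising loss at a fixed timestep, then threshold the score. The `model(noisy, t)` call is a hypothetical stand-in for the noise-prediction network under attack.

```python
# Minimal sketch of a loss-based membership signal for a diffusion model.
# `model` is a hypothetical epsilon-prediction network taking (noisy_image, t);
# lower denoising loss on a sample is (weak) evidence it was in training.
# The attack thresholds this score to decide member vs. non-member.
import torch

@torch.no_grad()
def denoising_loss_score(model, x0: torch.Tensor, t: int,
                         alphas_cumprod: torch.Tensor) -> float:
    """Negative denoising MSE at timestep t (higher score => more member-like)."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x0)
    # Standard DDPM forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise
    noisy = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    pred_noise = model(noisy, torch.tensor([t]))   # hypothetical interface
    mse = torch.mean((pred_noise - noise) ** 2)
    return -mse.item()
```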
TL;DR: We show that Membership Inference Attacks (MIAs) struggle to detect training data in SOTA Diffusion Models (DMs) and instead propose the first dataset inference method to achieve this goal.

#AI #MachineLearning #GenerativeAI #Copyright
June 13, 2025 at 7:11 PM
Thanks for working on this together, Antoni Kowalczuk, Franziska Boenisch, and Adam Dziedzic!

@ideas-ncbr.bsky.social
#AI #GenerativeAI #privacy
February 5, 2025 at 6:36 PM
If you’d like to learn more, check out our full arXiv paper, where we dive deeper into membership inference attacks, dataset inference, and memorization risks in IARs.

👉 Read the full paper: Privacy Attacks on Image AutoRegressive Models arxiv.org/abs/2502.02514

🧵 6/
February 5, 2025 at 6:36 PM
IARs push image generation forward, but at a cost: higher privacy risks.

🛟 Can we make IARs safer?

✳️ We find Masked AutoRegressive models (MAR) to be inherently more private, likely because they incorporate diffusion-based techniques.

🧵 5/
February 5, 2025 at 6:36 PM
⚠️ That's not all!

Large IARs memorize and regurgitate training data at an alarming rate, opening the door to copyright infringement, privacy violations, and dataset exposure.

🖼️ Our data extraction attack recovered up to 698 training images from the largest VAR model.

🧵 4/
February 5, 2025 at 6:36 PM
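A hedged sketch of how extracted generations are usually counted: match each generation against training images in some embedding space and flag near-duplicates. The image encoder and the 0.95 threshold are placeholders, not the attack configuration from the paper.

```python
# Illustrative check for memorized generations: flag generated images whose
# nearest training image (in some embedding space) exceeds a similarity threshold.
# Embeddings are assumed to come from any image encoder (e.g., a CLIP-style model);
# the 0.95 threshold is arbitrary, not the value used in the paper.
import numpy as np

def count_extracted(gen_embeddings: np.ndarray,
                    train_embeddings: np.ndarray,
                    threshold: float = 0.95) -> int:
    # Cosine similarity between every generation and every training image.
    gen = gen_embeddings / np.linalg.norm(gen_embeddings, axis=1, keepdims=True)
    train = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
    sims = gen @ train.T
    # A generation counts as "extracted" if it is near-identical to some training image.
    return int(np.sum(sims.max(axis=1) > threshold))
```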
⚠️ How serious is it?

🔍 Our findings are striking: attacks for identifying training samples are orders of magnitude more effective on IARs than on DMs.

🧵 3/
February 5, 2025 at 6:36 PM
IARs deliver higher quality, faster generation, and better scalability than #DiffusionModels (DMs), using techniques similar to those of Large Language Models like #GPT.

💡 Impressive? Absolutely. Safe? Not so much.

We find that IARs are highly vulnerable to privacy attacks.

🧵 2/
February 5, 2025 at 6:36 PM
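Because IARs predict discrete image tokens autoregressively, the natural membership signal mirrors the one used against LLMs: the model's per-token log-likelihood on a candidate image. A minimal sketch under that assumption; `iar_model` and its interface are hypothetical, and conditioning tokens (class, scale) are omitted.

```python
# Sketch of an LLM-style membership signal for an image autoregressive model:
# average negative log-likelihood of the image's token sequence.
# `image_tokens` is a 1D LongTensor of discrete image tokens; `iar_model` is a
# hypothetical network returning next-token logits of shape (seq_len, vocab_size).
import torch
import torch.nn.functional as F

@torch.no_grad()
def token_nll_score(iar_model, image_tokens: torch.Tensor) -> float:
    logits = iar_model(image_tokens[:-1].unsqueeze(0))[0]   # hypothetical interface
    nll = F.cross_entropy(logits, image_tokens[1:], reduction="mean")
    return -nll.item()   # higher score (lower NLL) => more member-like
```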
🙌 I am glad to be a part of this research with Youcef Djenouri, Nassim Belmecheri, Tomasz Michalak, Ahmed Nabil Belbachir, and Anis Yazidi!
December 20, 2024 at 3:28 PM
📜 LGR-AD enables multiple diffusion model agents 🤖 to collaborate through a graph network, significantly enhancing quality and flexibility in text-to-image generation 🖼️.
December 20, 2024 at 3:27 PM