Jan Dubiński
@jandubinski.bsky.social
PhD student in Machine Learning @Warsaw University of Technology and @IDEAS NCBR
Thanks for working on this together, Antoni Kowalczuk, Franziska Boenisch, and Adam Dziedzic!

@cvprconference.bsky.social

#AI #GenerativeAI #privacy
June 13, 2025 at 7:11 PM
CDI confidently identifies training data with as few as 70 suspect samples!

Please check out the paper for more:
📜 https://arxiv.org/abs/2411.12858
June 13, 2025 at 7:11 PM
Instead, we propose CDI, a method that empowers data owners to check whether their data was used to train a DM. CDI relies on selectively combining diverse membership signals from multiple samples and statistical testing.
June 13, 2025 at 7:11 PM
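For intuition, here is a minimal sketch of dataset inference by statistical testing, assuming per-sample membership scores are already computed. The aggregation and Welch's t-test below are illustrative stand-ins, not the exact CDI procedure from the paper.

```python
# Illustrative sketch of dataset inference via statistical testing.
# NOT the exact CDI procedure: we assume each sample already has a scalar
# membership score and compare the suspect set against a known
# non-member (validation) set with a one-sided test.
import numpy as np
from scipy import stats

def dataset_inference(suspect_scores: np.ndarray,
                      nonmember_scores: np.ndarray,
                      alpha: float = 0.01) -> bool:
    """Return True if the suspect samples look like training members.

    suspect_scores:   membership scores for the data owner's samples
    nonmember_scores: the same scores computed on data known to be unseen
    """
    # One-sided Welch's t-test: do suspect samples score significantly
    # higher than known non-members?
    t_stat, p_value = stats.ttest_ind(
        suspect_scores, nonmember_scores,
        equal_var=False, alternative="greater",
    )
    return p_value < alpha

# Example with synthetic scores: 70 suspect samples vs. 70 non-members.
rng = np.random.default_rng(0)
suspect = rng.normal(loc=0.5, scale=1.0, size=70)    # shifted => member-like
nonmember = rng.normal(loc=0.0, scale=1.0, size=70)
print(dataset_inference(suspect, nonmember))
```

The point of pooling is that individually weak per-sample signals, aggregated over even ~70 samples, can yield a statistically confident verdict.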
Unfortunately, state-of-the-art Membership Inference Attacks struggle to identify training data in large DMs - often performing close to random guessing (True Positive Rate = 1% at False Positive Rate = 1%), e.g. on DMs trained on ImageNet.
June 13, 2025 at 7:11 PM
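For reference, this is how the TPR at 1% FPR number above is typically computed from per-sample attack scores. The scores below are synthetic, just to show why a weak attack lands near 1%, i.e. random guessing.

```python
# Minimal sketch of the TPR @ FPR = 1% metric used to judge MIAs.
# Scores here are synthetic; in a real evaluation they come from running
# an attack on known members and known non-members.
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    # Threshold chosen so that only `target_fpr` of non-members exceed it.
    threshold = np.quantile(nonmember_scores, 1.0 - target_fpr)
    return float(np.mean(np.asarray(member_scores) > threshold))

rng = np.random.default_rng(0)
members = rng.normal(0.05, 1.0, size=10_000)     # barely separable => weak attack
nonmembers = rng.normal(0.00, 1.0, size=10_000)
print(f"TPR @ 1% FPR: {tpr_at_fpr(members, nonmembers):.3f}")  # close to 0.01
```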
DMs benefit from large and diverse datasets for training - often sourced without the data owners' consent.

This raises a key question: was your data used? Membership Inference Attacks aim to find out by determining whether a specific data point was part of a model’s training set.
June 13, 2025 at 7:11 PM
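A minimal sketch of the simplest such attack on a diffusion model: score each candidate by its denoising loss at a fixed timestep, then threshold the score. The `model(noisy, t)` call is a hypothetical stand-in for the noise-prediction network under attack.

```python
# Minimal sketch of a loss-based membership signal for a diffusion model.
# `model` is a hypothetical epsilon-prediction network taking (noisy_image, t);
# lower denoising loss on a sample is (weak) evidence it was in training.
# The attack thresholds this score to decide member vs. non-member.
import torch

@torch.no_grad()
def denoising_loss_score(model, x0: torch.Tensor, t: int,
                         alphas_cumprod: torch.Tensor) -> float:
    """Negative denoising MSE at timestep t (higher score => more member-like)."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x0)
    # Standard DDPM forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*noise
    noisy = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    pred_noise = model(noisy, torch.tensor([t]))   # hypothetical interface
    mse = torch.mean((pred_noise - noise) ** 2)
    return -mse.item()
```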
TL;DR: We show that Membership Inference Attacks (MIAs) struggle to detect training data in SOTA Diffusion Models (DMs) and instead propose the first dataset inference method to achieve this goal.

#AI #MachineLearning #GenerativeAI #Copyright
June 13, 2025 at 7:11 PM
Thanks for working on this together, Antoni Kowalczuk, Franziska Boenisch, and Adam Dziedzic!

@ideas-ncbr.bsky.social
#AI #GenerativeAI #privacy
February 5, 2025 at 6:36 PM
If you’d like to learn more, check out our full arXiv paper, where we dive deeper into membership inference attacks, dataset inference, and memorization risks in IARs.

👉 Read the full paper: Privacy Attacks on Image AutoRegressive Models arxiv.org/abs/2502.02514

🧵 6/
February 5, 2025 at 6:36 PM
IARs push image generation forward, but at a cost: higher privacy risks.

🛟 Can we make IARs safer?

✳️ We find Masked AutoRegressive models (MAR) to be inherently more private, likely because they incorporate diffusion-based techniques.

🧵 5/
February 5, 2025 at 6:36 PM
⚠️ That's not all!

Large IARs memorize and regurgitate training data at an alarming rate, opening the door to copyright infringement, privacy violations, and dataset exposure.

🖼️ Our data extraction attack recovered up to 698 training images from the largest VAR model.

🧵 4/
February 5, 2025 at 6:36 PM
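A hedged sketch of how extracted generations are usually counted: match each generation against training images in some embedding space and flag near-duplicates. The image encoder and the 0.95 threshold are placeholders, not the attack configuration from the paper.

```python
# Illustrative check for memorized generations: flag generated images whose
# nearest training image (in some embedding space) exceeds a similarity threshold.
# Embeddings are assumed to come from any image encoder (e.g., a CLIP-style model);
# the 0.95 threshold is arbitrary, not the value used in the paper.
import numpy as np

def count_extracted(gen_embeddings: np.ndarray,
                    train_embeddings: np.ndarray,
                    threshold: float = 0.95) -> int:
    # Cosine similarity between every generation and every training image.
    gen = gen_embeddings / np.linalg.norm(gen_embeddings, axis=1, keepdims=True)
    train = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
    sims = gen @ train.T
    # A generation counts as "extracted" if it is near-identical to some training image.
    return int(np.sum(sims.max(axis=1) > threshold))
```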
⚠️ How serious is it?

🔍 Our findings are striking: attacks for identifying training samples are orders of magnitude more effective on IARs than on DMs.

🧵 3/
February 5, 2025 at 6:36 PM
IARs deliver higher quality, faster generation, and better scalability than #DiffusionModels (DMs), using techniques similar to those of Large Language Models like #GPT.

💡 Impressive? Absolutely. Safe? Not so much.

We find that IARs are highly vulnerable to privacy attacks.

🧵 2/
February 5, 2025 at 6:36 PM
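Because IARs predict discrete image tokens autoregressively, the natural membership signal mirrors the one used against LLMs: the model's per-token log-likelihood on a candidate image. A minimal sketch under that assumption; `iar_model` and its interface are hypothetical, and conditioning tokens (class, scale) are omitted.

```python
# Sketch of an LLM-style membership signal for an image autoregressive model:
# average negative log-likelihood of the image's token sequence.
# `image_tokens` is a 1D LongTensor of discrete image tokens; `iar_model` is a
# hypothetical network returning next-token logits of shape (seq_len, vocab_size).
import torch
import torch.nn.functional as F

@torch.no_grad()
def token_nll_score(iar_model, image_tokens: torch.Tensor) -> float:
    logits = iar_model(image_tokens[:-1].unsqueeze(0))[0]   # hypothetical interface
    nll = F.cross_entropy(logits, image_tokens[1:], reduction="mean")
    return -nll.item()   # higher score (lower NLL) => more member-like
```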
🙌 I am glad to be a part of this research with Youcef Djenouri, Nassim Belmecheri, Tomasz Michalak, Ahmed Nabil Belbachir, and Anis Yazidi!
December 20, 2024 at 3:28 PM
📜 LGR-AD enables multiple diffusion model agents 🤖 to collaborate through a graph network, significantly enhancing quality and flexibility in text-to-image generation 🖼️.
December 20, 2024 at 3:27 PM