Turns out that this behaviour can be described with a bound from *convex, nonsmooth* optimization.
A short thread on our latest paper 🚞
arxiv.org/abs/2501.18965
Made with ❤️ at Apple
Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!
arxiv.org/abs/2502.01804
1) Joint Learning of Energy-based Models and their Partition Function
arxiv.org/abs/2501.18528
2) Loss Functions and Operators Generated by f-Divergences
arxiv.org/abs/2501.18537
A thread.
With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322
ML and PDE lovers, check it out!
• Large Language Models for French medical texts
• Evaluating digital medical devices: statistics and causal inference
Check out this awesome work by Samira et al. on scaling laws for mixtures of experts!
We explored this through the lens of MoEs:
@Apple
where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥
📄 arxiv.org/abs/2410.23054
🛠️ github.com/apple/ml-act
0/9 🧵
Make attention ~18% faster with a drop-in replacement 🚀
Code:
github.com/apple/ml-sig...
Paper:
arxiv.org/abs/2409.04431
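Rough sketch of the idea, assuming this is the sigmoid-attention work (the repo link above is truncated): plain PyTorch for readability, and the function name, the `-log(seq_len)` bias, and the unfused implementation are my own illustration, not the released kernel.

```python
# Hedged sketch: elementwise-sigmoid attention as a drop-in for softmax attention.
# The real speedup comes from a fused kernel; this is only the math, and the
# bias choice b = -log(seq_len) is an assumption on my part.
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)  # (B, H, L, L)
    # An elementwise sigmoid with a constant negative bias replaces the row-wise
    # softmax, so no normalization over the sequence dimension is needed.
    weights = torch.sigmoid(scores - math.log(seq_len))
    return weights @ v

q = k = v = torch.randn(2, 8, 128, 64)
out = sigmoid_attention(q, k, v)  # same output shape as standard attention: (2, 8, 128, 64)
```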
Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding 🧵
paper: arxiv.org/abs/2411.14402
code: github.com/apple/ml-aim
HF: huggingface.co/collections/...
The MinHashEncoder is fast, stateless, and excellent with tree-based learners.
It's in @skrub-data.bsky.social
youtu.be/ZMQrNFef8fg
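A minimal usage sketch of the encoder with a tree-based model; the column name and toy data are made up, and the exact input shape the encoder expects (Series vs. one-column DataFrame) varies across skrub versions.

```python
# Hedged sketch: MinHashEncoder on a high-cardinality string column, fed to a
# tree-based learner. Toy data invented for illustration.
import pandas as pd
from skrub import MinHashEncoder
from sklearn.ensemble import HistGradientBoostingRegressor

df = pd.DataFrame({"job_title": ["senior data scientist", "data engineer",
                                 "ML researcher", "software engineer"]})
y = [95, 80, 105, 90]

# MinHashEncoder hashes character n-grams, so fit() learns nothing from the data
# (stateless) and the resulting features pair well with tree-based models.
enc = MinHashEncoder(n_components=30)
# NOTE: depending on the skrub version, pass df["job_title"] or df[["job_title"]].
X = enc.fit_transform(df["job_title"])

model = HistGradientBoostingRegressor().fit(X, y)
print(model.predict(X))
```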