Pierre Ablin
@pierreablin.bsky.social
Research scientist at Apple | machine learning, optimization, language modeling

pierreablin.com
Reposted by Pierre Ablin
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn’t has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
February 11, 2025 at 5:59 AM
Reposted by Pierre Ablin
Learning rate schedules seem mysterious? Why is the loss going down so fast during cooldown?
Turns out that this behaviour can be described with a bound from *convex, nonsmooth* optimization.

A short thread on our latest paper 🚞

arxiv.org/abs/2501.18965
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
We show that learning-rate schedules for large model training behave surprisingly similarly to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant schedul...
arxiv.org
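For context, the flavour of bound involved: below is the classical subgradient-method result for a convex, G-Lipschitz objective with steps x_{t+1} = x_t - η_t g_t and D = ||x_1 - x*|| (a sketch only; the paper proves a refined, schedule-aware version, see the link above).

```latex
\min_{1 \le t \le T} f(x_t) - f^\star
  \;\le\; \frac{D^2 + G^2 \sum_{t=1}^{T} \eta_t^2}{2 \sum_{t=1}^{T} \eta_t}.
```

Even in this simple form, a cooldown shrinks the late \eta_t^2 terms in the numerator while the denominator keeps growing, which hints at why the loss drops sharply at the end of training.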
February 5, 2025 at 10:13 AM
Excited to share Soup-of-Experts, a new neural network architecture that, for any given task, can instantiate in a flash a small model that performs very well on that task.

Made with ❤️ at Apple

Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!

arxiv.org/abs/2502.01804
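The post doesn't spell out the mechanism, so here is only a hypothetical sketch of the parameter-soup idea suggested by the name (my reading of the abstract; the function and names below are made up, see the paper for the actual architecture): keep a shared base plus a bank of expert parameter deltas, and instantiate a task-specific model by a weighted combination driven by the task's domain mixture.

```python
import torch

def instantiate_soup(base_params, expert_banks, domain_weights):
    """Hypothetical sketch: build task-specific weights as
    base + sum_k coeff_k * expert_k, where the coefficients come from
    the target task's domain mixture (used directly here)."""
    soup = {}
    for name, p in base_params.items():
        delta = sum(w * bank[name] for w, bank in zip(domain_weights, expert_banks))
        soup[name] = p + delta
    return soup  # a regular state_dict: load it into a small model and run

# Toy usage with random tensors
base = {"w": torch.zeros(4, 4)}
banks = [{"w": torch.randn(4, 4)} for _ in range(3)]
weights = torch.tensor([0.7, 0.2, 0.1])  # e.g. the target task's domain histogram
task_params = instantiate_soup(base, banks, weights)
```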
February 5, 2025 at 9:32 AM
Reposted by Pierre Ablin
Really proud of these two companion papers by our team at GDM:

1) Joint Learning of Energy-based Models and their Partition Function
arxiv.org/abs/2501.18528

2) Loss Functions and Operators Generated by f-Divergences
arxiv.org/abs/2501.18537

A thread.
January 31, 2025 at 12:06 PM
Reposted by Pierre Ablin
How do tokens evolve as they are processed by a deep Transformer?

With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322

ML and PDE lovers, check it out!
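For readers outside this line of work, the kind of object being analyzed (a generic continuous-depth view of self-attention; the preprint's exact model may differ): each token x_i(t) is a particle whose evolution through depth t follows an interacting-particle ODE,

```latex
\dot{x}_i(t) \;=\; \sum_{j=1}^{n}
  \frac{\exp\big(\langle Q\, x_i(t),\, K\, x_j(t)\rangle\big)}
       {\sum_{k=1}^{n} \exp\big(\langle Q\, x_i(t),\, K\, x_k(t)\rangle\big)}\;
  V\, x_j(t),
```

and the PDE viewpoint studies how the whole distribution of tokens evolves under such dynamics.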
January 31, 2025 at 4:56 PM
Reposted by Pierre Ablin
Byte Pair Encoding is a tokenization method that starts with all characters as the initial tokens. It then iteratively merges the most frequent adjacent token pair in the text, adding each merged pair as a new token to the vocabulary until it reaches a predefined size. The output is a sequence of tokens. https://buff.ly/42oG80f
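A minimal sketch of that merge loop (toy illustration, not a production tokenizer):

```python
from collections import Counter

def bpe_train(corpus, vocab_size):
    """Toy BPE: start from single characters, repeatedly merge the most
    frequent adjacent pair into a new token until the vocab is big enough."""
    words = [list(word) for word in corpus.split()]
    vocab = {tok for word in words for tok in word}
    merges = []
    while len(vocab) < vocab_size:
        pairs = Counter(
            (word[i], word[i + 1]) for word in words for i in range(len(word) - 1)
        )
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        new_tok = a + b
        merges.append((a, b))
        vocab.add(new_tok)
        # Apply the merge everywhere in the corpus
        for w, word in enumerate(words):
            i, merged = 0, []
            while i < len(word):
                if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                    merged.append(new_tok)
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            words[w] = merged
    return vocab, merges

vocab, merges = bpe_train("low lower lowest low low", vocab_size=12)
```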
January 30, 2025 at 6:00 AM
Reposted by Pierre Ablin
🎓 💫 We are opening post-doc positions at the intersection of AI, data science, and medicine:
• Large Language Models for French medical texts
• Evaluating digital medical devices: statistics and causal inference
January 29, 2025 at 8:19 AM
Mixtures of experts are all the rage when it comes to shipping low-latency LLMs.

Check out this awesome work by Samira et al. on scaling laws for mixtures of experts!
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
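For readers unfamiliar with the building block, a minimal sketch of a generic top-k routed MoE layer (illustrative only, not the paper's implementation): each token is sent to its k highest-scoring experts, so parameters grow with the number of experts while per-token compute stays roughly constant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Generic top-k routed mixture-of-experts feed-forward layer (sketch)."""

    def __init__(self, d_model, d_hidden, n_experts, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.router(x)                # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = MoELayer(d_model=64, d_hidden=256, n_experts=8, k=2)
y = layer(torch.randn(10, 64))                 # only 2 of the 8 experts run per token
```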
January 28, 2025 at 10:15 AM
Reposted by Pierre Ablin
Thrilled to share the latest work from our team at @Apple, where we achieve interpretable and fine-grained control of LLMs and diffusion models via Activation Transport 🔥

📄 arxiv.org/abs/2410.23054
🛠️ github.com/apple/ml-act

0/9 🧵
December 10, 2024 at 1:09 PM
Excited to see Sigmoid Attention accepted at ICLR 2025!!

Make attention ~18% faster with a drop-in replacement 🚀

Code:
github.com/apple/ml-sig...

Paper:
arxiv.org/abs/2409.04431
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as...
arxiv.org
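The core change, in a few lines: replace the row-wise softmax over keys with an element-wise sigmoid. A simplified sketch follows (not the official implementation, which lives in the repo above); the -log(seq_len) bias is the stabilization trick I recall from the paper, so double-check the exact recipe there. The ~18% speedup comes from their optimized kernel, not from a naive version like this one.

```python
import math
import torch

def sigmoid_attention(q, k, v):
    """Simplified sigmoid attention (sketch): element-wise sigmoid instead of
    softmax, shifted by -log(n) so each row's total weight stays roughly O(1)."""
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)   # (..., n, n)
    weights = torch.sigmoid(scores - math.log(n))     # no row-wise normalization
    return weights @ v

def softmax_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 128, 64)   # (batch, seq, dim)
out = sigmoid_attention(q, k, v)
```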
January 24, 2025 at 6:47 PM
Reposted by Pierre Ablin
The Apple Machine Learning Research (MLR) team in Paris has openings for both FTE roles and a short-term post-doc position to contribute to our team's research agenda. Researchers at Apple's MLR (led by Samy Bengio) target impactful publications in top-tier ML venues and OSS.
December 18, 2024 at 5:05 PM
Congratulations on these new models!!
𝗗𝗼𝗲𝘀 𝗮𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗽𝗿𝗲-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝘄𝗼𝗿𝗸 𝗳𝗼𝗿 𝘃𝗶𝘀𝗶𝗼𝗻? 🤔
Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding 🧵

paper: arxiv.org/abs/2411.14402
code: github.com/apple/ml-aim
HF: huggingface.co/collections/...
November 22, 2024 at 10:33 AM
Reposted by Pierre Ablin
Great video explaining a clever vectorization for learning on strings and dirty categories:

the MinHashEncoder is fast, stateless, and excellent with tree-based learners.
It's in @skrub-data.bsky.social
youtu.be/ZMQrNFef8fg
Why the MinHashEncoder is great for boosted trees
YouTube video by probabl
youtu.be
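If you want to try it on a dirty-category column, a minimal usage sketch (the data and column name are made up; the exact input convention depends on your skrub version):

```python
import pandas as pd
from skrub import MinHashEncoder

# Toy dirty-category column (made-up data)
positions = pd.Series(
    ["Senior Engineer", "senior engnieer", "Office Manager", "office mgr"],
    name="employee_position",
)

encoder = MinHashEncoder(n_components=30)
# Recent skrub versions take a single column; older releases expect a
# one-column DataFrame such as positions.to_frame().
X = encoder.fit_transform(positions)
print(X.shape)  # (4, 30): hash-based numeric features, ready for a boosted tree
```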
November 21, 2024 at 10:12 AM