Riccardo Mereu
@rmwu.bsky.social
Reposted by Riccardo Mereu
Multi-Head Latent Attention vs Group Query Attention: We break down why MLA is a more expressive memory compression technique AND why naive implementations can backfire. Check it out!
⚡️Multi-Head Latent Attention is one of the key innovations that enabled @deepseek_ai's V3 and the subsequent R1 model.

⏭️ Join us as we continue our series on efficient AI inference, covering both theoretical insights and practical implementation:

🔗 datacrunch.io/blog/deepsee...
DeepSeek + SGLang: Multi-Head Latent Attention
Multi-Head Latent Attention (MLA) improves upon Group Query Attention (GQA), enabling long-context reasoning models and wider adoption across open-source LLMs.
datacrunch.io
March 12, 2025 at 7:01 PM
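A minimal sketch of the memory argument behind the post: GQA shrinks the KV cache by sharing key/value heads across query heads, while MLA caches a single low-rank latent per token that all heads decode from. The code below only compares per-token cache sizes; the dimensions are illustrative assumptions loosely modeled on publicly reported DeepSeek-V3 settings, not DataCrunch's or SGLang's implementation.

```python
# Toy comparison of per-token KV-cache footprints (in stored values).
# Dimensions are assumptions for illustration, not exact model configs.

def gqa_cache_per_token(n_kv_heads: int, head_dim: int) -> int:
    # GQA stores one K and one V vector for each key/value head.
    return 2 * n_kv_heads * head_dim

def mla_cache_per_token(latent_dim: int, rope_dim: int) -> int:
    # MLA caches one compressed KV latent shared by all heads,
    # plus a small decoupled RoPE key component.
    return latent_dim + rope_dim

if __name__ == "__main__":
    # Example: 8 KV heads of dim 128 (GQA) vs a 512-dim latent
    # with a 64-dim RoPE key (MLA-style compression).
    print("GQA cache / token:", gqa_cache_per_token(8, 128))   # 2048 values
    print("MLA cache / token:", mla_cache_per_token(512, 64))  # 576 values
```

The smaller per-token cache is what makes long-context serving cheaper, while the up-projection from the latent back to per-head keys and values is where naive implementations can lose the savings.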
Reposted by Riccardo Mereu
Little is known about how deep networks interact with structure in data. An important aspect of this structure is symmetry (e.g., pose transformations). Here, we (w/ @stphtphsn.bsky.social) study the generalization ability of deep networks on symmetric datasets: arxiv.org/abs/2412.11521 (1/n)
On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds significant promise for improving predictions in machine learning. In this work, we aim to underst...
arxiv.org
January 14, 2025 at 1:05 PM
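For readers unfamiliar with the term, a "symmetric dataset" here means one whose labels are invariant under a group action, so each sample's entire orbit shares its label. The toy snippet below (my own illustration, not code from the paper) builds such a dataset using cyclic translations as the group.

```python
# Toy symmetric dataset: labels are invariant under cyclic shifts,
# so every element of a sample's orbit inherits the same label.
import numpy as np

rng = np.random.default_rng(0)

def cyclic_orbit(x: np.ndarray) -> np.ndarray:
    # All cyclic translations of x: the orbit under the group Z_d.
    return np.stack([np.roll(x, s) for s in range(len(x))])

# Base samples with arbitrary binary labels.
X = rng.normal(size=(4, 6))
y = rng.integers(0, 2, size=4)

# Symmetrized dataset: replace each sample by its full orbit,
# repeating its label across the orbit.
X_sym = np.concatenate([cyclic_orbit(x) for x in X])
y_sym = np.repeat(y, X.shape[1])

print(X_sym.shape, y_sym.shape)  # (24, 6) (24,)
```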