introduces SDPO, which converts tokenized feedback into a dense learning signal without any external teacher or explicit reward model.
arxiv.org/abs/2601.20802
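The blurb gives no mechanism, so here is a loudly hypothetical sketch of what a "dense learning signal from tokenized feedback" could look like (the function name, weighting scheme, and loss form are all assumptions, not taken from the paper): per-token feedback weights turn a single sequence-level score into a weighted negative log-likelihood where every token carries its own signal.

```python
# Hypothetical sketch, not the paper's actual SDPO objective:
# each token receives its own feedback weight, so the loss is a
# weighted negative log-likelihood rather than one scalar per sequence.

def dense_loss(token_logprobs, feedback_weights):
    """Weighted NLL: each token contributes in proportion to its feedback."""
    assert len(token_logprobs) == len(feedback_weights)
    n = len(token_logprobs)
    return -sum(w * lp for lp, w in zip(token_logprobs, feedback_weights)) / n

# Example: tokens flagged by feedback (weight 1.0) drive the loss,
# tokens with no feedback (weight 0.0) are ignored.
logprobs = [-0.1, -2.3, -0.5, -1.7]
weights = [0.0, 1.0, 0.0, 1.0]
loss = dense_loss(logprobs, weights)
```

The point of the sketch is only the density: unlike a sequence-level reward, every position can receive gradient signal independently.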
Surprising result: The teacher doesn't need to know the final answer to guide the student effectively. arxiv.org/abs/2601.18778
a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three sizes: 3B, 8B, and 14B. The authors present a recipe for deriving the models through Cascade Distillation. arxiv.org/pdf/2601.08584
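The recipe itself isn't described in the blurb; a plausible reading of "Cascade Distillation" (an assumption on my part, not the paper's stated method) is that each smaller model is distilled from the next larger one in sequence, 14B → 8B → 3B, rather than all three from one teacher. A toy sketch with plain probability vectors:

```python
import math

def kl(p, q):
    """KL divergence between discrete distributions (teacher p, student q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(teacher_probs, student_probs, lr=0.5):
    """Toy distillation update: mix the student distribution toward the teacher."""
    mixed = [(1 - lr) * s + lr * t for s, t in zip(student_probs, teacher_probs)]
    z = sum(mixed)
    return [m / z for m in mixed]

# Hypothetical cascade order: each size distills from the one above it.
sizes = ["14B", "8B", "3B"]
dist = {
    "14B": [0.7, 0.2, 0.1],  # largest model acts as the first teacher
    "8B": [0.4, 0.4, 0.2],
    "3B": [0.3, 0.3, 0.4],
}
for teacher, student in zip(sizes, sizes[1:]):
    dist[student] = distill_step(dist[teacher], dist[student])
```

After the loop, the 8B distribution has moved toward the 14B teacher, and the 3B toward the already-distilled 8B, which is the defining property of a cascade as opposed to one shared teacher.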