Aditi Mavalankar
@aditimavalankar.bsky.social
Research Scientist at DeepMind working on Gemini Thinking
Reposted by Aditi Mavalankar
Where do some of Reinforcement Learning's great thinkers stand today?

Find out! Keynotes of the RL Conference are online:
www.youtube.com/playlist?lis...

Wanting vs liking, Agent factories, Theoretical limit of LLMs, Pluralist value, RL teachers, Knowledge flywheels
(guess who talked about which!)
August 27, 2025 at 12:46 PM
On my way to #ICML2025 to present our algorithm that scales strongly with inference compute, in both performance and sample diversity! 🚀

Reach out if you’d like to chat more!
July 13, 2025 at 12:26 PM
Reposted by Aditi Mavalankar
New side project!

assayer: A simple Python-RQ based tool to automatically monitor and evaluate ML model checkpoints offline during training.
June 15, 2025 at 10:27 PM
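A minimal sketch of the idea behind a tool like assayer (not its actual API; the function and argument names here are made up for illustration): watch a checkpoint directory during training and hand each new checkpoint to Python-RQ as an offline evaluation job.

```python
# Hypothetical sketch: monitor a checkpoint directory and enqueue each new
# checkpoint for offline evaluation via Python-RQ. Assumes a Redis server
# is running and `rq` / `redis` are installed.
import time
from pathlib import Path

from redis import Redis
from rq import Queue


def evaluate_checkpoint(ckpt_path: str) -> float:
    """Placeholder evaluation job; an `rq worker` process executes this."""
    # Load the checkpoint and run the offline eval suite here.
    print(f"evaluating {ckpt_path}")
    return 0.0


def watch(ckpt_dir: str, poll_secs: int = 60) -> None:
    queue = Queue("eval", connection=Redis())  # jobs run in separate worker processes
    seen: set[Path] = set()
    while True:
        for ckpt in sorted(Path(ckpt_dir).glob("*.ckpt")):
            if ckpt not in seen:
                seen.add(ckpt)
                queue.enqueue(evaluate_checkpoint, str(ckpt))
        time.sleep(poll_secs)
```

Evaluation jobs would then be picked up by worker processes started with `rq worker eval`, keeping evaluation off the training machines.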
Reposted by Aditi Mavalankar
Ever thought of joining DeepMind's RL team? We're recruiting for a research engineering role in London:
job-boards.greenhouse.io/deepmind/job...
Please spread the word!
Research Engineer, Reinforcement Learning
London, UK
job-boards.greenhouse.io
May 22, 2025 at 3:11 PM
Accepted to #ICML2025
See you in Vancouver!
May 2, 2025 at 10:35 AM
Reposted by Aditi Mavalankar
When faced with a challenge (like debugging), it helps to think back to examples of how you've overcome challenges in the past. Same for LLMs!

The method we introduce in this paper is efficient because examples are chosen for their complementarity, leading to much steeper inference-time scaling! 🧪
March 20, 2025 at 10:23 AM
Excited to share our recent work, AuPair, an inference-time technique that builds on the premise of in-context learning to improve LLM coding performance!
arxiv.org/abs/2502.18487

🧵
AuPair: Golden Example Pairs for Code Repair
Scaling up inference-time compute has proven to be a valuable strategy in improving the performance of Large Language Models (LLMs) without fine-tuning. An important task that can benefit from additio...
arxiv.org
March 17, 2025 at 11:16 AM
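A rough sketch of the complementarity idea described in the posts above, not the paper's implementation: assume each candidate example pair has already been scored on a set of validation repair problems, then pick pairs greedily so that each new pair adds the most improvement beyond what the already-chosen pairs cover. All names below are hypothetical.

```python
# Greedy, complementarity-driven selection of in-context example pairs,
# in the spirit of AuPair (illustrative sketch only).
# score[i][j] = how well candidate pair j repairs validation problem i
# when given as the in-context example.

def select_golden_pairs(score: list[list[float]], budget: int) -> list[int]:
    """Greedily pick pairs that add the most coverage beyond those already chosen."""
    n_problems = len(score)
    n_pairs = len(score[0]) if n_problems else 0
    best_so_far = [0.0] * n_problems   # best score each problem gets from chosen pairs
    chosen: list[int] = []
    remaining = set(range(n_pairs))
    for _ in range(min(budget, n_pairs)):
        # Marginal gain of a pair = total improvement over current best scores.
        gain = {
            j: sum(max(score[i][j] - best_so_far[i], 0.0) for i in range(n_problems))
            for j in remaining
        }
        j_star = max(gain, key=gain.get)
        if gain[j_star] <= 0.0:
            break                      # no remaining pair adds anything new
        chosen.append(j_star)
        remaining.remove(j_star)
        for i in range(n_problems):
            best_so_far[i] = max(best_so_far[i], score[i][j_star])
    return chosen
```

Because each selected pair is rewarded only for problems the earlier pairs don't already handle, the chosen set stays diverse, which is what makes adding more pairs at inference time keep paying off.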
Reposted by Aditi Mavalankar
Are there limits to what you can learn in a closed system? Do we need human feedback in training? Is scale all we need? Should we play language games? What even is "recursive self-improvement"?

Thoughts about this and more here:
arxiv.org/abs/2411.16905
Boundless Socratic Learning with Language Games
An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its covera...
arxiv.org
November 28, 2024 at 4:01 PM