Lightnews — Scholar-powered news

Reposted by Amir Mesbah

Claire Vernade

@claireve.bsky.social

📣 #ICML tutorials: We want to know what *you* would like to learn. This year, Adam White and I are calling for nominations of topics and/or presenters.

Until December 7th, you can send us your suggestions, and we will use them to shape the program.

icml.cc/Conferences/...

ICML 2026 Call For Tutorials

icml.cc

November 12, 2025 at 8:12 AM

Reposted by Amir Mesbah

Pablo Samuel Castro

@pcastr.bsky.social

🚨The Formalism-Implementation Gap in RL research🚨

Lots of progress in RL research over last 10 years, but too much performance-driven => overfitting to benchmarks (like the ALE).

1⃣ Let's advance science of RL
2⃣ Let's be explicit about how benchmarks map to formalism

1/X

October 28, 2025 at 1:56 PM

Reposted by Amir Mesbah

Amir Balef

@amirbalef.bsky.social

I am happy to share that our paper "Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning" has been accepted at NeurIPS 2025!

Endless thanks to my amazing co-authors @claireve.bsky.social and @keggensperger.bsky.social

📄 Read it on arXiv: arxiv.org/abs/2505.05226

(1/3)

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

The Combined Algorithm Selection and Hyperparameter optimization (CASH) is a challenging resource allocation problem in the field of AutoML. We propose MaxUCB, a max $k$-armed bandit method to trade o...

arxiv.org

October 6, 2025 at 4:54 PM

Reposted by Amir Mesbah

Claas Voelcker

@cvoelcker.bsky.social

cvoelcker.de/blog/2025/re...

I finally gave in and made a nice blog post about my most recent paper. This was a surprising amount of work, so please be nice and go read it!

ALT: a close up of a sad cat with the words pleeeaasse written below it

media.tenor.com

October 2, 2025 at 9:34 PM

Reposted by Amir Mesbah

Claas Voelcker

@cvoelcker.bsky.social

cvoelcker.de/blog/2025/re...

Here ya go!

Relative Entropy Pathwise Policy Optimization - Technical Overview | Claas A. Voelcker

A lightweight overview of the new REPPO algorithm

cvoelcker.de

October 2, 2025 at 9:31 PM

Reposted by Amir Mesbah

Amir-massoud Farahmand

@sologen.bsky.social

What are we talking about when we talk about Dynamic Programming?

#ReinforcementLearning

August 3, 2025 at 8:14 PM

Amir Mesbah

@amirmesbah.bsky.social

What if all mathematicians had great visualization skills, tools, and public notes!

July 31, 2025 at 4:22 PM

Reposted by Amir Mesbah

Claire Vernade

@claireve.bsky.social

Onno and I will be presenting our poster at #W1005 tomorrow (Wed) morning.
He made a great thread about it, come chat with us about POMDP theory :)

Onno Eberhard @onnoeberhard.com · Jul 16

I am in Vancouver at ICML, and tomorrow I will present our newest paper "Partially Observable Reinforcement Learning with Memory Traces". We argue that eligibility traces are more effective than sliding windows as a memory mechanism for RL in POMDPs. 🧵

July 16, 2025 at 3:45 AM

Reposted by Amir Mesbah

Amir-massoud Farahmand

@sologen.bsky.social

I will not be at #ICML2025 this year, but 3 of my PhD students at 🤖 Adage (Adaptive Agents Lab) 🤖 are, presenting 3 papers.
⭐ Avery Ma
⭐ Claas Voelcker (cvoelcker.bsky.social)
⭐ Tyler Kastner

Meet them to talk about Model-based RL, Distributional RL, and Jailbreaking LLMs.

July 14, 2025 at 6:54 PM

Reposted by Amir Mesbah

Shahab Bakhtiari

@shahabbakht.bsky.social

Levine's take on the success of LLMs compared to video models is interesting, but I'll expand on how efforts toward AI could take two different paths, and why I think AI and NeuroAI could take different approaches moving forward. 🧵

🧠🤖 #MLSky

Shahab Bakhtiari @shahabbakht.bsky.social · Jun 12

AI may still need some neuroscience:

"AI systems will not acquire the flexibility and adaptability of human intelligence until they can actually learn like humans do, shining brightly with their own light rather than observing a shadow from ours."

🧠🤖

sergeylevine.substack.com/p/language-m...

Language Models in Plato's Cave

Why language models succeeded where video models failed, and what that teaches us about AI

sergeylevine.substack.com

June 12, 2025 at 2:30 PM

Reposted by Amir Mesbah

Hafez Ghaemi

@hafezghm.bsky.social

Preprint Alert 🚀

Can we simultaneously learn transformation-invariant and transformation-equivariant representations with self-supervised learning?

TL;DR Yes! This is possible via simple predictive learning & architectural inductive biases – without extra loss terms and predictors!

🧵 (1/10)

May 14, 2025 at 12:53 PM

Reposted by Amir Mesbah

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

cleanrl is amazing (github.com/vwxyzjn/clea...) and its structure makes sense for teaching but an actual research codebase should not inherit this style! you do not want this amount of code duplication

GitHub - vwxyzjn/cleanrl: High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG) - vwxyzjn/cleanrl

github.com

May 11, 2025 at 8:01 PM

Reposted by Amir Mesbah

Nathan Lambert

@natolambert.bsky.social

rlhfbook also available on arxiv for SEO 😀 happy friday
arxiv.org/abs/2504.12501

Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentle…

arxiv.org

April 18, 2025 at 4:07 PM

Reposted by Amir Mesbah

Natasha Jaques

@natashajaques.bsky.social

Recorded a recent "talk" / rant about RL fine-tuning of LLMs for a guest lecture in Stanford CSE234: youtube.com/watch?v=NTSY.... Covers some of my lab's recent work on personalized RLHF, as well as some mild Schmidhubering about my own early contributions to this space

Reinforcement Learning (RL) for LLMs

YouTube video by Natasha Jaques

youtube.com

March 27, 2025 at 9:32 PM

Reposted by Amir Mesbah

Jakob Foerster

@jfoerst.bsky.social

PQN puts Q-learning back on the map and now comes with a blog post + Colab demo! Also, congrats to the team for the spotlight at #ICLR2025

Mattie Fellows @mattieml.bsky.social · Mar 20

PQN blog 3/3 👉take a look at Matteo's 5-minute blog covering PQN’s key features, plus a Colab demo with JAX & PyTorch implementations mttga.github.io/posts/pqn/

🔎 For a deeper dive into the theory:
blog.foersterlab.com/fixing-td-pa...
blog.foersterlab.com/fixing-td-pa...

See you in Singapore! 🇸🇬

Simplifying Deep Temporal Difference Learning

A modern implementation of Deep Q-Network without target networks and replay buffers.

mttga.github.io

March 20, 2025 at 11:51 AM

Amir Mesbah

@amirmesbah.bsky.social

Happy #Nowruz and the beginning of the spring!

March 20, 2025 at 5:37 PM

Reposted by Amir Mesbah

Claire Vernade

@claireve.bsky.social

I’ve put together a short list of opportunities for early career academics willing to come to Europe: www.cvernade.com/miscellaneou...

This mostly covers France and Germany for now but I’m willing to extend it. I build on @ellis.eu resources and my own knowledge of these systems.

Claire Vernade - European career opportunities

European Academic Career Opportunities in 2025

www.cvernade.com

March 11, 2025 at 9:19 AM

Reposted by Amir Mesbah

Pablo Samuel Castro

@pcastr.bsky.social

RL is so back!

(well, for some of us, it never really left)

awards.acm.org/about/2024-t...

Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning.

Andrew Barto and Richard Sutton as the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning...

awards.acm.org

March 5, 2025 at 10:41 AM

Reposted by Amir Mesbah

Nathan Lambert

@natolambert.bsky.social

First 11 chapters of RLHF Book have v0 draft done. Should be useful now.

Next:
* Crafting more blog content into future topics,
* DPO+ chapter,
* Meeting with publishers to get wheels turning on physical copies,
* Cleaning & cohesiveness
rlhfbook.com

February 26, 2025 at 4:35 PM

Reposted by Amir Mesbah

Neuromatch

@neuromatch.bsky.social

🚨 Neuromatch Academy Course Applications are OPEN for 2025!! 🚨

Get your application in early to be a student or teaching assistant for this year’s courses!

Applications are due Sunday, March 23.

Apply & learn more: neuromatch.io/courses/

#mlsky #compneurosky #ai #climatesolutions #ScienceEdu 🧪

Applications are now open! 3-week courses: Comp Neuro and Deep Learning. 2-week courses: NeuroAI and Comp Tools for Climate Science.

February 24, 2025 at 5:58 PM

Reposted by Amir Mesbah

Ben Recht

@beenwrekt.bsky.social

2014 GoogLeNet: The best image classifier was only trainable using weeks of Google's custom infrastructure.

2018 ResNet: A more accurate model is trainable in a 1/2 hour on a single GPU.

What stops this from happening for LLMs?

Ben Recht @beenwrekt.bsky.social · Jan 27

Machine learning progresses when complicated breakthroughs are soon dramatically simplified as people figure out the salient parts.

What a world we're in where this well-trodden pattern rocks financial markets and escalates geopolitical conflict.

January 27, 2025 at 3:16 PM

Reposted by Amir Mesbah

Glen Berseth

@glenberseth.bsky.social

I am teaching a class on #FoundationalModels for #robotics and Scaling #DeepRL algorithms. This class expands on last year's class and my generalist robotics policies tutorial and code. I plan to share the lectures and code assignments. Starting with the first lectures below.

January 19, 2025 at 7:14 PM

Amir Mesbah

@amirmesbah.bsky.social

I wonder why ML conferences insist on uploading workshop videos on SlideShare while they can use YouTube and the benefits of monetization.
Talks on SlideShare are really hard to track!

January 18, 2025 at 11:17 AM

Reposted by Amir Mesbah

Pablo Samuel Castro

@pcastr.bsky.social

i was recently asked to provide 4 "desert island" RL papers.
if i were stuck on a desert island i'd hope to have something better to read than #RL papers... but anyway, here's a thread with my choices, maybe you can read them on your flight to @neuripsconf.bsky.social #NeurIPS2024 .
Enjoy!

December 6, 2024 at 8:56 PM

Reposted by Amir Mesbah

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

If you're an RL researcher or RL adjacent, pipe up to make sure I've added you here!
go.bsky.app/3WPHcHg

November 9, 2024 at 4:42 PM
