Lightnews — Scholar-powered news

Ryan Sullivan

@ryanpsullivan.bsky.social

PhD Candidate at the University of Maryland researching reinforcement learning and autocurricula in complex, open-ended environments.

Previously RL intern @ SonyAI, RLHF intern @ Google Research, and RL intern @ Amazon Science

Reposted by Ryan Sullivan

Antonin Raffin

@araffin.bsky.social

"As researchers, we tend to publish only positive results, but I think a lot of valuable insights are lost in our unpublished failures."

New blog post: Getting SAC to Work on a Massive Parallel Simulator (part I)

araffin.github.io/post/sac-mas...

Getting SAC to Work on a Massive Parallel Simulator: An RL Journey With Off-Policy Algorithms (Part I) | Antonin Raffin | Homepage

This post details how I managed to get the Soft-Actor Critic (SAC) and other off-policy reinforcement learning algorithms to work on massively parallel simulators (think Isaac Sim with thousands of ro...

araffin.github.io

March 10, 2025 at 8:22 AM

Ryan Sullivan

@ryanpsullivan.bsky.social

I’m heading to AAAI to present our work on multi-objective preference alignment for DPO from my internship with GoogleAI. If anyone wants to chat about RLHF, RL in games, curriculum learning, or open-ended environments please reach out!

February 26, 2025 at 8:29 PM

Reposted by Ryan Sullivan

Marc Lanctot

@sharky6000.bsky.social

Looking for a principled evaluation method for ranking of *general* agents or models, i.e. that get evaluated across a myriad of different tasks?

I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N

February 24, 2025 at 3:25 PM

Reposted by Ryan Sullivan

Costa Huang

@vwxyzjn.bsky.social

We released the OLMo 2 report! Ready for some more RL curves? 😏

This time, we applied RLVR iteratively! Our initial RLVR checkpoint on the RLVR dataset mix shows a low GSM8K score, so we did another RLVR on GSM8K only and another on MATH only 😆.

And it works! A thread 🧵 1/N

January 6, 2025 at 6:34 PM

Reposted by Ryan Sullivan

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

My recurrent refrain of the year is to really use the environments in PufferLib. There’s no reason not to have your environments run at a million fps on a single CPU core. github.com/PufferAI/Puf...

GitHub - PufferAI/PufferLib: Simplifying reinforcement learning for complex game environments

github.com

December 9, 2024 at 2:47 PM

Ryan Sullivan

@ryanpsullivan.bsky.social

Have you ever wanted to add curriculum learning (CL) to an RL project but decided it wasn't worth the effort?

I'm happy to announce the release of Syllabus, a library of portable curriculum learning methods that work with any RL code!

github.com/RyanNavillus...

GitHub - RyanNavillus/Syllabus: Synchronized Curriculum Learning for RL Agents

github.com

December 5, 2024 at 4:11 PM

Ryan Sullivan

@ryanpsullivan.bsky.social

Another awesome iteration of Genie! I fully agree with training generalist agents in simulation like this, though I believe in using real games to teach long-term strategies. Still, it’s easy to see how SIMA and Genie will continue to improve, and maybe even give us a true foundation model for RL.

Jack Parker-Holder @jparkerholder.bsky.social · Dec 4

Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.

December 4, 2024 at 7:55 PM

Ryan Sullivan

@ryanpsullivan.bsky.social

This is one of my favorite lines of work in RL. When I was starting my PhD, I was working on a multi-agent evaluation problem, having just finished a “voting math” class my last semester at Purdue. I scribbled some notes about how games in a tournament could be viewed as votes… 1/2

Marc Lanctot @sharky6000.bsky.social · Nov 11

Ok, time to start posting some actual AI things. 😅

This week I will tell you about several papers in the theme of social choice theory (and agent/model evals), starting with an old paper of ours that I am still excited about:

"Evaluating Agents using Social Choice Theory"

🧵 1/N

November 28, 2024 at 2:49 AM

Ryan Sullivan

@ryanpsullivan.bsky.social

I just got here, thanks @rockt.ai for putting together an open-endedness starter pack! If there's anyone else working on exploration, curriculum learning, or open-ended environments, leave a reply so I can follow you!

I'll be sharing some cool curriculum learning work in a few days, stay tuned!

Tim Rocktäschel @handle.invalid · Nov 20

Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.

go.bsky.app/MdVxrtD

November 22, 2024 at 5:35 PM
