Lightnews — Scholar-powered news

Pravesh Koirala

@pravesh.bsky.social

15 followers 340 following 18 posts

PhD Student @ Vanderbilt University
Game Theory | Mathematical Optimization | Multiagent RL | LLM Numeracy

Reposted by Pravesh Koirala

Blake Richards

@tyrellturing.bsky.social

If you review for a #ML conference like @iclr-conf.bsky.social or @neuripsconf.bsky.social, YOU HAVE A RESPONSIBILITY TO REPLY TO THE AUTHORS.

If the rebuttal doesn't address your concerns, explain why. But giving a score of 2-3 and then ghosting the authors is super rude.

I say this as an AC.

#MLSky

December 4, 2024 at 10:58 PM

Pravesh Koirala

@pravesh.bsky.social

Can anyone write up a rebuttal etiquette guide, or share one if it already exists? Per Cunningham's law, I'm severely tempted to write a makeshift one myself.

December 3, 2024 at 7:07 PM

Pravesh Koirala

@pravesh.bsky.social

Language Games seem like an interesting research area!

Tom Schaul @schaul.bsky.social · Nov 28

Are there limits to what you can learn in a closed system? Do we need human feedback in training? Is scale all we need? Should we play language games? What even is "recursive self-improvement"?

Thoughts about this and more here:
arxiv.org/abs/2411.16905

Boundless Socratic Learning with Language Games

An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its covera...

arxiv.org

November 30, 2024 at 5:04 AM

Pravesh Koirala

@pravesh.bsky.social

Eugene Vinitsky 🍒 @eugenevinitsky.bsky.social · Nov 14

Decisions and Dragons is such a nice overview of little RL subtleties that aren't really well explicated elsewhere: www.decisionsanddragons.com

Props @jmac-ai.bsky.social

Screenshot of a blog post. Title: "why does the policy gradient include a log probability term"
Text: "Actually, it doesn't! What you're probably thinking of is the REINFORCE estimate of the policy gradient. How we derive the REINFORCE estimate you're familiar with and why we use it is something I found to be poorly explained in the literature. Fortunately, it is not a hard concept to learn!"

November 15, 2024 at 1:34 PM

Pravesh Koirala

@pravesh.bsky.social

Whatever happened to NaNoWriMo?

November 14, 2024 at 1:59 AM
