Sai Prasanna
@saiprasanna.in
See(k)ing the surreal
Causal World Models for Curious Robots @ University of Tübingen/Max Planck Institute for Intelligent Systems 🇩🇪
#reinforcementlearning #robotics #causality #meditation #vegan
Pinned
Sai Prasanna
@saiprasanna.in
· Dec 4
📌 Thread of threads for research ideas 💡 Collaborations are most welcome 😁
Use Beta NLL for regression when you also predict standard deviations, a simple change to NLL that works reliably better.
September 10, 2025 at 9:27 AM
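A minimal PyTorch sketch of the Beta-NLL idea from the post above (Seitzer et al., 2022): keep the Gaussian NLL, but re-weight each sample by a detached variance raised to a power β. The function name and the β = 0.5 default here are just illustrative.

```python
import torch

def beta_nll_loss(mean, var, target, beta=0.5):
    """Gaussian NLL re-weighted per sample by detach(var) ** beta.

    beta = 0 recovers the standard NLL; beta = 1 weights samples roughly
    like MSE, which keeps the NLL from down-weighting hard points by
    inflating their predicted variance.
    """
    nll = 0.5 * (torch.log(var) + (target - mean) ** 2 / var)
    weight = var.detach() ** beta  # stop-gradient through the weighting term
    return (weight * nll).mean()

# toy usage with a head that predicts mean and log-variance
mean = torch.randn(32, 1, requires_grad=True)
log_var = torch.randn(32, 1, requires_grad=True)
target = torch.randn(32, 1)
beta_nll_loss(mean, log_var.exp(), target, beta=0.5).backward()
```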
If open-endedness has to be measured fundamentally subjectively, which factors of the agent make it so, if we fix humans as the final arbiter or evaluator? Does the agent's embodiment, action space, etc. matter to a human evaluator of open-endedness?
August 2, 2025 at 11:53 PM
Tübingen : Freiburg :: Introvert : Extrovert
March 27, 2025 at 1:50 PM
Reposted by Sai Prasanna
This might be the most fun I’ve had writing an essay in a while. Felt some of that old going-nuts-with-an-idea energy flowing.
open.substack.com/pub/contrapt...
Discworld Rules
And LOTR is brain-rot for technologists
March 8, 2025 at 2:53 AM
Reposted by Sai Prasanna
This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).
If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.
PDF: arxiv.org/abs/2406.00592
March 2, 2025 at 4:19 PM
I realized how I background-process tonnes of information, from work/research and emotional stuff. And it works well: it leads to good research ideas and wise processing of tough situations! But it's so hard to learn to trust this, as conscious thinking for solving problems feels more under my "control".
March 1, 2025 at 10:02 PM
TIL: "Clever Hans cheat" for next-token prediction. A subtle but interesting issue with next-token prediction. In the purely forward next token prediction objective, teacher forcing can lead to learning dynamics where the models don't even generalize "in-distribution"!!
arxiv.org/abs/2403.06963
The pitfalls of next-token prediction
March 1, 2025 at 9:29 PM
TIL: "Clever Hans cheat" for next-token prediction. A subtle but interesting issue with next-token prediction. In the purely forward next token prediction objective, teacher forcing can lead to learning dynamics where the models don't even generalize "in-distribution"!!
arxiv.org/abs/2403.06963
arxiv.org/abs/2403.06963
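For context, the loophole sits in the standard teacher-forced objective itself. A hypothetical minimal sketch, not code from the paper (the toy model and shapes are made up): the loss at position t conditions on the ground-truth tokens before t, so later answer tokens can be fit by copying from the given prefix instead of learning to produce the first hard token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLM(nn.Module):
    """Stand-in autoregressive model: embedding -> GRU -> vocab logits."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):          # (B, T) -> (B, T, vocab)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)

def teacher_forced_nll(model, tokens):
    # Inputs are the *true* prefix at every position; only the next true
    # token is predicted. This is the objective where the "Clever Hans"
    # copying shortcut can arise.
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

tokens = torch.randint(0, 100, (8, 16))   # toy batch of token ids
teacher_forced_nll(ToyLM(), tokens).backward()
```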
Break the Monday productivity ceiling with this super awesome 4-hour techno set
on.soundcloud.com/hXTcWTTsYUNK...
Yetti Meissner @ Sisyphos Hammerhalle 09/08/14
January 27, 2025 at 1:56 PM
Monday kick starter open.spotify.com/track/6QXjBA...
Enimatek
Kore-G · Enimatek · Song · 2023
January 20, 2025 at 10:56 AM
If I have a really good photo that could potentially be used in many contexts, what's the best place to make money with it? My friend has a really good eye for photos, and we want to try a side venture selling some of her stuff.
December 30, 2024 at 3:19 PM
Reposted by Sai Prasanna
RIP Manmohan Singh. Dude changed all our lives in 1991 for the better. His stint as turnaround finance minister was revolutionary even if his later stint as PM was rather hapless (for which Nehru dynasty is more to blame).
Manmohan Singh - Wikipedia
December 27, 2024 at 3:44 AM
Reposted by Sai Prasanna
Looks like a cool study. Lots to learn from ants about large scale coordination
www.pnas.org/doi/10.1073/...
"Our results exemplify how simple minds can easily enjoy scalability while complex brains require extensive communication to cooperate efficiently."
h/t @petersuber.bsky.social
Comparing cooperative geometric puzzle solving in ants versus humans | PNAS
December 25, 2024 at 9:57 PM
Doing The Beeston Bump
Leafcutter John · Yes! Come Parade With Us · Song · 2019
December 18, 2024 at 5:54 PM
Does augmenting ourselves with V/LLMs to fill cognitive gaps make self-actualization even more difficult on average?
Stands in stark contrast with (more difficult / slower to show positive outcomes) augmentation strategies like meditation or psychedelics
December 18, 2024 at 10:40 AM
Reposted by Sai Prasanna
The slides for my lectures on (Bayesian) Active Learning, Information Theory, and Uncertainty are online now 🥳 They cover quite a bit from basic information theory to some recent papers:
blackhc.github.io/balitu/
and I'll try to add proper course notes over time 🤗
December 17, 2024 at 6:50 AM
Manifold Garden - Launch Trailer | PS4
YouTube video by PlayStation
December 16, 2024 at 4:26 PM
Reposted by Sai Prasanna
Check out Motivo, a behavioral foundation model for humanoid control by FAIR.
It's a one-of-a-kind unsupervised RL project, and it comes with a demo that is SO fun to play with!
metamotivo.metademolab.com
(for the record, they use compile and cudagraphs -> github.com/facebookrese...)
December 14, 2024 at 12:44 AM
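Not Motivo's actual code, but a minimal sketch of the torch.compile + CUDA-graphs pattern that last line points at (assumes a CUDA device; the toy policy network is made up). mode="reduce-overhead" is the setting that makes the inductor backend capture CUDA graphs, which cuts per-step launch overhead for small, shape-stable networks.

```python
import torch

# Toy shape-stable policy network: the case where CUDA-graph capture pays off.
policy = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 8)
).cuda()

# "reduce-overhead" enables CUDA-graph capture in torch.compile.
compiled_policy = torch.compile(policy, mode="reduce-overhead")

obs = torch.randn(1, 64, device="cuda")
with torch.no_grad():
    for _ in range(3):   # a few warm-up calls; later calls replay the graph
        action = compiled_policy(obs)
```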
Reposted by Sai Prasanna
Modern life is a Turing tarpit: “Everything is possible, but nothing is easy”
By contrast any traditional lifestyle is sub-Turing
All the people pining for rituals, steady routines, deep work etc etc etc… YOU CAN’T HANDLE THE TURING COMPLETENESS
en.wikipedia.org/wiki/Turing_...
Turing tarpit - Wikipedia
December 13, 2024 at 2:13 AM
Reposted by Sai Prasanna
If you're at NeurIPS, RLC is hosting an RL event from 8 till late at The Pearl on Dec. 11th. Join us, meet all the RL researchers, and spread the word!
December 10, 2024 at 9:55 PM
One thing coding with LLMs has helped me a lot with during the past months is visualisations. I'm churning out code to visualize many aspects of agent behavior that I wouldn't have done before, due to my mental friction in writing such code.
Such visualisation code is also easy to verify.
December 11, 2024 at 5:59 PM
When predicting the discrete joint distribution of two variables with a neural network, what loss is best to use? KL on the joint and the two marginals? Or is there anything better?
December 11, 2024 at 5:37 PM
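One baseline answer to the question above, for reference: treat the joint as a single categorical over the K1 x K2 grid and use cross-entropy (equal to KL from the target up to a constant); KL terms on the marginals are redundant once the joint is matched, but are sometimes added as a regulariser. A hypothetical sketch; the function name and the lam weight are made up.

```python
import torch
import torch.nn.functional as F

def joint_and_marginal_loss(logits, target_joint, lam=0.1):
    """Cross-entropy on the joint plus cross-entropy on both marginals.

    logits:       (B, K1, K2) unnormalised scores over the joint grid.
    target_joint: (B, K1, K2) target joint distribution (sums to 1 per row).
    """
    B, K1, K2 = logits.shape
    log_p = F.log_softmax(logits.reshape(B, -1), dim=-1).reshape(B, K1, K2)

    # KL(target || model) on the joint, up to the constant target entropy.
    joint_ce = -(target_joint * log_p).sum(dim=(1, 2)).mean()

    # Model and target marginals (logsumexp marginalises in log-space).
    log_p1 = torch.logsumexp(log_p, dim=2)       # (B, K1)
    log_p2 = torch.logsumexp(log_p, dim=1)       # (B, K2)
    t1, t2 = target_joint.sum(dim=2), target_joint.sum(dim=1)
    marg_ce = -(t1 * log_p1).sum(-1).mean() - (t2 * log_p2).sum(-1).mean()

    return joint_ce + lam * marg_ce

# toy usage
B, K1, K2 = 16, 5, 7
logits = torch.randn(B, K1, K2, requires_grad=True)
target = torch.softmax(torch.randn(B, K1 * K2), dim=-1).reshape(B, K1, K2)
joint_and_marginal_loss(logits, target).backward()
```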
Reposted by Sai Prasanna
In an effort to play a small part in creating additional value on this site, I'm going to post, one per day, a paper we wrote that was published in 2024. Together with memes. Skipping holidays/weekends. In random order.
Would love your thoughts on them.
I'll keep them threaded for easy finding!
>
November 25, 2024 at 9:34 AM
Reposted by Sai Prasanna
The RL book by Kevin Murphy is finally online (copied shamelessly from the other place) arxiv.org/abs/2412.05265
Reinforcement Learning: An Overview
December 9, 2024 at 6:25 AM