Research @ Sony AI
AI should learn from its experiences, not copy your data.
My website for answering RL questions: https://www.decisionsanddragons.com/
Views and posts are my own.
Trying to force it to match AI's needs may be a mistake. AI should use another language (fingers crossed for Mojo).
www.odbms.org/blog/2025/10...
In this post on Decisions & Dragons I answer "Should we abandon RL?"
The answer is obviously no, but people ask because they have a fundamental misunderstanding of what RL is.
RL is a problem, not an approach.
www.decisionsanddragons.com/posts/should...
My hope is that it will help ground intuitions about REINFORCE's stochastic gradients and how baselines help reduce their error.
See example in image.
This one answers "Why is it better to subtract a baseline in REINFORCE?"
Preview image attached. See the link for the full text.
1/2
www.decisionsanddragons.com/posts/why_is...
At that point, it is required to style the output like Disco Elysium.
In the first few slides of any intro on RL you will see this diagram. It's first because it captures the problem we are trying to solve.
It's described like AI is this independent entity that some day wakes up, rather than a deliberate engineering process.
We don't need to "discover" things like this -- we know what we built and we chose to build it.
Yes, that means we're not solving the AI problem "soon" since that's the current thing we're good at.
Yes, the alternative is harder, but that's the challenge you adopt if you want to solve the AI problem.
Full answer: www.decisionsanddragons.com/posts/model_...
Full answer: www.decisionsanddragons.com/posts/why_do...
Full answer: www.decisionsanddragons.com/posts/q_vs_v/
Full answer: www.decisionsanddragons.com/posts/q_lear...
Full answer: www.decisionsanddragons.com/posts/ddpg_g...
Full answer: www.decisionsanddragons.com/posts/q_lear...
Full answer: www.decisionsanddragons.com/posts/horizon/
Full answer: www.decisionsanddragons.com/posts/off_po...
It's named Decisions & Dragons. It’s launching with 8 questions and answers, but I will add to it in the future.
A 🧵 to give a preview with the link below.