James MacGlashan
@jmac-ai.bsky.social
Ask me about Reinforcement Learning
Research @ Sony AI
AI should learn from its experiences, not copy your data.

My website for answering RL questions: https://www.decisionsanddragons.com/

Views and posts are my own.
Guido's opinion that removing the GIL isn't good for Python on the whole is consistent with my opinion that Python is good, but a bad fit for AI.

Trying to force it to match AI's needs may be a mistake. AI should use another language (fingers crossed for Mojo).

www.odbms.org/blog/2025/10...
October 12, 2025 at 4:11 PM
This one's been a long time coming.

In this post on Decisions & Dragons I answer "Should we abandon RL?"

The answer is obviously no, but people ask because they have a fundamental misunderstanding of what RL is.

RL is a problem, not an approach.

www.decisionsanddragons.com/posts/should...
August 15, 2025 at 11:30 PM
Something I did a little differently for this answer was to write some JAX code to compute various gradients for an example toy problem.

My hope is that it will help ground intuitions about REINFORCE's stochastic gradients and how baselines help reduce their error.

See example in image.
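
The image itself isn't reproduced here, but here is a minimal JAX sketch in the same spirit (my own toy 3-armed bandit, not the code from the post): it compares single-sample REINFORCE gradient estimates, with and without a mean-reward baseline, against the exact policy gradient.

# Minimal sketch (assumed toy problem, not the post's original code):
# REINFORCE gradient estimates on a 3-armed bandit with a softmax policy.
import jax
import jax.numpy as jnp

rewards = jnp.array([10.0, 11.0, 12.0])  # fixed reward per arm; the +10 offset is deliberate

def log_policy(theta, a):
    return jax.nn.log_softmax(theta)[a]

def true_objective(theta):
    # J(theta) = sum_a pi(a) * r(a); its gradient is the exact policy gradient.
    return jnp.sum(jax.nn.softmax(theta) * rewards)

def reinforce_estimate(theta, key, baseline):
    # Single-sample REINFORCE estimate: grad log pi(a) * (r(a) - baseline).
    a = jax.random.categorical(key, theta)
    return jax.grad(log_policy)(theta, a) * (rewards[a] - baseline)

theta = jnp.array([0.1, 0.2, -0.3])
exact = jax.grad(true_objective)(theta)
mean_reward = jnp.sum(jax.nn.softmax(theta) * rewards)

keys = jax.random.split(jax.random.PRNGKey(0), 10_000)
for baseline in (0.0, mean_reward):  # no baseline vs. b = E[r]
    estimates = jax.vmap(lambda k: reinforce_estimate(theta, k, baseline))(keys)
    err = jnp.mean(jnp.sum((estimates - exact) ** 2, axis=-1))
    print(f"baseline={float(baseline):6.3f}  mean estimate={estimates.mean(axis=0)}  MSE={float(err):.3f}")

The constant offset in the rewards doesn't change the true gradient at all, but it dominates the variance of the no-baseline estimator; subtracting the mean reward removes it, so the second MSE printed should be far smaller.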
May 9, 2025 at 1:41 PM
It's been a minute, but I've added a new Q&A to Decisions and Dragons!

This one answers "Why is it better to subtract a baseline in REINFORCE?"

Preview image attached. See the link for the full text.

1/2

www.decisionsanddragons.com/posts/why_is...
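
For quick reference (standard form, not quoted from the post): the REINFORCE gradient with a baseline is

\nabla_\theta J(\theta) = E_{\pi_\theta}[ \nabla_\theta \log \pi_\theta(a|s) (G_t - b(s)) ]

and since E_{\pi_\theta}[ \nabla_\theta \log \pi_\theta(a|s) b(s) ] = 0 for any baseline that doesn't depend on the action, subtracting b(s) leaves the expected gradient unchanged while a well-chosen baseline can substantially reduce the variance of the sampled estimate.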
May 9, 2025 at 1:41 PM
Given the success of "thinking" in LLMs by using special sections for the thought vs response, it's inevitable that the next version is sections with reactions from different egos before making the response.

At that point, it is required to style the output like Disco Elysium.
April 19, 2025 at 2:33 PM
This is also working for me. So I tried 26 and it failed.
January 17, 2025 at 6:35 PM
I think you're using a narrower definition of RL than the RL community does. Those approaches fit comfortably within it. The DT work even calls itself RL.

In the first few slides of any intro on RL you will see this diagram. It's first because it captures the problem we are trying to solve.
December 31, 2024 at 11:15 PM
Counterpoint.
December 19, 2024 at 4:29 PM
I'm referring to comments like the attached one.

It's described as if AI were an independent entity that some day wakes up, rather than the product of a deliberate engineering process.

We don't need to "discover" things like this -- we know what we built and we chose to build it.
December 11, 2024 at 5:56 PM
Here's a suggestion: stop trying to solve the AI problem solely by fitting curated datasets.

Yes, that means we're not solving the AI problem "soon", since that's the thing we're currently good at.

Yes, the alternative is harder, but that's the challenge you adopt if you want to solve the AI problem.
December 1, 2024 at 5:22 PM
Q: What is the difference between model-based and model-free RL?

Full answer: www.decisionsanddragons.com/posts/model_...
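
The one-line contrast, in standard notation (my summary, not quoted from the answer): model-based methods learn or are given a model \hat{p}(s'|s,a), \hat{r}(s,a) and plan with it,

Q(s,a) = \hat{r}(s,a) + \gamma \sum_{s'} \hat{p}(s'|s,a) \max_{a'} Q(s',a')

while model-free methods update value estimates directly from sampled transitions (s, a, r, s') without ever building a model, e.g.

Q(s,a) \leftarrow Q(s,a) + \alpha [ r + \gamma \max_{a'} Q(s',a') - Q(s,a) ]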
November 10, 2024 at 4:11 PM
Q: Why does the policy gradient include a log probability term?

Full answer: www.decisionsanddragons.com/posts/why_do...
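
The short version (the standard likelihood-ratio identity, not the post's full derivation):

\nabla_\theta \pi_\theta(a|s) = \pi_\theta(a|s) \nabla_\theta \log \pi_\theta(a|s)

Rewriting the gradient of the objective with this identity turns it into an expectation under \pi_\theta, which is what lets us estimate it from sampled actions:

\nabla_\theta J(\theta) = E_{\pi_\theta}[ \nabla_\theta \log \pi_\theta(a|s) Q^{\pi_\theta}(s,a) ]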
November 10, 2024 at 4:11 PM
Q: What is the difference between V(s) and Q(s,a)?

Full answer: www.decisionsanddragons.com/posts/q_vs_v/
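
In symbols (standard definitions, not quoted from the answer):

V^\pi(s) = E_{a \sim \pi}[ Q^\pi(s,a) ]
Q^\pi(s,a) = E[ r + \gamma V^\pi(s') | s, a ]

V scores a state under the policy; Q scores committing to a particular first action and then following the policy.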
November 10, 2024 at 4:11 PM
Q: If Q-learning is off-policy, why doesn't it require importance sampling?

Full answer: www.decisionsanddragons.com/posts/q_lear...
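
For context, the standard update (not the post's full argument):

Q(s,a) \leftarrow Q(s,a) + \alpha [ r + \gamma \max_{a'} Q(s',a') - Q(s,a) ]

The target uses the max over next actions, so it never depends on which action the behavior policy actually took at s', and there is nothing to correct with importance sampling.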
November 10, 2024 at 4:11 PM
Q: Why is the DDPG gradient the product of the Q-function gradient and policy gradient?

Full answer: www.decisionsanddragons.com/posts/ddpg_g...
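
In symbols (the standard deterministic policy gradient form, not quoted from the post): with a deterministic policy a = \mu_\theta(s) and critic Q_w,

\nabla_\theta J \approx E_s[ \nabla_a Q_w(s,a)|_{a=\mu_\theta(s)} \nabla_\theta \mu_\theta(s) ]

which is just the chain rule through the critic: how the value changes with the action, times how the action changes with the policy parameters.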
November 10, 2024 at 4:11 PM
Q: Why doesn't Q-learning work with continuous actions?

Full answer: www.decisionsanddragons.com/posts/q_lear...
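
The sticking point, in symbols: the Q-learning target y = r + \gamma \max_{a'} Q(s',a') requires solving \max_{a' \in A} Q(s',a'). That max is a cheap enumeration for a small discrete action set, but it becomes an optimization problem in its own right when A is continuous, which is what methods like DDPG work around by learning the maximizer.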
November 10, 2024 at 4:11 PM
Q: What is the "horizon" in reinforcement learning?

Full answer: www.decisionsanddragons.com/posts/horizon/
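
In symbols (the standard finite-horizon return, not quoted from the answer): with horizon H the agent optimizes

G_t = \sum_{k=0}^{H-1} \gamma^k r_{t+k+1}

and the infinite-horizon case is the limit H \to \infty, with \gamma < 1 keeping the sum finite.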
November 10, 2024 at 4:11 PM
Q: Why does experience replay require off-policy learning and how is it different from on-policy learning?

Full answer: www.decisionsanddragons.com/posts/off_po...
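
One way to see it (standard targets, my summary): transitions (s, a, r, s') pulled from a replay buffer were generated by older versions of the policy. An off-policy target like Q-learning's r + \gamma \max_{a'} Q(s',a') doesn't care, but an on-policy target like SARSA's r + \gamma Q(s',a') needs a' sampled from the current policy, and the a' stored in the buffer came from an older one.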
November 10, 2024 at 4:11 PM
Reinforcement learning in #AI is hard, so I’ve made a website to collect answers I’ve given to common RL questions.

It's named Decisions & Dragons. It’s launching with 8 questions and answers, but I will add to it in the future.

A 🧵 to give a preview with the link below.
November 10, 2024 at 4:11 PM