Tokyo-based.
We used an LLM to generate reward functions for RL-trained racing agents in Gran Turismo and a VLM to evaluate their performance.
See, now I'm a cool LLM researcher :)
#LLM #RL #GranTurismo
arxiv.org/html/2510.04...
> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult.
... so if a human isn't a well-educated adult they don't have general intelligence?
Huge congratulations, Hon Tik (Rick) Tse and Siddarth Chandrasekar.
My PhD student, Hon Tik Tse, led this work, and my MSc student, Siddarth Chandrasekar, assisted us.
arxiv.org/abs/2505.16217
Basically, it's the SR with rewards. See below 👇
job-boards.greenhouse.io/deepmind/job...
sonyglobal.wd1.myworkdayjobs.com/en-US/SonyGl...
We do awesome RL applications to modern video games. If that excites you, check out the posting!
We have positions for senior and staff level developers. Senior dev link (staff level link next in 🧵):
sonyglobal.wd1.myworkdayjobs.com/en-US/SonyGl...
It is remote for people in the US & Canada, mixed remote/onsite in Europe (onsite in Zurich), and onsite in Tokyo.
If you want to work on RL with cool applications, sign up!
ai.sony/joinus/job-r...
#RL #GranTurismo #GTSophy
LLM finetuning is done ONLY using internal reward (model confidence) with no external grounding reward.
That means the LLM had to already know how to solve the problems.
Apparently my resting face is one of disgust, anger, or sadness, and my happy face is contempt :P
corticallabs.com
www.nytimes.com/2025/03/05/t...
Really cool approach: an agentic LLM generates its own actions as Python functions. However, this is NOT doing RL - there is no reward over which the system optimizes. #rl #llm