Q-learning, SARSA, LLMs, and agents that learn to think 🤖✨
📺 Watch the full video 👉 youtu.be/R6MvIB7DHLU
#InteligenciaArtificial #MachineLearning #DeepRL
bryantmcgill.blogspot.com/2025/07/rewa...
This investigation was inspired by Lex's (@LexFridman) @MIT 6.S091: Introduction to Deep RL.
Soundcloud:
soundcloud.com/bryantmcgill...
We propose gradient interventions that enable stable, scalable learning, unlocking significant performance gains across agents and environments!
Details below 👇
Highly recommended read.
https://arxiv.org/abs/2407.00695
Link: neurips2023-enlsp.github.io/papers/paper...
#ReinforcementLearning #FoundationModels #DeepRL #DeepReinforcementLearning #ResponsibleAI #AIBias #LLMs #LanguageModels
From Thorndike’s cat puzzle box 🐱📦 to DeepMind’s AlphaGo 🤖🏆 to DeepSeek-R1: how did RL become a key AI breakthrough?
📖 Read the full history:
👉 researchdatapod.com/history-rein...
#AI #ReinforcementLearning #DeepSeek #DeepRL #history
Sergey Levine's DeepRL course is often recommended too. There may be more up-to-date videos somewhere; this is a mix of videos that are about 4 years old and some that are 1-2 years old: