CIFAR AI Chair, RL_Conference chair. Creating generalist problem-solving agents for the real world. He/him/il.
Applicant names, profiles, demographics
Reviewer names, profiles, comments, and scores
Lots of progress in RL research over the last 10 years, but it has been too performance-driven => overfitting to benchmarks (like the ALE).
1⃣ Let's advance science of RL
2⃣ Let's be explicit about how benchmarks map to formalism
1/X
1) It does not train a critic (no need with small variance)
2) The SCORE FUNCTION (difficult to call this an advantage) is computed over a batch of samples that share the same initial prompt (similar to the vine sampling method from TRPO)
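A rough sketch of what such a batch-level score could look like, with no critic involved: every completion sampled from the same prompt is scored against the statistics of its own group. The function name `group_relative_scores`, the `eps` term, and the division by the group standard deviation are assumptions made for illustration; the post itself only says the score is computed over a batch sharing one initial prompt.

```python
from statistics import mean, pstdev


def group_relative_scores(rewards, eps=1e-8):
    """Score each completion relative to the others sampled from the same prompt.

    Assumption: the group mean acts as the baseline (no learned critic), and
    the scores are scaled by the group standard deviation. The scaling step
    is an illustrative choice, not something stated in the post.
    """
    baseline = mean(rewards)        # shared baseline for the whole group
    scale = pstdev(rewards) + eps   # eps avoids division by zero for identical rewards
    return [(r - baseline) / scale for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
scores = group_relative_scores(rewards)
```

Completions that beat the group mean get a positive score and the rest get a negative one, so the scores within a group sum to roughly zero.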
cloud.google.com/blog/product...
ivado.ca/en/events/bo...
ivado.ca/en/events/bo...
rl-conference.cc/call_for_soc...
We propose gradient interventions that enable stable, scalable learning, unlocking significant performance gains across agents and environments!
Details below 👇
We propose BYOL-γ: an auxiliary self-predictive loss to improve generalization for goal-conditioned BC. 🧵1/6