Studying embodied intelligence via autonomous driving.
arxiv.org/abs/2511.07292
If you are looking for a PhD position in machine learning or robotics in Germany, this is the best program to apply to.
imprs.is.mpg.de/application
Blog post about determinism in LLMs.
They make a very interesting point at the end: numerical differences between the forward pass used for data collection and the one used for training can make an RL algorithm lose its on-policy property.
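A toy sketch of that effect, not taken from the blog post: the same terms are summed in two different orders, standing in for the sampling and training kernels. The values are deliberately exaggerated so the gap is visible in float64 (real kernels typically disagree only in the last few bits), and the names `accumulate`, `log_prob`, `logit_sampler` and `logit_trainer` are made up for illustration.

```python
import math

def accumulate(terms):
    """Sum floats strictly left to right, i.e. one fixed reduction order."""
    total = 0.0
    for t in terms:
        total += t
    return total

def log_prob(logits, action):
    """Numerically stable log-softmax probability of `action`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z

# Same three contributions to the logit of action 0, reduced in two orders.
# Assumption: exaggerated magnitudes, chosen so float64 rounding is obvious.
terms = [1e16, 1.0, -1e16]
logit_sampler = accumulate(terms)  # the 1.0 is absorbed by 1e16
logit_trainer = accumulate([terms[0], terms[2], terms[1]])  # cancel first, then add 1.0

logp_sampler = log_prob([logit_sampler, 0.0], action=0)
logp_trainer = log_prob([logit_trainer, 0.0], action=0)

# Importance ratio between the "training" and "sampling" policies.
# An exactly on-policy update would have ratio == 1.0.
ratio = math.exp(logp_trainer - logp_sampler)
print(logit_sampler, logit_trainer, ratio)
```

Whenever the ratio is not exactly 1, the samples were effectively drawn from a slightly different policy than the one being updated.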
We show how simple rewards enable scaling up PPO for planning.
CaRL outperforms all prior learning-based approaches on nuPlan Val14 and CARLA longest6 v2, using less inference compute.
arxiv.org/abs/2504.17838
The key is that RL can optimize non-differentiable objectives, like human feedback!
We introduce RL from this alternative angle in our tutorial:
arxiv.org/abs/2312.08365
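A minimal sketch of that idea, not code from the tutorial: a one-parameter Bernoulli policy is trained with the score-function (REINFORCE) estimator against a black-box rating that is never differentiated. The `feedback` function is a hypothetical stand-in for human feedback, and the step count, learning rate and seed are arbitrary choices.

```python
import math
import random

def feedback(action):
    """Hypothetical black-box rating: thumbs up (1.0) only for action 1."""
    return 1.0 if action == 1 else 0.0

def train(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # logit of choosing action 1
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))
        action = 1 if rng.random() < p else 0
        reward = feedback(action)  # only evaluated, never differentiated
        # Score-function estimate: d/dtheta log pi(action) = action - p.
        theta += lr * reward * (action - p)
    return 1.0 / (1.0 + math.exp(-theta))

p_final = train()
print(f"P(action = 1) after training: {p_final:.3f}")
```

The update only needs the gradient of the policy's log-probability, so the reward can be any signal that can be evaluated, including a step function like this one.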
TF++ is also SOTA on Bench2Drive and Town 13 validation.