Robotics. Reinforcement learning. AI.
For the full paper, see: pi.website/download/fas...
For more Pi research, see:
We are releasing the FAST tokenizer that we pre-trained on 1M robot action sequences. In our experiments it works well for tokenizing actions from many different kinds of robots. And it’s easy to use!
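FAST tokenizes chunks of continuous actions in frequency space (a DCT per action dimension, followed by compression of the quantized coefficients). As a rough illustration of the frequency-space idea only — not the released tokenizer's API; the chunk size, quantization scale, and function names here are all hypothetical — a numpy sketch of the DCT-quantize-reconstruct round trip:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: rows are frequency components.
    j = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * j / (2 * n))
    C[0] /= np.sqrt(2.0)       # DC row uses a smaller normalization
    return C

def tokenize_chunk(actions, scale=10.0):
    # Transform each action dimension to frequency space, then quantize
    # coefficients to integers (the precision "scale" is made up here).
    C = dct_matrix(actions.shape[0])
    return np.round(C @ actions * scale).astype(int)

def detokenize_chunk(tokens, scale=10.0):
    # Inverse: undo the quantization scale, then inverse DCT
    # (C is orthonormal, so its transpose is its inverse).
    C = dct_matrix(tokens.shape[0])
    return C.T @ (tokens / scale)

rng = np.random.default_rng(0)
chunk = rng.standard_normal((16, 7))  # 16 timesteps of a 7-DoF action
recon = detokenize_chunk(tokenize_chunk(chunk))
print(np.max(np.abs(recon - chunk)))  # small, bounded quantization error
```

The released tokenizer additionally compresses the quantized coefficients (e.g. with byte-pair encoding) into a short discrete token sequence; this sketch only shows the lossy frequency-space round trip.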
Blog (and paper + code): pi.website/research/fast
For more, see the website: generalist-distillation.github.io
w/ @CharlesXu0124, @qiyang_li, @jianlanluo
w/ zhouzypaul.github.io, Andy Peng, @qcli.bsky.social, aviralkumar2907.github.io
For more, check out the paper here: arxiv.org/abs/2411.05193
The equations look more complicated than the method really is. Here is the method:
This means that greedy decoding actually yields the greedy Q-value-maximizing policy.
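The core of that claim fits in a few lines: argmax is invariant under any monotone transform, so if the decoding head's logits are a monotone function of the Q-values, picking the highest-logit token is identical to picking the Q-maximizing action. (The Q-values and temperature below are made up for illustration, not taken from the paper.)

```python
import numpy as np

# Hypothetical Q-values for 4 candidate action tokens at one decoding step.
Q = np.array([0.1, 0.7, 0.3, 0.5])

# A head whose logits are a monotone function of Q
# (here simple temperature scaling; the value 0.1 is arbitrary).
logits = Q / 0.1

# Greedy decoding (argmax over logits) selects the same token as
# greedy Q-maximization (argmax over Q).
greedy_token = int(np.argmax(logits))
q_max_token = int(np.argmax(Q))
print(greedy_token, q_max_token)  # both are 1
```

Any strictly increasing map from Q-values to logits preserves this property, which is why the learned policy can be read off directly by greedy decoding.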