Note that they retrain DeepSeek-V3-Base with the new 800K curated samples instead of continuing to fine-tune the checkpoint from the first round of cold-start SFT + RL.
Our foundation world model is capable of generating interactive worlds controllable with keyboard/mouse actions, starting from a single prompt image
So proud to have been part of this work led by @jparkerholder.bsky.social and @rockt.ai 🙏
LMAct benchmarks current SOTA foundation models' ability to act in text/visual environments across many domains, issuing text as low-level actions, given in-context expert (multimodal) demonstrations. We're excited to see how this benchmark drives further progress!
We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations (arxiv.org/abs/2412.01441).
🧵 1/N
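To make the in-context imitation-learning setup concrete, here is a minimal sketch of what "demonstrations in the prompt, text as low-level actions" looks like. This is my own illustration under assumed interfaces; `Step`, `build_prompt`, `env`, and `model` are hypothetical stand-ins, not the benchmark's actual API, and the real LMAct evaluation also supports visual observations.

```python
from dataclasses import dataclass


@dataclass
class Step:
    observation: str  # text rendering of the environment state (hypothetical format)
    action: str       # expert's low-level action, also plain text


def build_prompt(demos: list[list[Step]], current_obs: str) -> str:
    """Concatenate expert demonstrations, then ask for the next action."""
    parts = ["You control an agent. Reply with a single low-level action."]
    for i, episode in enumerate(demos, start=1):
        parts.append(f"--- Expert demonstration {i} ---")
        for step in episode:
            parts.append(f"Observation: {step.observation}")
            parts.append(f"Action: {step.action}")
    parts.append("--- Your turn ---")
    parts.append(f"Observation: {current_obs}")
    parts.append("Action:")
    return "\n".join(parts)


def run_episode(env, model, demos, max_steps=100):
    """Roll out the model purely in context: no finetuning, demos live in the prompt."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = model(build_prompt(demos, obs)).strip()
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

The point of the sketch is the evaluation axis: the model never updates its weights, so performance depends entirely on how well it can imitate long multimodal expert demonstrations placed in its context window.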