And the answer key is here: perso.enst.fr/madore/mitro... ❦
- RL generalizes in rule-based envs, esp. when trained with an outcome-based reward
- SFT tends to memorize the training data and struggles to generalize OOD
A method to curate high-quality data, or create high-quality synthetic data. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9.
ProLIP is the first probabilistic vision-language model trained from scratch, and it is comparable to CLIP or SigLIP
Paper: Probabilistic Language-Image Pre-Training ( arxiv.org/abs/2410.18857 )
Models: huggingface.co/collections/...
He reviews all the key techniques that are used in building state-of-the-art video generation models.
yenchenlin.me/blog/2025/01...
arxiv.org/abs/2405.06161