TTM enables SigLIP-B16 (~0.2B params) to outperform GPT-4.1 on MMVP-VLM, establishing a new SOTA.
Online Finetuning Decision Transformers with Pure RL Gradients
RL drives reasoning in LLMs, yet it remains underexplored for online finetuning of Decision Transformers (DTs), where most methods still rely mainly on supervised objectives.
Why?
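The thread doesn't spell out the objective, but the contrast is easy to see in a toy sketch: a supervised step pushes up the log-probability of dataset actions no matter how good they were, while a pure RL (REINFORCE-style) step weights that same gradient by the return. Everything below (the tabular softmax policy, the state/action sizes, the learning rates) is an illustrative assumption, not the paper's method.

```python
import numpy as np

n_states, n_actions = 8, 4
theta = np.zeros((n_states, n_actions))  # logits of a toy tabular softmax policy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_pi(s, a):
    """Gradient of log pi(a|s) w.r.t. theta for a tabular softmax policy."""
    g = np.zeros_like(theta)
    g[s] = -softmax(theta[s])
    g[s, a] += 1.0
    return g

# Supervised-style step (what most DT finetuning does): maximize the
# log-probability of the dataset action, regardless of its quality.
def supervised_step(s, a_dataset, lr=0.1):
    global theta
    theta += lr * grad_log_pi(s, a_dataset)

# Pure RL step (REINFORCE): the same log-prob gradient, weighted by the
# return, so bad trajectories push probability down instead of up.
def reinforce_step(s, a_taken, ret, baseline=0.0, lr=0.1):
    global theta
    theta += lr * (ret - baseline) * grad_log_pi(s, a_taken)

s = 3
supervised_step(s, a_dataset=2)
reinforce_step(s, a_taken=1, ret=-1.0)  # negative return discourages action 1
print(softmax(theta[s]))
```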
We extend classical unimodal active learning to the multimodal setting with unaligned data, enabling data-efficient finetuning and pretraining of vision-language models such as CLIP and SigLIP.
1/3
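As a rough illustration of the setting (not the paper's algorithm), one acquisition round over unaligned pools might score each unlabeled image by the entropy of its text-matching distribution under the current contrastive model and spend the annotation budget on the most uncertain items. The scorer, pool sizes, and budget below are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector; higher = more uncertain."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum()

# Hypothetical stand-in for a CLIP/SigLIP-style scorer: softmax over cosine
# similarities between one image embedding and a pool of text embeddings.
def match_probs(image_emb, text_embs, temperature=0.07):
    sims = text_embs @ image_emb / temperature
    e = np.exp(sims - sims.max())
    return e / e.sum()

# Unaligned pools: image and text embeddings collected independently,
# with no known pairing between them (random vectors here for illustration).
image_pool = rng.normal(size=(500, 64))
text_pool = rng.normal(size=(20, 64))
image_pool /= np.linalg.norm(image_pool, axis=1, keepdims=True)
text_pool /= np.linalg.norm(text_pool, axis=1, keepdims=True)

budget = 32  # annotation/pairing budget for this round
scores = np.array([entropy(match_probs(x, text_pool)) for x in image_pool])
query_idx = np.argsort(-scores)[:budget]  # most uncertain images first
print("queried indices:", query_idx[:10])
# The queried items would be paired/annotated and used to finetune the
# contrastive model before the next acquisition round.
```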
We turn test-time compute allocation into a bandit learning problem, achieving:
✅ +11.10% on MATH-500
✅ +7.41% on LiveCodeBench
Paper: arxiv.org/pdf/2506.12721
(1/3)
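The thread doesn't give the algorithm, but the bandit framing can be sketched: treat candidate test-time compute budgets as arms and allocate trials with UCB1, so spending concentrates on budgets that empirically solve more problems. The arms, accuracies, and reward simulation below are assumptions for illustration, not the paper's setup.

```python
import math
import random

random.seed(0)

# Hypothetical arms: candidate test-time compute budgets (samples per query).
ARMS = [1, 4, 16]
# Assumed per-budget success probabilities, standing in for observed accuracy.
TRUE_ACC = {1: 0.55, 4: 0.68, 16: 0.72}

counts = {a: 0 for a in ARMS}
rewards = {a: 0.0 for a in ARMS}

def ucb_pick(t):
    """UCB1: exploit high empirical accuracy, explore under-tried budgets."""
    for a in ARMS:  # play each arm once before using the UCB index
        if counts[a] == 0:
            return a
    return max(ARMS, key=lambda a: rewards[a] / counts[a]
               + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 1001):
    arm = ucb_pick(t)
    solved = random.random() < TRUE_ACC[arm]  # simulated solve/fail outcome
    counts[arm] += 1
    rewards[arm] += 1.0 if solved else 0.0

print({a: counts[a] for a in ARMS})  # trials should concentrate on strong arms
```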
Please visit yinglunz.com for details on research directions and contact information.