haoyuhe.bsky.social
@haoyuhe.bsky.social
PhD student @ AVG, University of Tübingen
🔹 Running Confidence Remasking (RCR) – a training-free decoding strategy that allows revising low-confidence tokens flexibly.
August 20, 2025 at 9:40 AM
🔹 Masked Diffusion Policy Optimization – the first policy gradient method that optimizes MDLMs as a sequential decision-making process. MDPO exploits the nice property that MDLMs yield text completions at every inference step and optimizes the model with with intermediate-step rewards.
August 20, 2025 at 9:40 AM