Yapei Chang
@yapeichang.bsky.social
☁️ phd in progress @ UMD | 🔗 https://lilakk.github.io/
Beyond benchmarks, human annotators rate BLEUBERI outputs as comparable to those from GRPO-RM models.
May 20, 2025 at 4:25 PM
Qualitatively, BLEUBERI models produce more factually grounded outputs, as measured by VeriScore on three diverse datasets. VeriScore extracts verifiable claims from responses and checks each one against Google Search.
May 20, 2025 at 4:25 PM
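For intuition, the VeriScore setup above can be paired with a rough sketch: extract verifiable claims from a response, verify each one, and aggregate. The `extract_claims` and `verify_claim` helpers below are hypothetical placeholders (VeriScore uses an LLM-based extractor and Google-Search-backed verification), and scoring by the supported fraction is an illustrative simplification, not the exact VeriScore formula.

```python
# Illustrative VeriScore-style factuality check (not the official implementation).
# extract_claims / verify_claim are hypothetical stand-ins for the LLM-based
# claim extractor and the Google-Search-backed verifier.
from typing import Callable, List

def factuality_score(
    response: str,
    extract_claims: Callable[[str], List[str]],
    verify_claim: Callable[[str], bool],
) -> float:
    """Fraction of extracted verifiable claims judged as supported (simplified aggregate)."""
    claims = extract_claims(response)
    if not claims:
        return 0.0
    supported = sum(verify_claim(claim) for claim in claims)
    return supported / len(claims)
```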
The surprising effectiveness of BLEU extends to training. BLEUBERI first selects 5K low-BLEU examples, then trains LLMs with GRPO using BLEU as the reward. BLEUBERI models are competitive with those trained via GRPO-RM (8B reward model) and SFT across 4 benchmarks.
May 20, 2025 at 4:25 PM
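As a rough sketch of what "BLEU as the reward" can look like in practice, the function below scores each completion against its references with sacrebleu. The (completions, references, **kwargs) callback shape follows a TRL-style GRPO reward function and is an assumption here, not BLEUBERI's actual code.

```python
# Sketch of a BLEU reward for GRPO training, assuming sacrebleu and a
# TRL-style reward callback that receives completions plus per-example
# reference lists (the exact trainer interface may differ).
import sacrebleu

def bleu_reward(completions, references, **kwargs):
    """Return one scalar reward per completion: sentence BLEU against its references."""
    rewards = []
    for completion, refs in zip(completions, references):
        # refs: list of reference strings for this prompt; BLEU clips n-gram
        # counts against the best-matching reference.
        score = sacrebleu.sentence_bleu(completion, refs).score  # 0-100
        rewards.append(score / 100.0)  # normalize to [0, 1]
    return rewards
```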
When BLEU agrees with humans on a pair of model outputs, what n-grams contribute to this decision? Below is an example where it captures both format (the “Ukrainian” and “English” headers) and factuality (the number 6.1).
May 20, 2025 at 4:25 PM
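One way to do this kind of inspection is to intersect clipped n-gram counts between an output and a reference; the matches are exactly what feeds modified n-gram precision. A minimal sketch with naive whitespace tokenization (sacrebleu's tokenizer would differ slightly) and made-up example strings:

```python
# List the n-grams shared between an output and a reference, i.e., the
# matches that contribute to BLEU's modified n-gram precision.
from collections import Counter

def matched_ngrams(output: str, reference: str, n: int):
    def counts(text: str) -> Counter:
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    out_counts, ref_counts = counts(output), counts(reference)
    # Clip each output n-gram count by its count in the reference.
    return {g: min(c, ref_counts[g]) for g, c in out_counts.items() if g in ref_counts}

print(matched_ngrams("GDP fell by 6.1 percent in 2020", "GDP fell by 6.1 % in 2020", 2))
```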
BLEU is often dismissed for weak human correlation in generation tasks. But on general instruction following, using BLEU to rank pairs of Chatbot Arena outputs—scored against references from strong LLMs—matches 8B & 27B reward models in human agreement, especially with more refs.
May 20, 2025 at 4:25 PM
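Concretely, "using BLEU to rank pairs" can be read as: score both outputs against the same reference set, prefer the higher-scoring one, and check how often that preference matches the human vote. A minimal sketch assuming sacrebleu and a hypothetical list of (output_a, output_b, references, human_winner) tuples:

```python
# Sketch: agreement between BLEU-based pairwise preferences and human votes.
# `pairs` is a hypothetical list of (output_a, output_b, references, human_winner)
# tuples, with human_winner in {"a", "b"}; references is a list of strings.
import sacrebleu

def bleu_human_agreement(pairs) -> float:
    agree = 0
    for out_a, out_b, refs, human_winner in pairs:
        score_a = sacrebleu.sentence_bleu(out_a, refs).score
        score_b = sacrebleu.sentence_bleu(out_b, refs).score
        bleu_winner = "a" if score_a >= score_b else "b"
        agree += int(bleu_winner == human_winner)
    return agree / len(pairs)
```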
BLEU is widely used for machine translation (MT) eval. Given a reference and a generation, it computes modified n-gram precision (1–4 grams) and applies a brevity penalty to penalize short outputs. With multiple references, each n-gram count is clipped by its maximum count across the references.
May 20, 2025 at 4:25 PM
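As a quick illustration of those pieces (modified 1–4-gram precision, the brevity penalty, multiple references), sacrebleu exposes each of them; the strings below are made-up examples:

```python
# Quick look at BLEU's components with sacrebleu (made-up example strings).
import sacrebleu

hypothesis = "the cat sat on the mat"
references = ["the cat is on the mat", "a cat sat on the mat"]  # multiple refs

bleu = sacrebleu.sentence_bleu(hypothesis, references)
print(bleu.score)       # overall BLEU score (0-100)
print(bleu.precisions)  # modified precisions for 1- to 4-grams
print(bleu.bp)          # brevity penalty (< 1 when the hypothesis is too short)
```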
🐠 what monday feels like..
December 2, 2024 at 11:46 PM
i knew something like this had to exist but why did i only discover it now?? no more suffering from looking at my 10+ open arxiv tabs not knowing which one is which...
November 25, 2024 at 9:22 PM
#EMNLP2024 was fun🍹now brainstorming ideas for #EMNLP2025 🙇🏻‍♀️
November 17, 2024 at 10:59 PM
airbnb >>> hotel for conferences #EMNLP2024
November 17, 2024 at 1:28 AM
🌊Heading to #EMNLP2024 tmr, presenting PostMark on Tue. morning! 🔗 arxiv.org/abs/2406.14517

Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on

Also, I'm on the lookout for a summer 2025 internship!
November 10, 2024 at 7:35 PM