Yapei Chang
@yapeichang.bsky.social
☁️ phd in progress @ UMD | 🔗 https://lilakk.github.io/
Beyond benchmarks, human annotators rate BLEUBERI outputs as comparable to those from GRPO-RM models.
May 20, 2025 at 4:25 PM
Qualitatively, BLEUBERI models produce more factually grounded outputs, as measured by VeriScore on three diverse datasets. VeriScore extracts verifiable claims from responses and checks each one against Google Search.
May 20, 2025 at 4:25 PM
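For intuition, the VeriScore setup above can be paired with a rough sketch: extract verifiable claims from a response, verify each one, and aggregate. The `extract_claims` and `verify_claim` helpers below are hypothetical placeholders (VeriScore uses an LLM-based extractor and Google-Search-backed verification), and scoring by the supported fraction is an illustrative simplification, not the exact VeriScore formula.

```python
# Illustrative VeriScore-style factuality check (not the official implementation).
# extract_claims / verify_claim are hypothetical stand-ins for the LLM-based
# claim extractor and the Google-Search-backed verifier.
from typing import Callable, List

def factuality_score(
    response: str,
    extract_claims: Callable[[str], List[str]],
    verify_claim: Callable[[str], bool],
) -> float:
    """Fraction of extracted verifiable claims judged as supported (simplified aggregate)."""
    claims = extract_claims(response)
    if not claims:
        return 0.0
    supported = sum(verify_claim(claim) for claim in claims)
    return supported / len(claims)
```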
The surprising effectiveness of BLEU extends to training. BLEUBERI first selects 5K low-BLEU examples, then trains LLMs with GRPO using BLEU as the reward. BLEUBERI models are competitive with those trained via GRPO-RM (8B reward model) and SFT across 4 benchmarks.
May 20, 2025 at 4:25 PM
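As a rough sketch of what "BLEU as the reward" can look like in practice, the function below scores each completion against its references with sacrebleu. The (completions, references, **kwargs) callback shape follows a TRL-style GRPO reward function and is an assumption here, not BLEUBERI's actual code.

```python
# Sketch of a BLEU reward for GRPO training, assuming sacrebleu and a
# TRL-style reward callback that receives completions plus per-example
# reference lists (the exact trainer interface may differ).
import sacrebleu

def bleu_reward(completions, references, **kwargs):
    """Return one scalar reward per completion: sentence BLEU against its references."""
    rewards = []
    for completion, refs in zip(completions, references):
        # refs: list of reference strings for this prompt; BLEU clips n-gram
        # counts against the best-matching reference.
        score = sacrebleu.sentence_bleu(completion, refs).score  # 0-100
        rewards.append(score / 100.0)  # normalize to [0, 1]
    return rewards
```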
When BLEU agrees with humans on a pair of model outputs, what n-grams contribute to this decision? Below is an example where it captures both format (the “Ukrainian” and “English” headers) and factuality (the number 6.1).
May 20, 2025 at 4:25 PM
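One way to do this kind of inspection is to intersect clipped n-gram counts between an output and a reference; the matches are exactly what feeds modified n-gram precision. A minimal sketch with naive whitespace tokenization (sacrebleu's tokenizer would differ slightly) and made-up example strings:

```python
# List the n-grams shared between an output and a reference, i.e., the
# matches that contribute to BLEU's modified n-gram precision.
from collections import Counter

def matched_ngrams(output: str, reference: str, n: int):
    def counts(text: str) -> Counter:
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    out_counts, ref_counts = counts(output), counts(reference)
    # Clip each output n-gram count by its count in the reference.
    return {g: min(c, ref_counts[g]) for g, c in out_counts.items() if g in ref_counts}

print(matched_ngrams("GDP fell by 6.1 percent in 2020", "GDP fell by 6.1 % in 2020", 2))
```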
BLEU is often dismissed for weak human correlation in generation tasks. But on general instruction following, using BLEU to rank pairs of Chatbot Arena outputs—scored against references from strong LLMs—matches 8B & 27B reward models in human agreement, especially with more refs.
May 20, 2025 at 4:25 PM
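Concretely, "using BLEU to rank pairs" can be read as: score both outputs against the same reference set, prefer the higher-scoring one, and check how often that preference matches the human vote. A minimal sketch assuming sacrebleu and a hypothetical list of (output_a, output_b, references, human_winner) tuples:

```python
# Sketch: agreement between BLEU-based pairwise preferences and human votes.
# `pairs` is a hypothetical list of (output_a, output_b, references, human_winner)
# tuples, with human_winner in {"a", "b"}; references is a list of strings.
import sacrebleu

def bleu_human_agreement(pairs) -> float:
    agree = 0
    for out_a, out_b, refs, human_winner in pairs:
        score_a = sacrebleu.sentence_bleu(out_a, refs).score
        score_b = sacrebleu.sentence_bleu(out_b, refs).score
        bleu_winner = "a" if score_a >= score_b else "b"
        agree += int(bleu_winner == human_winner)
    return agree / len(pairs)
```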
BLEU is widely used for machine translation (MT) eval. Given a reference and a generation, it computes modified n-gram precision (1–4 grams) and applies a brevity penalty to penalize short outputs. With multiple references, each n-gram count is clipped by its maximum count across the references.
May 20, 2025 at 4:25 PM
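As a quick illustration of those pieces (modified 1–4-gram precision, the brevity penalty, multiple references), sacrebleu exposes each of them; the strings below are made-up examples:

```python
# Quick look at BLEU's components with sacrebleu (made-up example strings).
import sacrebleu

hypothesis = "the cat sat on the mat"
references = ["the cat is on the mat", "a cat sat on the mat"]  # multiple refs

bleu = sacrebleu.sentence_bleu(hypothesis, references)
print(bleu.score)       # overall BLEU score (0-100)
print(bleu.precisions)  # modified precisions for 1- to 4-grams
print(bleu.bp)          # brevity penalty (< 1 when the hypothesis is too short)
```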
🐠 what monday feels like..
December 2, 2024 at 11:46 PM
i knew something like this had to exist but why did i only discover it now?? no more suffering from looking at my 10+ open arxiv tabs not knowing which one is which...
November 25, 2024 at 9:22 PM
#EMNLP2024 was fun🍹now brainstorming ideas for #EMNLP2025 🙇🏻‍♀️
November 17, 2024 at 10:59 PM
airbnb >>> hotel for conferences #EMNLP2024
November 17, 2024 at 1:28 AM
🌊Heading to #EMNLP2024 tmr, presenting PostMark on Tue. morning! 🔗 arxiv.org/abs/2406.14517

Aside from this, I'd love to chat about:
• long-context training
• realistic & hard eval
• synthetic data
• tbh any cool projects people are working on

Also, I'm on the lookout for a summer 2025 internship!
November 10, 2024 at 7:35 PM