Lightnews — Scholar-powered news

Yuda Song

@yus167.bsky.social

1.3K followers 190 following 12 posts

PhD at Machine Learning Department, Carnegie Mellon University | Interactive Decision Making | https://yudasong.github.io

Posts Replies Media Videos

Yuda Song

@yus167.bsky.social

We also dive deep into the similarity and difference between different verification mechanisms. We observed the consistency, distinction and ensemble properties of the verification methods (see the summary image). (8/9)

December 6, 2024 at 6:02 PM

Yuda Song

@yus167.bsky.social

In iterative self-improvement, we observe the gap diminishes to 0 in a few iterations, resembling many previous findings. We discovered that one cause of such saturation is the degradation of the "effective diversity" of the generation due to the imperfect verifier. (7/9)

December 6, 2024 at 6:02 PM

Yuda Song

@yus167.bsky.social

However, self-improvement is not always possible on all tasks. We do not observe significant self-improvement signal on QA tasks like Natural Questions. Also, not all models can self-improve on sudoku, a canonical example of "verification is easier than generation". (6/9)

December 6, 2024 at 6:02 PM

Yuda Song

@yus167.bsky.social

Our first major result is an observational scaling law: with certain verification methods, the relative gap increases monotonically (almost linear) to the log of pretrain flops, on tasks like GSM8K and MATH. (5/9)

December 6, 2024 at 6:02 PM

Yuda Song

@yus167.bsky.social

LLM self-improvement has critical implications in synthetic data, post-training and test-time inference. To understand LLMs' true capability of self-improvement, we perform large-scale experiments with multiple families of LLMs, tasks and mechanisms. Here is what we found: (1/9)

December 6, 2024 at 6:02 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news