Lightnews — Scholar-powered news

Victor Wang

@victorwang37.bsky.social


Undergrad researcher at UT Austin, interested in NLP

https://dubai03nsr.github.io/


Victor Wang

@victorwang37.bsky.social

For listwise ranking, although the mean does not improve accuracy, it drastically improves calibration. We also find that accuracy is maximized by directly predicting the list without an intermediate pairwise step, further underscoring the limitations of CoT for judgment.

March 6, 2025 at 2:34 PM

Victor Wang

@victorwang37.bsky.social

For pairwise ranking, we compare pre- vs. post-aggregation (bottom vs. top in figure) of the two presentation orders' judgments. Small judges suffer from severe position bias, but pre-aggregation leverages the relative magnitudes of preference, boosting accuracy from 56.7 to 73.1.

March 6, 2025 at 2:33 PM
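
A minimal sketch of the pre- vs. post-aggregation contrast described in the post above. It assumes that "post-aggregation" means taking a hard decision per presentation order and then combining the votes, and that "pre-aggregation" means averaging the two orders' preference probabilities before deciding. The function names and the example probabilities are hypothetical and are not taken from the thread or the paper.

```python
def post_aggregate(p_a_first: float, p_b_first: float) -> int:
    """Decide per presentation order, then combine the two hard votes.

    p_a_first: judge's probability that A wins when A is shown first.
    p_b_first: judge's probability that A wins when B is shown first.
    Returns 1 if A wins, -1 if B wins, 0 if the two orders disagree.
    """
    vote_1 = 1 if p_a_first > 0.5 else -1
    vote_2 = 1 if p_b_first > 0.5 else -1
    total = vote_1 + vote_2
    return (total > 0) - (total < 0)


def pre_aggregate(p_a_first: float, p_b_first: float) -> int:
    """Average the two orders' probabilities first, then decide once."""
    mean_p = (p_a_first + p_b_first) / 2
    return (mean_p > 0.5) - (mean_p < 0.5)


# Hypothetical position-biased judge: it favors whichever response comes
# first, but much more strongly when that response is A.
print(post_aggregate(0.9, 0.4))  # the two hard votes cancel out
print(pre_aggregate(0.9, 0.4))   # the stronger preference decides
```

In this toy case the hard votes disagree, so post-aggregation yields a tie, while pre-aggregation still picks A because the preference for A in one order is larger than the preference for B in the other.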

Victor Wang

@victorwang37.bsky.social

We further explore novel methods of comparing score distributions.

1. The mode loses even to other discrete methods such as the median or first percentile (and the gap grows when we use finer judgment granularity).

2. Incorporating risk aversion often improves performance.

March 6, 2025 at 2:33 PM
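
A rough sketch of how different summaries of a judge's score distribution can disagree, as the post above describes. The distributions are invented for illustration, and the risk-averse score (mean minus a multiple of the standard deviation) is one assumed formulation, not necessarily the one the authors use. All names here are hypothetical.

```python
import math


def mean(dist: dict[int, float]) -> float:
    """Expected score under the judge's distribution."""
    return sum(score * p for score, p in dist.items())


def mode(dist: dict[int, float]) -> int:
    """Most likely score (what greedy decoding would report)."""
    return max(dist, key=dist.get)


def median(dist: dict[int, float]) -> int:
    """Smallest score whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for score in sorted(dist):
        cumulative += dist[score]
        if cumulative >= 0.5:
            return score
    return max(dist)


def risk_averse(dist: dict[int, float], lam: float = 0.5) -> float:
    """Mean penalized by spread; `lam` is an assumed risk-aversion weight."""
    mu = mean(dist)
    variance = sum(p * (score - mu) ** 2 for score, p in dist.items())
    return mu - lam * math.sqrt(variance)


# Invented score distributions over a 1-5 scale for two responses.
response_a = {2: 0.1, 3: 0.5, 4: 0.4}
response_b = {3: 0.5, 4: 0.1, 5: 0.4}

for name, dist in [("A", response_a), ("B", response_b)]:
    print(name, mode(dist), median(dist), mean(dist), risk_averse(dist))
```

Here both responses share the same mode, so comparing modes cannot separate them, while the mean uses the rest of the probability mass and prefers B.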

Victor Wang

@victorwang37.bsky.social

LLM judges have become ubiquitous, but valuable signal is often ignored at inference.

We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵

(w/ Michael J.Q. Zhang, @eunsol.bsky.social)

March 6, 2025 at 2:32 PM
