Joe Stacey
@joestacey.bsky.social
NLP PhD student at Imperial College London and Apple AI/ML Scholar.
This was just embarrassing. Shame on everyone who works on Grok…
November 15, 2025 at 11:10 AM
Congratulations!! Awesome that you'll be in Europe!
July 22, 2025 at 7:49 PM
The bad:

- the chocolate here is terrible for no good reason
- hotel breakfasts never have any baked beans, which are way underappreciated here (they are delicious and add much-needed moisture to a cooked breakfast)
- the temperature in summer is inhumane

Think that covers the main stuff 😍
July 17, 2025 at 11:24 AM
This work was really fun and a great last paper for my PhD. Check it out 🙂 Massive thanks to all my amazing collaborators!

arxiv.org/abs/2505.20209

P.S. if you know about a paper improving NLI model robustness not already in our related work appendix, I would love to hear about it 🥰
How to Improve the Robustness of Closed-Source Models on NLI
arxiv.org
May 27, 2025 at 3:50 PM
5) The best way to improve performance on the hardest OOD data was to choose more challenging training examples

Our best method (Uncertainty Sampling) picked the examples with the most uncertain model predictions, surfacing challenging examples without introducing too much label noise
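A minimal sketch of what entropy-based uncertainty sampling could look like (the helper below and its details are illustrative, not the paper's implementation):

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` examples whose predicted NLI label
    distributions have the highest entropy (most uncertain)."""
    # probs: (n_examples, 3) probabilities for entailment / neutral / contradiction
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Sort by descending entropy and keep the top `budget` indices
    return np.argsort(-entropy)[:budget]

# e.g. select a 10k-example training set from a larger candidate pool:
# selected_idx = uncertainty_sample(model_probs, budget=10_000)
```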
May 27, 2025 at 3:50 PM
4) Creating more complex synthetic data avoids a loss in performance on harder OOD datasets

We find that generating more challenging synthetic data (Long & Complex Generation) helps retain performance on harder OOD datasets, while still achieving gains on easier OOD data
May 27, 2025 at 3:50 PM
3) Replacing some training examples with LLM-generated data proved very effective on less challenging OOD data

See Standard-OOD scores below (avg), where the simplest LLM-generated data (Short & Simple Generation) performed best, with substantial improvements
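As a rough illustration of the two generation styles (the prompt wording below is my own guess; only the Short & Simple vs Long & Complex distinction comes from the thread):

```python
# Illustrative prompt templates for generating synthetic NLI examples.
# The wording is hypothetical, not the paper's actual prompts.
SHORT_SIMPLE = """Write a short, simple NLI example with the label "{label}".
Use a one-sentence premise and a one-sentence hypothesis.
Return JSON with keys "premise" and "hypothesis"."""

LONG_COMPLEX = """Write a challenging NLI example with the label "{label}".
The premise should be several sentences describing a nuanced scenario,
and the hypothesis should require reasoning over the premise rather than
relying on word overlap.
Return JSON with keys "premise" and "hypothesis"."""

def generation_prompt(label: str, complex_data: bool = True) -> str:
    """Build the generation prompt for a target NLI label."""
    template = LONG_COMPLEX if complex_data else SHORT_SIMPLE
    return template.format(label=label)
```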
May 27, 2025 at 3:50 PM
2) We experiment with 6+ ways to improve robustness:

This involved sampling methods to pick more complex examples from our training data, and generating new synthetic examples

Some methods were pretty fun, e.g. asking an LLM to assess the difficulty of training examples
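A hypothetical sketch of that last idea (the prompt wording and 1-5 scale are my own, not the paper's):

```python
# Hypothetical prompt for LLM-based difficulty scoring of NLI training
# examples; the wording and the 1-5 scale are illustrative only.
DIFFICULTY_PROMPT = """On a scale of 1 (trivial) to 5 (very hard), how hard is it
to decide whether the hypothesis is entailed by, neutral to, or contradicted
by the premise?

Premise: {premise}
Hypothesis: {hypothesis}

Answer with a single number."""

def difficulty_prompt(premise: str, hypothesis: str) -> str:
    """Fill in one training example; send the result to an LLM and parse
    the integer it returns as a difficulty score."""
    return DIFFICULTY_PROMPT.format(premise=premise, hypothesis=hypothesis)
```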
May 27, 2025 at 3:50 PM
1) It's time to stop using fine-tuned encoder models:

We find that fine-tuned LLMs are substantially more robust than commonly used encoder models, despite being fine-tuned on 50x less data.

This is especially the case on challenging OOD datasets (see Challenge-OOD avg below)
May 27, 2025 at 3:50 PM
The paper tries to improve the robustness of closed-source LLMs fine-tuned on NLI, assuming a realistic training budget of 10k training examples.

Here's a 45-second rundown of what we found!
May 27, 2025 at 3:50 PM
I’d personally just love to see more negative results from nice ideas that didn’t quite work out. I feel like there’s probably a bunch of cool stuff people have tried out and discarded that could be made to work across multiple papers. Would be fun and interesting too
May 18, 2025 at 3:48 PM
Was worried it was just me hating on it so much 🤣
May 18, 2025 at 11:01 AM
I’d love to see more diversity in the field, what kind of things were you thinking?
May 18, 2025 at 9:06 AM
Looks so cool! I’m insanely jealous
April 28, 2025 at 5:14 PM
I’m not a fan of musk, but imo there’s some really nice work here 🙂

Interested in the Washington post article, would you mind sharing a link?
April 23, 2025 at 6:01 AM
That’s an awesome paper 👍👍
April 14, 2025 at 5:29 PM