Lightnews — Scholar-powered news

Pradeep Dasigi

@pdasigi.bsky.social

370 followers 82 following 13 posts

#NLP research @ai2.bsky.social; OLMo post-training
https://pdasigi.github.io/

Posts Replies Media Videos

Reposted by Pradeep Dasigi

Jacob Morrison

@jacobcares.bsky.social

also some other tülu contributors are on the market:
@ljvmiranda.bsky.social (ljvmiranda921.github.io) and Xinxi Lyu (alrope123.github.io) are also applying to phd programs, and @valentinapy.bsky.social (valentinapy.github.io) is on the faculty market, hire them all!!

January 30, 2025 at 7:25 PM

Pradeep Dasigi

@pdasigi.bsky.social

How does your cat feel about wearing a Garmin device and adding you as a connection?

November 27, 2024 at 6:15 PM

Pradeep Dasigi

@pdasigi.bsky.social

As a reviewer, I did not find it particularly useful. The recommendations I received were already addressed by my initial review. As an author, the mix of reviews we got were roughly the same quality as usual.
This is an interesting idea though, and I hope there's a way to make it work.

November 25, 2024 at 2:52 PM

Pradeep Dasigi

@pdasigi.bsky.social

We even developed two *new* instruction following evals for this setup:
1) IFEval-OOD, a variant of IFEval (Zhou et al., 2023) but with a disjoint set of constraints.
2) HREF, a more general IF eval targeting a diverse set of IF tasks.
Detailed analyses on these evals are coming out soon.

November 23, 2024 at 11:53 PM

Pradeep Dasigi

@pdasigi.bsky.social

We presented some preliminary findings on generalization and overfitting in the report based on this setup, and will put out more analysis soon.

November 23, 2024 at 11:53 PM

Pradeep Dasigi

@pdasigi.bsky.social

From a traditional ML perspective, this setup may seem obvious, i.e., a dev-test split of an eval. But none of our modern evals are in-distribution anymore, and we expect our models to generalize across distributions. So we adapted the traditional setup to use newer and harder evals as unseen ones.

November 23, 2024 at 11:53 PM

Pradeep Dasigi

@pdasigi.bsky.social

For each "core skill" we care about, we chose a separate set of "development" and "unseen" evaluations. We tracked the performance of models only on the former during development and evaluated only the final checkpoints on the unseen ones.

November 23, 2024 at 11:53 PM

Pradeep Dasigi

@pdasigi.bsky.social

Want to post-train on your own data? Here's our training code: github.com/allenai/open....

Reproducing LM evaluations can be notoriously difficult. So we released our evaluation framework where you can specify and tweak every last detail and reproduce what we did: github.com/allenai/olmes.

GitHub - allenai/open-instruct

Contribute to allenai/open-instruct development by creating an account on GitHub.

github.com

November 23, 2024 at 4:20 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news