Vishakh Padmakumar
vishakhpk.bsky.social
PhD Student @nyudatascience.bsky.social, working with He He on NLP and Human-AI Collaboration.
Also hanging out @ai2.bsky.social
Website - https://vishakhpk.github.io/
Paper out now - arxiv.org/abs/2504.09389
And 🛠️ code to measure novelty, plus output logs with 📚 2,000+ generations and 📊 quality + originality scores, coming soon!
Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models
As large language models (LLMs) are increasingly used for ideation and scientific discovery, it is important to evaluate their ability to generate novel output. Prior work evaluates novelty as the ori...
arxiv.org
April 29, 2025 at 4:35 PM
And prompting tricks like asking for novelty and denial prompting trade off originality and quality without meaningfully shifting the novelty frontier… so there’s a lot more work to be done 😀
Sure, but can we elicit more novelty at inference time? Turns out it’s tricky. Increasing sampling temperatures (from 0.5 to 2) boosts originality but can hurt quality, creating a U-shaped effect.
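For intuition on why temperature has this effect, here is a minimal sketch (illustrative logit values, not from the paper) of how temperature rescales the softmax over next-token logits: low temperature concentrates probability on the top token, while high temperature flattens the distribution, trading coherence for diversity.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature before the softmax.
    Higher temperature flattens the distribution, so rarer
    tokens get sampled more often."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.5]  # hypothetical next-token logits
low = softmax_with_temperature(logits, 0.5)   # peaked: favors the top token
high = softmax_with_temperature(logits, 2.0)  # flat: spreads probability out
```

Sampling from the flatter distribution yields more original but potentially lower-quality text, matching the U-shaped effect described above.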
But improving the underlying model can yield more novel output! This can happen either by (a) increasing model scale (1B -> 7B) or (b) instruction tuning (7B -> 7B-Instruct)
We find that base LLMs often generate less novel output than the human-written references in the datasets
We evaluate the novelty of OLMo and Pythia models on 3 creative tasks:
📝 Story completion (TinyStories)
🎨 Poetry writing (Help Me Write a Poem)
🛠️ Creative tool use (MacGyver)
Novelty = harmonic mean of output quality (LLM-as-judge) and originality (unseen n-gram fraction).
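The novelty score above can be sketched as follows — a minimal illustration, assuming originality is measured as the fraction of an output's n-grams not found in a reference corpus (the paper's exact tokenization and n-gram setup may differ), with quality supplied by an LLM judge:

```python
def originality(text, corpus_ngrams, n=3):
    """Fraction of the output's n-grams unseen in the corpus.
    corpus_ngrams: a set of n-gram tuples from the training data."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    unseen = sum(1 for g in ngrams if g not in corpus_ngrams)
    return unseen / len(ngrams)

def novelty(quality, orig):
    """Harmonic mean of quality and originality (both in [0, 1]).
    High only when BOTH components are high."""
    if quality + orig == 0:
        return 0.0
    return 2 * quality * orig / (quality + orig)
```

The harmonic mean is the key design choice: an output that copies training data (originality near 0) or one that is incoherent (quality near 0) both score near 0, no matter how strong the other component is.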
Considering originality and quality separately is not enough: human preferences on quality can favor outputs that reproduce training data (users may not recognize this), while originality alone can reward incoherent generations. These two are often at odds and should be evaluated together 💡
Thank you @bwaber.bsky.social! Made my day 😁💯
January 31, 2025 at 2:00 PM