Vishakh Padmakumar
vishakhpk.bsky.social
PhD Student @nyudatascience.bsky.social, working with He He on NLP and Human-AI Collaboration.
Also hanging out @ai2.bsky.social
Website - https://vishakhpk.github.io/
Paper out now - arxiv.org/abs/2504.09389
And 🛠️ code to measure novelty, plus output logs with 📚 2,000+ generations and 📊 quality + originality scores, coming soon!
Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models
As large language models (LLMs) are increasingly used for ideation and scientific discovery, it is important to evaluate their ability to generate novel output. Prior work evaluates novelty as the ori...
arxiv.org
April 29, 2025 at 4:35 PM
And prompting tricks like asking for novelty and denial prompting trade off originality and quality without meaningfully shifting the novelty frontier… so there’s a lot more work to be done 😀
Sure, but can we elicit more novelty at inference time? Turns out it’s tricky. Increasing sampling temperatures (from 0.5 to 2) boosts originality but can hurt quality, creating a U-shaped effect.
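For intuition on why temperature has this effect, here is a minimal sketch (illustrative logit values, not from the paper) of how temperature rescales the softmax over next-token logits: low temperature concentrates probability on the top token, while high temperature flattens the distribution, trading coherence for diversity.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature before the softmax.
    Higher temperature flattens the distribution, so rarer
    tokens get sampled more often."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.5]  # hypothetical next-token logits
low = softmax_with_temperature(logits, 0.5)   # peaked: favors the top token
high = softmax_with_temperature(logits, 2.0)  # flat: spreads probability out
```

Sampling from the flatter distribution yields more original but potentially lower-quality text, matching the U-shaped effect described above.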
But improving the underlying model can yield more novel output! This can happen either by (a) increasing model scale (1B -> 7B) or (b) instruction tuning (7B -> 7B-Instruct)
We find that base LLMs often generate less novel output than the human-written references in the datasets
We evaluate the novelty of OLMo and Pythia models on 3 creative tasks:
📝 Story completion (TinyStories)
🎨 Poetry writing (Help Me Write a Poem)
🛠️ Creative tool use (MacGyver)
Novelty = harmonic mean of output quality (LLM-as-judge) and originality (unseen n-gram fraction).
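The novelty score above can be sketched as follows — a minimal illustration, assuming originality is measured as the fraction of an output's n-grams not found in a reference corpus (the paper's exact tokenization and n-gram setup may differ), with quality supplied by an LLM judge:

```python
def originality(text, corpus_ngrams, n=3):
    """Fraction of the output's n-grams unseen in the corpus.
    corpus_ngrams: a set of n-gram tuples from the training data."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    unseen = sum(1 for g in ngrams if g not in corpus_ngrams)
    return unseen / len(ngrams)

def novelty(quality, orig):
    """Harmonic mean of quality and originality (both in [0, 1]).
    High only when BOTH components are high."""
    if quality + orig == 0:
        return 0.0
    return 2 * quality * orig / (quality + orig)
```

The harmonic mean is the key design choice: an output that copies training data (originality near 0) or one that is incoherent (quality near 0) both score near 0, no matter how strong the other component is.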
Considering originality and quality separately is not enough: human preferences on quality can favor outputs that reproduce training data (users may not recognize this), while originality alone can reward incoherent generations. These two are often at odds and should be evaluated together 💡
Thank you @bwaber.bsky.social! Made my day 😁💯
January 31, 2025 at 2:00 PM