Daniel Khashabi
@danielkhashabi.bsky.social
I play with intuitions and data.

Now: @jhuclsp @jhucompsci
Past: @allen_ai @uwnlp @Penn @cogcomp @Illinois_Alma @MSFTResearch
Ever since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to 𝐡𝐮𝐦𝐚𝐧 𝐥𝐚𝐧𝐠𝐮𝐚𝐠𝐞. But … is it?
November 18, 2025 at 5:27 PM
Big congrats to @jackjingyuzhang for being named an Amazon AI PhD Fellow! 🎉 Grateful for @AmazonScience @RohitPrasadAI’s support as we work together to advance AI research at JHU.
x.com/jackjingyuz...
October 24, 2025 at 4:08 PM
ICL and SFT are the two most studied ways to adapt LMs. We understand each in isolation — but far less about how they might 𝗰𝗼𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁 𝗼𝗻𝗲 𝗮𝗻𝗼𝘁𝗵𝗲𝗿.
October 3, 2025 at 2:23 PM
Imagine this: excited about the recent progress, you’ve built an agentic system that uses 🔧tools (API calls) to solve complex problems. What could go wrong?

We studied agentic tool recovery—when your LLM selects a set of tools to execute, but one turns out to be unavailable or incorrect.
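A rough sketch of the failure mode and one naive recovery policy (my illustration, not the paper's method; `call_tool` and the fallback registry are hypothetical stand-ins for a real tool-dispatch layer):

```python
# A naive recovery loop: try the planned tool, then registered fallbacks.
def call_tool(name, args):
    raise NotImplementedError  # placeholder for a real API dispatcher

TOOL_FALLBACKS = {"flight_search_v2": ["flight_search_v1", "web_search"]}

def run_with_recovery(plan):
    """Execute (tool, args) steps, substituting fallbacks when a tool fails."""
    results = []
    for tool, args in plan:
        for candidate in [tool] + TOOL_FALLBACKS.get(tool, []):
            try:
                results.append(call_tool(candidate, args))
                break  # this step succeeded; move to the next one
            except Exception:  # tool unavailable, deprecated, or erroring
                continue
        else:
            raise RuntimeError(f"no working tool for step {tool!r}")
    return results
```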
September 19, 2025 at 2:29 PM
A core hurdle in AI safety evaluation is that benchmarks (e.g., those on jailbreak attacks) become outdated shortly after release: they saturate, get contaminated, or get patched.
August 26, 2025 at 9:15 PM
Excited to collaborate with LMArena, NIH, and DataTecnica to launch BiomedArena! Our goal is to advance the use of LLMs in biomedical discovery and incorporate community-driven insights
to help shape the future of biomedical AI.

⚔️ Check it out: biomedarena.ai
August 19, 2025 at 8:33 PM
What’s really going on inside LLMs when they handle non-English queries?

Niyati Bafna @niyatibafna.bsky.social's recent work introduces the **translation barrier hypothesis**, a framework for understanding multilingual model behavior.

Paper: huggingface.co/papers/2506...
Paper page - The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure
huggingface.co
July 7, 2025 at 12:15 PM
Reposted by Daniel Khashabi
🔈When LLMs solve tasks with a mid-to-low resource input or target language, their output quality is poor. We know that. But can we put our finger on what breaks inside the LLM? We introduce the 💥 translation barrier hypothesis 💥 for failed multilingual generation with LLMs. arxiv.org/abs/2506.22724
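One generic way to probe for an implicit English pivot is a logit-lens-style readout of intermediate layers. The sketch below is my illustration of that technique with a placeholder model, not the paper's actual analysis:

```python
# Minimal logit-lens readout: project each layer's last-position hidden
# state through the final layer norm + unembedding and inspect the top
# token. If mid-depth layers favor English tokens on a non-English task,
# that is consistent with an internal English pivot.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder model, purely for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Translate to Swahili: The cat sleeps. ->"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    print(f"layer {layer:2d}: {tok.decode(logits.argmax(-1))!r}")
```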
July 4, 2025 at 5:05 PM
🚨New LLM benchmark🚨 We're releasing BiomedSQL🔬 for tabular reasoning over large-scale biomedical databases. This includes questions based on implicit scientific conventions—like statistical thresholds, effect direction, and drug approval status.

📄 Preprint: arxiv.org/pdf/2505.20321
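To make "implicit conventions" concrete, here is a hypothetical item in the benchmark's style (not an actual BiomedSQL example): the question never states a p-value cutoff, yet answering it correctly requires the standard GWAS genome-wide-significance threshold.

```python
# Hypothetical BiomedSQL-style item (illustrative, not from the benchmark).
question = "Which SNPs are significantly associated with type 2 diabetes?"

# The question never states a cutoff; a correct answer must apply the GWAS
# genome-wide-significance convention p < 5e-8.
implied_sql = """
SELECT snp_id
FROM gwas_associations
WHERE trait = 'type 2 diabetes'
  AND p_value < 5e-8;  -- implicit convention, absent from the question
"""
```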
May 29, 2025 at 12:10 PM
Long-form inputs (e.g., needle-in-a-haystack setups) are a crucial aspect of high-impact LLM applications. While previous studies have flagged issues like positional bias and distracting documents, they have overlooked a key variable: the size of the gold/relevant context.
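A minimal sketch of how one might vary that element (my construction, not the paper's exact protocol): sweep the size of the gold passage while holding its position and the distractor pool fixed.

```python
# Sweep gold-context size while fixing position and distractor pool.
import random

def build_haystack(gold_sentences, distractors, position=0.5, seed=0):
    """Insert a gold passage of a chosen size at a chosen relative position."""
    rng = random.Random(seed)
    ctx = distractors[:]
    rng.shuffle(ctx)
    i = int(position * len(ctx))
    return " ".join(ctx[:i] + gold_sentences + ctx[i:])

gold_size = 8  # the variable under study: grow/shrink the gold passage
gold = [f"Vault detail {k}." for k in range(gold_size - 1)] + ["The vault code is 4821."]
filler = [f"Unrelated fact #{k}." for k in range(500)]
prompt = build_haystack(gold, filler) + "\nQ: What is the vault code?"
```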
May 28, 2025 at 1:16 AM
There have been various efforts to disentangle "task learning" vs. "task recall" in LLMs. We've recently explored a fresh angle by borrowing from cryptography: with substitution ciphers, we transform a given task into an equivalent but cryptic (no pun intended!) form.
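A toy version of the transformation (illustrative only):

```python
# A fixed substitution cipher preserves the task while making its surface
# form unfamiliar: a model that truly learns the task in context should
# cope, while one that merely recalls memorized surface forms should not.
import random, string

def make_cipher(seed=0):
    letters = string.ascii_lowercase
    shuffled = list(letters)
    random.Random(seed).shuffle(shuffled)
    return str.maketrans(letters, "".join(shuffled))

task = "translate english to french: good morning"
print(task.translate(make_cipher()))  # same task, cryptic surface form
```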
May 22, 2025 at 9:30 PM
What is a university without "freedom of speech"?

Apparently, ChatGPT has a better grasp than @nyuniversity.

x.com/nebedaay/st...
May 16, 2025 at 7:54 PM
**Certified Mitigation of Worst-Case LLM Copyright Infringement**

TL;DR: We propose BloomScrub, a framework to certifiably remove long verbatim quotes and reduce the risk of copyright violations.
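The name suggests membership testing over copyrighted n-grams; the sketch below is my guess at the general shape of quote detection (a plain set stands in for a Bloom filter), not the paper's certified mechanism:

```python
# Flag long verbatim overlaps between a generation and a protected corpus.
def ngrams(text, n=8):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

protected = "call me ishmael some years ago never mind how long precisely"  # toy corpus
index = ngrams(protected)

def flag_long_quotes(generation, n=8):
    """Return n-grams of the generation that appear verbatim in the corpus."""
    return ngrams(generation, n) & index

# Flagged spans would then be rewritten until none remain, which is what
# would make the final output's guarantee checkable.
```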
May 12, 2025 at 8:52 PM
Can LLMs be co-pilots for peer review?

Answering this requires evaluating whether LLMs can provide critiques that are *grounded* in the context of science papers.

See @JiefuOu's dataset which has a collection of paper claims and their critiques: arxiv.org/pdf/2503.21717
April 30, 2025 at 4:00 PM
📣📣📣 Tianjian @tli104 and I have refreshed our course material!

self-supervised.cs.jhu.edu/sp2025/

These resources may be helpful if you're:
(1) looking for slides to teach about LLMs, or
(2) interested in diving deeper into the field.
CSCI 601.771: Self-supervised Models
Discussing latest breakthroughs in self-supervised language models
self-supervised.cs.jhu.edu
April 29, 2025 at 8:30 PM
Reposted by Daniel Khashabi
I will be at #NAACL2025 to present our LLM creativity benchmark. Drop by if interested (Poster Session 8, Fri, May 2)!

I'd love to chat about RL and its interpretability, data influence for post-training, and CogSci for LLMs. Feel free to reach out and let's have some coffee together ☕ !
April 28, 2025 at 7:53 PM
Highlighting our #NAACL2025 papers 🧵🧵🧵
April 28, 2025 at 12:30 PM
People rely on search engines/chatbots to access science.

But what if you want a bird’s-eye view of science, or to identify over- and under-explored areas?

We introduce 🔺Science Hierarchography🔺, the task of organizing science papers into conceptual hierarchies.

arxiv.org/abs/2504.13834
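As a point of reference, here is one naive baseline for the task (my sketch; the paper's method differs): embed paper abstracts, then cluster them agglomeratively so the merge order induces a tree.

```python
# Embed abstracts, then agglomeratively cluster: the merge order induces a
# binary tree that can be labeled with concepts at each level.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "We fine-tune transformers for low-resource translation.",
    "A benchmark for biomedical question answering.",
    "Scaling laws for next-token prediction.",
]
X = TfidfVectorizer().fit_transform(abstracts).toarray()
tree = AgglomerativeClustering(n_clusters=None, distance_threshold=0.0).fit(X)
# tree.children_ encodes the full merge tree: row i merges two nodes into
# new node n_samples + i, giving a crude hierarchy over the papers.
print(tree.children_)
```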
April 23, 2025 at 12:30 PM
Highlighting our #ICLR2025 papers 🧵🧵🧵

(1) "GenEx: Generating an Explorable World"
openreview.net/pdf?id=8NlU...

TLDR: Physical exploration can be expensive, or even impossible. Our proposed policy mitigates this by enabling agents to form an imaginative model of the 3D world.
April 21, 2025 at 12:35 PM
The flow of talent across institutions keeps research vibrant.
Excited that students from my lab are off to top PhD programs!

Muhan Gao @muhan_gao→ Texas A&M
Zhouxiang Feng @FocusV857→ Rice
Abe Hou @abe_hou→ Stanford
Taiming Lu @TaiMingLu→ Princeton
Dongwei Jiang @Dongwei__Jiang→ USC
April 16, 2025 at 10:05 PM
Research is important, but so is recharging!
Took the team out for some badminton fun today—amazing energy, lots of laughs, and a reminder of how lucky I am to work with this crew!
April 14, 2025 at 3:09 AM
Several collaborators have expressed frustration that reviewers are excessively harsh. Might this behavior be influenced by the anonymity of the review process, similar to the dynamic we often observe on platforms such as Reddit?
April 4, 2025 at 3:06 AM
Can a simulated society of AI agents be used to assess the effectiveness of social policies?

See Abe Hou @abe_hou's study in the context of "vaccine hesitancy", where we can use historical data for comparison and validation.

arxiv.org/abs/2503.09639
Can A Society of Generative Agents Simulate Human Behavior and...
Can we simulate a sandbox society with generative agents to model human behavior, thereby reducing the over-reliance on real human trials for assessing public policies? In this work, we...
arxiv.org
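A toy sketch of the general simulation shape (my illustration, not the paper's system): persona-conditioned agents exchange influence and update a vaccination stance, and the aggregate trajectory is the quantity one would validate against historical data.

```python
# Persona-conditioned agents update a hesitancy score under social
# influence and a policy nudge. (The social-averaging update is a
# stand-in for an LLM call conditioned on persona + messages.)
import random

class Agent:
    def __init__(self, persona, hesitancy):
        self.persona = persona      # demographic / attitude profile
        self.hesitancy = hesitancy  # in [0, 1]

    def step(self, neighbors, policy_nudge=0.0):
        social = sum(n.hesitancy for n in neighbors) / len(neighbors)
        new = 0.8 * self.hesitancy + 0.2 * social - policy_nudge
        self.hesitancy = max(0.0, min(1.0, new))

agents = [Agent(f"persona-{i}", random.random()) for i in range(100)]
for t in range(30):  # 30 simulated rounds of interaction
    for a in agents:
        a.step(random.sample(agents, 5), policy_nudge=0.01)
print(sum(a.hesitancy for a in agents) / len(agents))  # mean hesitancy
```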
April 3, 2025 at 8:51 PM
Hello world!
April 3, 2025 at 1:24 AM