Lightnews — Scholar-powered news

Daniel Khashabi

@danielkhashabi.bsky.social

We're extremely thankful to the Evo2 team ( @BrianHie @pdhsu @garykbrixi @mgdurrant @MichaelPoli6 etc.). Not only do these models help advance biomed research, but we now see that they can also help the AI community better understand the fundamentals of pre-training.

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

Draft: huggingface.co/papers/2511...

Huge thanks to @N8Programs for leading the work, and to collaborators @anqi_liu33 @aamixsh @mrevsine @mike_schatz.

Paper page - Genomic Next-Token Predictors are In-Context Learners

huggingface.co

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

𝗗𝗼𝗲𝘀 𝘁𝗵𝗶𝘀 𝗺𝗲𝗮𝗻 𝗵𝘂𝗺𝗮𝗻 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗶𝘀 𝗶𝗿𝗿𝗲𝗹𝗲𝘃𝗮𝗻𝘁? No! But it suggests there may be universal distributional properties across different languages (human, DNA, etc.) that yield ICL. It remains an open question what these properties are.

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

𝗗𝗼𝗲𝘀 𝗜𝗖𝗟 𝗶𝗻 𝗴𝗲𝗻𝗼𝗺𝗶𝗰 𝘃𝘀 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹𝘀 𝗮𝗰𝘁 𝗶𝗱𝗲𝗻𝘁𝗶𝗰𝗮𝗹𝗹𝘆? No! While they share macro-level ICL trends, each shows domain-specific inductive biases traceable to properties of DNA vs human language.

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

𝗪𝗵𝘆 𝗶𝘁 𝗺𝗮𝘁𝘁𝗲𝗿𝘀: To our knowledge, this is the first evidence of emergent ICL in non-[human]language symbolic sequences. It suggests that ICL is modality-agnostic, and a general consequence of large-scale autoregressive training on rich data distributions.

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

This lets us compare Evo2 (genomic) vs Qwen3 (language) under matched few-shot prompts.

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

𝗛𝗼𝘄 𝗱𝗶𝗱 𝘄𝗲 𝗰𝗼𝗺𝗽𝗮𝗿𝗲 𝗴𝗲𝗻𝗼𝗺𝗶𝗰 𝘃𝘀 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗺𝗼𝗱𝗲𝗹𝘀? We built a suite of symbolic bitstring-reasoning tasks and encoded them two ways: (1) genomic alphabet (A/T/C/G) and (2) linguistic alphabet (digits).
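To make the two-way encoding concrete, here is a minimal sketch of how one few-shot bitstring task could be rendered in both alphabets. The symbol tables, the separator choices, the `flip_bits` task, and the `build_prompt` helper are all assumptions made for illustration; the post names the alphabets but does not give the actual mapping or task list.

```python
from typing import Dict, List, Tuple

# Assumed symbol tables: two symbols carry the bits, two act as separators.
# The real encoding used in the paper is not stated in the post.
GENOMIC: Dict[str, str] = {"0": "A", "1": "C", "->": "G", ";": "T"}
DIGITS: Dict[str, str] = {"0": "0", "1": "1", "->": "2", ";": "3"}


def flip_bits(bits: str) -> str:
    """Toy task (hypothetical): invert every bit of the input."""
    return "".join("1" if b == "0" else "0" for b in bits)


def build_prompt(examples: List[Tuple[str, str]], query: str,
                 table: Dict[str, str]) -> str:
    """Render (input, output) shots plus an unanswered query in one alphabet."""
    def enc(bits: str) -> str:
        return "".join(table[b] for b in bits)

    shots = "".join(
        enc(x) + table["->"] + enc(y) + table[";"] for x, y in examples
    )
    return shots + enc(query) + table["->"]


inputs = ["0110", "1010"]
examples = [(x, flip_bits(x)) for x in inputs]

# Same task, same shots, same query: only the surface alphabet differs.
genomic_prompt = build_prompt(examples, "0011", GENOMIC)
digit_prompt = build_prompt(examples, "0011", DIGITS)
print(genomic_prompt)
print(digit_prompt)
```

Because both prompts come from the same underlying examples, any difference in model behavior can be attributed to the model and alphabet rather than to the task itself.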

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

→ similar log-linear gains with more shots
→ similar improvement with model scale
... all learned purely from DNA (nucleotide) sequences.

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

Thrilled to share our latest result: 𝗚𝗲𝗻𝗼𝗺𝗶𝗰🧬 𝗺𝗼𝗱𝗲𝗹𝘀 𝘁𝗿𝗮𝗶𝗻𝗲𝗱 𝙤𝙣𝙡𝙮 𝗼𝗻 '𝗻𝗲𝘅𝘁-𝗻𝘂𝗰𝗹𝗲𝗼𝘁𝗶𝗱𝗲 𝗽𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻' 𝗲𝘅𝗵𝗶𝗯𝗶𝘁 𝗜𝗖𝗟!

What's remarkable is that their overall pattern closely mirrors LLMs:
→ similar few-shot pattern induction

November 18, 2025 at 5:27 PM

Daniel Khashabi

@danielkhashabi.bsky.social

𝗦𝗲𝗲 𝘁𝗵𝗲 𝗱𝗲𝘁𝗮𝗶𝗹𝘀 𝗼𝗳 𝘁𝗵𝗲 𝗳𝗶𝗻𝗱𝗶𝗻𝗴𝘀: huggingface.co/papers/2509...

Work led by @aamixsh and in collaboration with @anqi_liu33.
@HopkinsEngineer @JHUCompSci

x.com/aamixsh/sta...

Paper page - IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning

huggingface.co

October 3, 2025 at 2:23 PM

Daniel Khashabi

@danielkhashabi.bsky.social

For 2️⃣, we introduce 𝑨𝒄𝒕𝒊𝒗𝒂𝒕𝒊𝒐𝒏 𝑨𝒍𝒊𝒈𝒏𝒎𝒆𝒏𝒕 (𝑰𝑨𝟐) -- a method that 𝘥𝘪𝘴𝘵𝘪𝘭𝘭𝘴 𝘐𝘊𝘓 𝘢𝘤𝘵𝘪𝘷𝘢𝘵𝘪𝘰𝘯𝘴 𝘪𝘯𝘵𝘰 𝘵𝘩𝘦 𝘱𝘢𝘳𝘢𝘮𝘦𝘵𝘦𝘳𝘴 𝘰𝘧 𝘢 𝘱𝘳𝘦-𝘵𝘳𝘢𝘪𝘯𝘦𝘥 𝘮𝘰𝘥𝘦𝘭. Then, running SFT on top of this "primed" model leads to consistent gains over vanilla SFT.

October 3, 2025 at 2:23 PM

Daniel Khashabi

@danielkhashabi.bsky.social

On 1️⃣, building on prior findings, we find that ICL and SFT trigger distinct ⚡activation⚡ patterns -- an additional signal that ICL and SFT operate differently. We also find that ICL is generally more calibrated than SFT, though sometimes at the cost of accuracy.
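For readers unfamiliar with the term, "calibrated" means that a model's stated confidence matches how often it is actually right. One common way to quantify this is expected calibration error (ECE); the sketch below is a generic implementation of that metric and is only an assumption about how calibration might be measured, since the post does not say which metric the paper uses.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted average gap between confidence and accuracy over equal-width bins."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")

    # Group predictions by confidence; a confidence of 1.0 goes in the last bin.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Made-up predictions for illustration only: 90% confident, 75% correct.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
print(round(overconfident, 2))
```

A lower value means better calibration. A model can score well here while being less accurate overall, which is the trade-off the post describes.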

October 3, 2025 at 2:23 PM

Daniel Khashabi

@danielkhashabi.bsky.social

Our latest work asks two questions:
1️⃣ Do ICL and SFT operate differently?
2️⃣ And if so, can one 𝗹𝗲𝘃𝗲𝗿𝗮𝗴𝗲 𝘁𝗵𝗲𝗶𝗿 𝗰𝗼𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝗿𝗶𝘁𝘆 𝗳𝗼𝗿 𝗯𝗲𝘁𝘁𝗲𝗿 𝗮𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻?

October 3, 2025 at 2:23 PM

Daniel Khashabi

@danielkhashabi.bsky.social

Paper: arxiv.org/abs/2508.11027 (to appear in @COLM_conf)
Code: github.com/JHU-CLSP/he...

With @andrewwnlp (lead), Sophia Hager, Adi Asija, Nicholas Andrews @HopkinsEngineer @JohnsHopkins

GitHub - JHU-CLSP/hell-or-high-water: Code and data for the paper: "Hell or High Water: Evaluating Agentic Recovery from External Failures"

github.com

September 19, 2025 at 2:29 PM

Daniel Khashabi

@danielkhashabi.bsky.social

👉 The overall takeaway: LLM agents today are brittle in open-world environments. For real-world deployment, we need robust strategies for fallback planning and recovery.
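As a rough illustration of what "fallback planning" can mean at its simplest, the sketch below tries a list of tools in order and moves on when one fails. The names `ToolError`, `call_with_fallback`, `primary_search`, and `backup_search` are hypothetical and are not taken from the paper or its repository; a real agent would also have to choose which alternative to try, which is the hard part the benchmark evaluates.

```python
from typing import Callable, List, Optional, Tuple


class ToolError(Exception):
    """Raised by a tool that is broken, changed, or offline (hypothetical)."""


Tool = Tuple[str, Callable[[str], str]]


def call_with_fallback(tools: List[Tool],
                       query: str) -> Tuple[Optional[str], List[str]]:
    """Try each tool in order; return the first result and a log of failures."""
    failures: List[str] = []
    for name, tool in tools:
        try:
            return tool(query), failures
        except ToolError as err:
            # Record the failure and fall through to the next alternative.
            failures.append(f"{name}: {err}")
    return None, failures


def primary_search(query: str) -> str:
    # Simulates an external failure such as an endpoint going offline.
    raise ToolError("endpoint offline")


def backup_search(query: str) -> str:
    return f"backup result for {query}"


result, failures = call_with_fallback(
    [("primary_search", primary_search), ("backup_search", backup_search)],
    "q",
)
print(result, failures)
```

Returning the failure log alongside the result lets the caller tell a clean success from a recovered one.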

September 19, 2025 at 2:29 PM

Daniel Khashabi

@danielkhashabi.bsky.social

(3) More tools = harder recovery. As the toolset grows, fallback planning becomes less reliable, not more.

September 19, 2025 at 2:29 PM

Daniel Khashabi

@danielkhashabi.bsky.social

(1) LLM agents struggle to recover. Even frontier models show large performance drops when tools fail.

(2) RAG on tool schemas doesn’t solve it. Across models, we observe a significant accuracy gap between adversarial and non-adversarial settings.

September 19, 2025 at 2:29 PM

Daniel Khashabi

@danielkhashabi.bsky.social

Tool failures happen in practice: APIs break, schemas change, endpoints go offline. The key question we ask is: how does your LLM-based agent recover by exploring alternative solutions?

From our analysis in our controlled environment, we find:

September 19, 2025 at 2:29 PM