Jack Hessel
@jmhessel.bsky.social
jmhessel.com

@Anthropic. Seattle bike lane enjoyer. Opinions my own.
life update: a few weeks ago, I made the difficult decision to move on from Samaya AI. Thank you to my collaborators for an exciting 2 years!! ❤️ Starting next month, I'll be joining Anthropic. Excited for a new adventure! 🦾

(I'm still based in Seattle 🏔️🌲🏕️; but in SF regularly)
August 20, 2025 at 12:43 AM
Meanwhile, in my neighborhood in Seattle, we've been fighting 5 years for one (1) bus lane and 30 years for a one (1)-mile bike path
December 14, 2024 at 6:38 AM
Awesome work from Jacob et al. (+ collaborators who I could find on bluesky: @mrdrozdov.com @matei-zaharia.bsky.social @mcarbin.bsky.social @lateinteraction.bsky.social ; apologies if I missed anyone!)
November 27, 2024 at 9:59 PM
This can likely be explained by data sampling bias. Re-ranking training sets are often constructed by running top-K first-stage retrieval (e.g., with BM25 or a vector index).

The training data thus contains (query, doc) pairs with high word similarity, but rarely obviously irrelevant docs, and rarely relevant docs that the first stage failed to retrieve.
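
To make that bias concrete, here's a minimal sketch of the usual mining loop; `first_stage_search` and the gold-relevance dict are hypothetical stand-ins, not the paper's actual pipeline:

```python
from typing import Callable

# Hypothetical mining loop for re-ranker training data: only documents the
# first stage already retrieved ever get labeled.
def mine_reranker_pairs(
    queries: list[str],
    gold_relevant: dict[str, set[str]],                   # query -> relevant doc ids
    first_stage_search: Callable[[str, int], list[str]],  # (query, k) -> doc ids
    k: int = 100,
) -> list[tuple[str, str, int]]:
    pairs = []
    for q in queries:
        for doc_id in first_stage_search(q, k):
            label = 1 if doc_id in gold_relevant.get(q, set()) else 0
            pairs.append((q, doc_id, label))
    return pairs

# The bias: negatives are always near-misses that share vocabulary with the
# query, and relevant docs the first stage missed never appear at all.
```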
November 27, 2024 at 9:59 PM
Information retrieval systems usually operate as a model "cascade" -- fast vector search over billions of documents followed by a more expressive LLM "re-ranking" the resulting top-K.

But beware 👻!

Despite their expressivity, top-K re-rankers generalize poorly as K increases.

arxiv.org/pdf/2411.11767
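
For anyone less familiar with the setup, a minimal sketch of such a cascade (`vector_index` and `score_relevance` are hypothetical stand-ins, not the paper's components):

```python
from typing import Callable

def cascade_search(
    query: str,
    vector_index,                                  # fast ANN index over billions of docs
    score_relevance: Callable[[str, str], float],  # slow, expressive LLM re-ranker
    k_retrieve: int = 1000,
    k_return: int = 10,
) -> list[str]:
    # Stage 1: cheap approximate search narrows billions of docs to the top K.
    candidates = vector_index.search(query, k=k_retrieve)

    # Stage 2: the expressive (but expensive) model re-scores only those K.
    # The warning above: as k_retrieve grows, the re-ranker sees docs further
    # from its training distribution and its rankings degrade.
    reranked = sorted(candidates, key=lambda doc: score_relevance(query, doc), reverse=True)
    return reranked[:k_return]
```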
November 27, 2024 at 9:59 PM
LLMs generate novel word sequences not contained in their pretraining data. However, compared to humans, models generate significantly fewer novel n-grams.

RLHF = 30% *more* copying than base!

Awesome work from the awesome Ximing Lu (gloriaximinglu.github.io) et al. 🤩

arxiv.org/pdf/2410.04265
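
A back-of-the-envelope version of the measurement (whitespace tokenization and an in-memory corpus stand in for the paper's actual setup):

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All n-grams occurring in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_rate(generation: str, corpus: list[str], n: int = 4) -> float:
    """Fraction of the generation's n-grams that never appear in the corpus."""
    corpus_ngrams: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_ngrams |= ngrams(doc.split(), n)
    gen_ngrams = ngrams(generation.split(), n)
    if not gen_ngrams:
        return 0.0
    novel = [g for g in gen_ngrams if g not in corpus_ngrams]
    return len(novel) / len(gen_ngrams)

# Lower novelty_rate = more copying; the post's claim is that RLHF'd models
# score lower than their base counterparts.
```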
November 22, 2024 at 6:14 AM