Lightnews — Scholar-powered news

Michael Günther

@michael-g-u.bsky.social

53 followers 150 following 21 posts

ML @jina-ai.bsky.social
https://github.com/guenthermi


Michael Günther

@michael-g-u.bsky.social

Interesting blog post by my colleague @scottmartens.bsky.social on how text size influences embedding similarity. For example, longer queries produce higher scores, so comparing the scores of two documents for the same query works, but scores for different queries are not comparable.
jina.ai/news/on-the-...

April 23, 2025 at 11:12 AM
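
To make the point about comparability concrete, here is a minimal sketch of ranking documents by cosine similarity within a single query. The vectors are made-up toy values, not output from any real embedding model, and the names `cosine` and `rank_documents` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every document against ONE query and sort by descending score.

    The scores are only meant to be compared with each other, i.e. within
    this one query, not against scores obtained for a different query.
    """
    scored = [(name, cosine(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed values, for illustration only).
query_vec = [1.0, 0.0]
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

ranking = rank_documents(query_vec, doc_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The sketch deliberately returns a ranking rather than applying a fixed score threshold, since a threshold tuned on one query would implicitly compare scores across queries.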

Michael Günther

@michael-g-u.bsky.social

Whether to use late chunking also depends on the chunk size: late chunking is generally more useful for smaller chunks than for larger ones.

December 5, 2024 at 8:49 AM

Michael Günther

@michael-g-u.bsky.social

Chunking improves performance on fact retrieval tasks but can actually harm performance on other retrieval tasks. Late chunking is useful for coherent datasets and is often a good compromise: it helps embeddings retain context information while still focusing on details.

December 5, 2024 at 8:49 AM
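
As a rough illustration of the idea behind late chunking: the whole document is first encoded by a long-context model, and only afterwards are the token embeddings mean-pooled per chunk, so each chunk vector is built from tokens that have already seen the full context. The sketch below shows only that pooling step. The token embeddings are toy values standing in for model output, and `mean_pool` and `late_chunk_pool` are hypothetical names, not part of any library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def late_chunk_pool(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Pool token embeddings of ONE full-document pass into chunk vectors.

    Each span is a half-open (start, end) range of token indices.
    """
    return [mean_pool(token_embeddings[start:end]) for start, end in spans]


# Toy stand-in for the token embeddings of a whole document
# (assumed values; a real model would produce these in one forward pass).
token_embeddings = [
    [1.0, 0.0],
    [3.0, 0.0],
    [0.0, 2.0],
    [0.0, 4.0],
]
spans = [(0, 2), (2, 4)]  # two chunks of two tokens each

chunk_vectors = late_chunk_pool(token_embeddings, spans)
print(chunk_vectors)
```

With conventional chunking, by contrast, each chunk would be encoded separately before pooling, so its tokens would not carry information from the rest of the document.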

Michael Günther

@michael-g-u.bsky.social

First, more input helps, but not equally for all retrieval tasks.

December 5, 2024 at 8:49 AM
