Michael Günther
michael-g-u.bsky.social
ML @jina-ai.bsky.social
https://github.com/guenthermi
Interesting blog post by my colleague @scottmartens.bsky.social on how text size influences embedding similarity. For example, longer queries produce higher scores, so comparing the scores of two docs against the same query works, but scores for different queries are not comparable.
jina.ai/news/on-the-...
April 23, 2025 at 11:12 AM
Whether to use late chunking also depends on the chunk size: late chunking is generally more useful for small chunks than for large ones.
December 5, 2024 at 8:49 AM
Chunking improves performance on fact retrieval tasks but can actually harm performance on other retrieval tasks. Late chunking is useful for coherent datasets and is often a good compromise that helps embeddings retain context information while still focusing on details:
December 5, 2024 at 8:49 AM
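For readers unfamiliar with the term, here is a minimal sketch of the pooling step behind late chunking, assuming the whole document has already been passed through a long-context embedding model that returns one vector per token. The token vectors and chunk boundaries below are made-up toy values, and `mean_pool` and `late_chunk_pool` are hypothetical helper names, not part of any library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty sequence of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def late_chunk_pool(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Pool token vectors per chunk AFTER the full document was encoded.

    Each span is a (start, end) token range with an exclusive end.
    Because the token vectors come from encoding the whole document,
    every chunk vector can still carry context from outside its span.
    """
    return [mean_pool(token_embeddings[start:end]) for start, end in spans]


# Toy stand-in for model output: 4 tokens, 2 dimensions (assumed values).
token_embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
# Hypothetical chunk boundaries: tokens 0-1 and tokens 2-3.
spans = [(0, 2), (2, 4)]

chunk_vectors = late_chunk_pool(token_embeddings, spans)
print(chunk_vectors)
```

Naive chunking would instead split the text first and encode each chunk on its own, so no chunk vector could see the rest of the document.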
First, more input helps, but not for all retrieval tasks equally:
December 5, 2024 at 8:49 AM