Saahil Ognawala
@saahilognawala.bsky.social
Head of Product @jina-ai.bsky.social · AI, software, security, product management
These results align with NoLiMa's findings on LLMs: context length, more than semantic formulation or rephrasing, is what fundamentally limits effective retrieval. Even with direct lexical matching, embedding models struggle beyond 4K tokens.
jina.ai/news/long-co...
Long-Context Embedding Models are Blind Beyond 4K Tokens
We investigate embedding models on new "needle-in-haystack" tasks and find that beyond 4K tokens, they're just rolling dice - even with exact lexical matches or query expansion, they can't tell signal...
jina.ai
March 7, 2025 at 9:31 AM
Results tl;dr:

1. Performance drops by more than 70% as context grows from 128 to 8K tokens
2. Query expansion helps marginally, but can't solve the core issue
3. When literal matches fail in long context, semantic alternatives fail harder
4. Position bias: needles at the start or end are retrieved better than those in the middle (see the probe sketch below)
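A minimal probe in this spirit, assuming the sentence-transformers library; the model name, filler sentence, and needle are illustrative placeholders rather than the paper's setup, and a genuinely long-context embedding model would be needed to probe past a few hundred tokens:

```python
# Toy needle-in-a-haystack probe for an embedding model.
# NOTE: model, filler, and needle are placeholders, not the study's setup;
# "all-MiniLM-L6-v2" truncates long inputs, so swap in a long-context
# embedding model to probe beyond a few hundred tokens.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

query = "What is the secret passphrase?"
needle = "The secret passphrase is 'blue-lantern-42'."
filler = "The committee met again to discuss quarterly logistics."

def haystack(n_filler: int, position: float) -> str:
    """Build a long document with the needle at a relative position (0=start, 1=end)."""
    chunks = [filler] * n_filler
    chunks.insert(int(position * n_filler), needle)
    return " ".join(chunks)

q_emb = model.encode(query, convert_to_tensor=True)
for n in (8, 64, 512, 2048):        # roughly increasing context length
    for pos in (0.0, 0.5, 1.0):     # needle at start / middle / end
        d_emb = model.encode(haystack(n, pos), convert_to_tensor=True)
        score = util.cos_sim(q_emb, d_emb).item()
        print(f"filler={n:5d} pos={pos:.1f} cosine={score:.3f}")
```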
March 7, 2025 at 9:29 AM
An obvious pivot would be to sell unique data assets as IP, but a far more future-proof proposition is to sell data flywheels - the more enterprises you onboard, the more differentiated and indispensable your software gets.
February 16, 2025 at 8:31 AM
The business value that still remains to be cracked was, and is, in inference, not training! Even if we accept the take that skipping SFT and going directly to RL meant huge savings in training costs - you're only ever going to train it once! Where are you gonna get 30 H100s for running it?
January 27, 2025 at 8:07 PM
A lot of people prefer getting into the weeds of what agents are instead of this very important point about introspection at the heart of it - I appreciate this 🤘🏽
December 14, 2024 at 4:58 PM
There's no free lunch though - for slightly complex vulnerability audits or refactoring, Claude works best with multiple subtask-oriented chats. O1-p, while rarely complaining about context bloat, just literally stops even trying to be helpful after a point (which you just have to know from vibes).
December 12, 2024 at 6:43 AM
So it seems that, unlike text tasks, where pre-trained MLM models form a generally good backbone for downstream tasks (I'm now not even sure of this?), with images one needs to pay careful attention to whether the downstream task is a multimodal one or a unimodal image one.
December 3, 2024 at 2:16 PM
I didn't find any studies pitting the two models head-to-head on vision-language, or even just vision, tasks, but there is one study showing that, at least for small-scale datasets, MAE-trained vision encoders do improve CLIP model performance arxiv.org/abs/2301.07836
December 3, 2024 at 2:15 PM
...except language alignment, where the distillation method of EVA-02 arguably performs better on multimodal tasks: its direct optimization against natural-language modelling gives it an edge over image reconstruction alone.
December 3, 2024 at 2:14 PM
Even with as much as 75% of the image patches masked, the MAE technique performs exceedingly well on almost all downstream vision tasks, including classification, semantic segmentation, etc. ALMOST all of them ...
December 3, 2024 at 2:14 PM
@bowang0911.bsky.social showed me this cool paper that I'd never read before, about Masked Autoencoders (MAE) for images. The idea: an image encoder encodes the non-masked patches, and a decoder then uses the non-masked embeddings plus mask-token embeddings to regenerate the original image arxiv.org/abs/2111.06377
Masked Autoencoders Are Scalable Vision Learners
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the mis...
arxiv.org
December 3, 2024 at 2:13 PM
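A toy sketch of that encode-visible / reconstruct-all loop from the post above, with plain linear layers standing in for the paper's ViT encoder and decoder blocks; the shapes, the learned mask token, and the omitted positional embeddings are simplifications, not the paper's implementation:

```python
# Toy MAE sketch: encode only the visible patches, then let a decoder
# reconstruct every patch from visible embeddings plus a learned mask token.
# Linear layers stand in for the ViT blocks; positional embeddings omitted.
import torch
import torch.nn as nn

num_patches, patch_dim, emb_dim, mask_ratio = 196, 768, 256, 0.75

encoder = nn.Linear(patch_dim, emb_dim)          # stand-in for the ViT encoder
decoder = nn.Linear(emb_dim, patch_dim)          # stand-in for the MAE decoder
mask_token = nn.Parameter(torch.zeros(emb_dim))  # learned placeholder embedding

patches = torch.randn(1, num_patches, patch_dim)  # fake patchified image

# Randomly keep 25% of the patches; the remaining 75% are masked out.
perm = torch.randperm(num_patches)
n_keep = int(num_patches * (1 - mask_ratio))
keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]

visible = encoder(patches[:, keep_idx])           # encoder sees visible patches only

# Rebuild the full sequence: encoded visible patches + mask tokens elsewhere.
full = mask_token.expand(1, num_patches, emb_dim).clone()
full[:, keep_idx] = visible

recon = decoder(full)                             # reconstruct all patches
loss = ((recon[:, mask_idx] - patches[:, mask_idx]) ** 2).mean()  # loss on masked patches only
print(loss.item())
```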
Reposted by Saahil Ognawala
And the idea that "is it public" is all that matters, and not what you DO with people's content, is absurd.

Like you cannot possibly suggest with a straight face that this example of using transgender YouTubers' videos to train facial recognition is 100% fine.

www.theverge.com/2017/8/22/16...
Transgender YouTubers had their videos grabbed to train facial recognition software
In the race to train AI, researchers are taking data first and asking questions later
www.theverge.com
November 27, 2024 at 3:56 PM
A bunch of things that I'd add to it are
1. VLMs for maintaining document-structure integrity while getting equally good, if not better, outcomes.
2. ColBERT models for improving the explainability of results before diving into fine-tuning (see the MaxSim sketch below).
3. Fine-tuning reranker models
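For the second point, a rough sketch of the late-interaction (MaxSim) scoring used by ColBERT-style models, with random vectors standing in for real token embeddings; the explainability comes from being able to trace each query token's contribution to its best-matching document token:

```python
# ColBERT-style late interaction (MaxSim) with toy vectors: score a document
# by summing, over query tokens, the max similarity to any document token.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))    # toy embeddings for 4 query tokens
d = rng.normal(size=(20, 128))   # toy embeddings for 20 document tokens

# Normalize so dot products are cosine similarities.
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)

sim = q @ d.T                    # (4, 20) token-to-token similarities
per_token = sim.max(axis=1)      # MaxSim: best document match per query token
score = per_token.sum()          # ColBERT-style relevance score

# argmax reveals WHICH document token each query token matched -- this
# token-level alignment is what makes the results explainable.
best_match = sim.argmax(axis=1)
print(score, best_match)
```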
November 28, 2024 at 1:23 PM