Saahil Ognawala
@saahilognawala.bsky.social
Head of Product @jina-ai.bsky.social · AI, software, security, product management
These results align with NoLiMa's findings on LLMs: context length, more than semantic formulation or rephrasing, is what fundamentally limits effective retrieval. Even with direct lexical matching, embedding models struggle beyond 4K tokens.
jina.ai/news/long-co...
Long-Context Embedding Models are Blind Beyond 4K Tokens
We investigate embedding models on new "needle-in-haystack" tasks and find that beyond 4K tokens, they're just rolling dice - even with exact lexical matches or query expansion, they can't tell signal...
jina.ai
March 7, 2025 at 9:31 AM
Results tl;dr:

1. Performance drops by more than 70% as context grows from 128 to 8K tokens
2. Query expansion helps marginally, but can't solve the core issue
3. When literal matches fail in long context, semantic alternatives fail harder
4. Position bias: needles at the start or end are retrieved better than those in the middle (see the probe sketch below)
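A minimal probe in this spirit, assuming the sentence-transformers library; the model name, filler sentence, and needle are illustrative placeholders rather than the paper's setup, and a genuinely long-context embedding model would be needed to probe past a few hundred tokens:

```python
# Toy needle-in-a-haystack probe for an embedding model.
# NOTE: model, filler, and needle are placeholders, not the study's setup;
# "all-MiniLM-L6-v2" truncates long inputs, so swap in a long-context
# embedding model to probe beyond a few hundred tokens.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

query = "What is the secret passphrase?"
needle = "The secret passphrase is 'blue-lantern-42'."
filler = "The committee met again to discuss quarterly logistics."

def haystack(n_filler: int, position: float) -> str:
    """Build a long document with the needle at a relative position (0=start, 1=end)."""
    chunks = [filler] * n_filler
    chunks.insert(int(position * n_filler), needle)
    return " ".join(chunks)

q_emb = model.encode(query, convert_to_tensor=True)
for n in (8, 64, 512, 2048):        # roughly increasing context length
    for pos in (0.0, 0.5, 1.0):     # needle at start / middle / end
        d_emb = model.encode(haystack(n, pos), convert_to_tensor=True)
        score = util.cos_sim(q_emb, d_emb).item()
        print(f"filler={n:5d} pos={pos:.1f} cosine={score:.3f}")
```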
March 7, 2025 at 9:29 AM
An obvious pivot would be to sell unique data assets as IP, but a far more future-proof proposition is to sell data flywheels - the more enterprises you onboard, the more differentiated and indispensable your software gets.
February 16, 2025 at 8:31 AM
The business value that still remains to be cracked was, and is, in inference, not training! Even if we accept the take that skipping SFT and going directly to RL meant huge savings in training costs - you're only ever going to train it once! Where are you gonna get 30 H100s for running it?
January 27, 2025 at 8:07 PM
A lot of people prefer getting into the weeds of what agents are instead of this very important point about introspection at the heart of it - I appreciate this 🤘🏽
December 14, 2024 at 4:58 PM
There's no free lunch though - for slightly complex vulnerability audits or refactoring, Claude works best with multiple subtask-oriented chats. O1-p, while rarely complaining about context bloat, just literally stops even trying to be helpful after a point (which you just have to know from vibes).
December 12, 2024 at 6:43 AM
So it seems that, unlike text tasks, where pre-trained MLM models form a generally good backbone for downstream tasks (I'm now not even sure of this?), with images one needs to pay careful attention to whether the downstream task is a multimodal one or a unimodal image one.
December 3, 2024 at 2:16 PM
I didn't find any studies pitting the two models head-to-head on vision-language, or even just vision, tasks, but there is one study showing that, at least for small-scale datasets, MAE-trained vision encoders do improve CLIP model performance arxiv.org/abs/2301.07836
December 3, 2024 at 2:15 PM
...except language alignment, where the distillation method of EVA-02 arguably performs better on multimodal tasks: its direct optimization against natural-language modelling gives it an edge over image reconstruction alone.
December 3, 2024 at 2:14 PM
Even with as much as 75% of the image patches masked, the MAE technique performs exceedingly well on almost all downstream vision tasks, including classification, semantic segmentation, etc. ALMOST all of them ...
December 3, 2024 at 2:14 PM
@bowang0911.bsky.social showed me this cool paper that I'd never read before, about Masked Autoencoders (MAE) for images. The idea: an image encoder encodes the non-masked patches, and a decoder then uses the non-masked embeddings plus mask-token embeddings to regenerate the original image arxiv.org/abs/2111.06377
Masked Autoencoders Are Scalable Vision Learners
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the mis...
arxiv.org
December 3, 2024 at 2:13 PM
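A toy sketch of that encode-visible / reconstruct-all loop from the post above, with plain linear layers standing in for the paper's ViT encoder and decoder blocks; the shapes, the learned mask token, and the omitted positional embeddings are simplifications, not the paper's implementation:

```python
# Toy MAE sketch: encode only the visible patches, then let a decoder
# reconstruct every patch from visible embeddings plus a learned mask token.
# Linear layers stand in for the ViT blocks; positional embeddings omitted.
import torch
import torch.nn as nn

num_patches, patch_dim, emb_dim, mask_ratio = 196, 768, 256, 0.75

encoder = nn.Linear(patch_dim, emb_dim)          # stand-in for the ViT encoder
decoder = nn.Linear(emb_dim, patch_dim)          # stand-in for the MAE decoder
mask_token = nn.Parameter(torch.zeros(emb_dim))  # learned placeholder embedding

patches = torch.randn(1, num_patches, patch_dim)  # fake patchified image

# Randomly keep 25% of the patches; the remaining 75% are masked out.
perm = torch.randperm(num_patches)
n_keep = int(num_patches * (1 - mask_ratio))
keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]

visible = encoder(patches[:, keep_idx])           # encoder sees visible patches only

# Rebuild the full sequence: encoded visible patches + mask tokens elsewhere.
full = mask_token.expand(1, num_patches, emb_dim).clone()
full[:, keep_idx] = visible

recon = decoder(full)                             # reconstruct all patches
loss = ((recon[:, mask_idx] - patches[:, mask_idx]) ** 2).mean()  # loss on masked patches only
print(loss.item())
```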
Reposted by Saahil Ognawala
And the idea that "is it public" is all that matters, and not what you DO with people's content, is absurd.

Like you cannot possibly suggest with a straight face that this example of using transgender YouTubers' videos to train facial recognition is 100% fine.

www.theverge.com/2017/8/22/16...
Transgender YouTubers had their videos grabbed to train facial recognition software
In the race to train AI, researchers are taking data first and asking questions later
www.theverge.com
November 27, 2024 at 3:56 PM
A bunch of things that I'd add to it are
1. VLMs for maintaining document-structure integrity while getting equally good, if not better, outcomes.
2. ColBERT models for improving the explainability of results before diving into fine-tuning (see the MaxSim sketch below).
3. Fine-tuning reranker models
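For the second point, a rough sketch of the late-interaction (MaxSim) scoring used by ColBERT-style models, with random vectors standing in for real token embeddings; the explainability comes from being able to trace each query token's contribution to its best-matching document token:

```python
# ColBERT-style late interaction (MaxSim) with toy vectors: score a document
# by summing, over query tokens, the max similarity to any document token.
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))    # toy embeddings for 4 query tokens
d = rng.normal(size=(20, 128))   # toy embeddings for 20 document tokens

# Normalize so dot products are cosine similarities.
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)

sim = q @ d.T                    # (4, 20) token-to-token similarities
per_token = sim.max(axis=1)      # MaxSim: best document match per query token
score = per_token.sum()          # ColBERT-style relevance score

# argmax reveals WHICH document token each query token matched -- this
# token-level alignment is what makes the results explainable.
best_match = sim.argmax(axis=1)
print(score, best_match)
```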
November 28, 2024 at 1:23 PM