Mark Ibrahim
markibrahim.bsky.social
Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab https://markibrahim.me/
✅ 22k multi-scene questions
✅ New scenes not in existing web data
✅ Runs in ~15 min on one GPU

Work led by Candace Ross in collaboration with @afeinstein20.bsky.social, Florian Bordes, and @polkirichenko.bsky.social

Check it out on Hugging Face, arXiv & NeurIPS! huggingface.co/datasets/fac...
facebook/Common-O · Datasets at Hugging Face
November 7, 2025 at 8:55 PM
Despite models saturating single-image perception benchmarks, Common-O establishes a challenging new multimodal benchmark. The best-performing model achieves only 35% on Common-O, and on Common-O Complex, which consists of more complex scenes, the best model achieves only 1%.

🧵2/3
We explain how good delimiters steer attention heads to key input tokens and offer practical recommendations for prompts and delimiter choices to get the best performance from your LLM. tl;dr: use “!” or “\n”.
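To make the recommendation concrete, here is a minimal sketch of varying only the delimiter that joins few-shot examples in a prompt. The helper name `format_prompt` and the toy examples are illustrative, not from the paper's code.

```python
# Illustrative sketch: the same few-shot prompt, joined with different
# delimiters between in-context examples. Everything else is held fixed.
EXAMPLES = [
    "Q: 2 + 2 = ?\nA: 4",
    "Q: What is the capital of France?\nA: Paris",
]
QUESTION = "Q: 3 * 3 = ?\nA:"

def format_prompt(examples, question, delimiter):
    """Join few-shot examples and the final query with a chosen delimiter."""
    return delimiter.join(examples + [question])

# The posts report MMLU accuracy can swing by +/- 23% on this choice alone;
# "!" and "\n" were among the best-performing delimiters.
for delim in ["\n", "!", " ", ";"]:
    prompt = format_prompt(EXAMPLES, QUESTION, delim)
```

The only thing that changes across runs is the delimiter string, which is what makes the reported sensitivity surprising.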
October 9, 2025 at 2:32 PM
- MMLU performance can vary by +/- 23% depending on the choice of delimiter across leading open model families (Llama, Qwen, and Gemma).
- Closed models, such as GPT-4o, are also brittle to the choice of delimiter.

🧵
We also find better models are not necessarily better at abstention, suggesting abstention is an open research question.

w/ @polkirichenko.bsky.social Sam Bell Kamalika Chaudhuri

Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...

bsky.app/profile/polk...

🧵2/2
June 17, 2025 at 6:32 PM
We found MLM-U training can even outperform transformers trained with additional supervision from A* search traces, showing the promise of alternative learning objectives.

Learn more on our site and code at facebookresearch.github.io/maze_navigat...
MLM-U
facebookresearch.github.io
December 11, 2024 at 6:42 PM
Recently, we also applied the same MLM-U objective to maze navigation. We find that, when training parameter-matched transformers on identical data, MLM-U without any tweaks outperforms standard next-token training across all maze grid sizes (up to 30x30).
We find MLM-U training improves knowledge retrieval on Wikipedia-based questions: a much smaller 100M-parameter transformer trained from scratch even outperforms a pretrained 7B Mistral model!

Come by our NeurIPS poster Exhibit Halls A-C #3204 11am PST Thursday to learn more.
December 11, 2024 at 6:36 PM
We show that training with a factorization-agnostic objective, MLM-U (a variable-ratio BERT-style loss with links to discrete diffusion), which predicts multiple tokens ahead and back, can significantly mitigate the reversal curse!
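As a rough intuition for the objective, here is a minimal sketch of variable-ratio masking: sample a masking ratio uniformly, mask that fraction of positions, and train the model to predict every masked token from the surrounding context in both directions, rather than only the next token. This is an assumption-laden illustration, not the paper's implementation; the function name `mlm_u_mask` is hypothetical.

```python
import random

def mlm_u_mask(tokens, mask_token="[MASK]", rng=None):
    """Sketch of variable-ratio masking (not the paper's code).

    Samples a masking ratio uniformly in (0, 1), masks that fraction of
    positions, and returns the corrupted input plus the targets the model
    must recover from bidirectional context.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(0.0, 1.0)
    n_mask = max(1, int(ratio * len(tokens)))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [mask_token if i in masked_positions else t
              for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_positions}
    return inputs, targets

inputs, targets = mlm_u_mask(
    ["Paris", "is", "the", "capital", "of", "France"],
    rng=random.Random(0),
)
```

Because the masked span varies in size and position, the model cannot rely on a single left-to-right factorization of the sequence, which is the intuition behind mitigating the reversal curse.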
Problem: Language models struggle with the “reversal curse”: an inability to answer reformulations of a question. We show this stems from the standard next-token learning objective, in what we call “the factorization curse.”