Mark Ibrahim
markibrahim.bsky.social
Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab https://markibrahim.me/
✅ 22k multi-scene questions
✅ New scenes not in existing web data
✅ Runs in ~15 min on one GPU

Work led by Candace Ross in collaboration with @afeinstein20.bsky.social, Florian Bordes, and @polkirichenko.bsky.social

Check it out on Hugging Face, arXiv & NeurIPS! huggingface.co/datasets/fac...
facebook/Common-O · Datasets at Hugging Face
November 7, 2025 at 8:55 PM
Despite models saturating single-image perception benchmarks, Common-O establishes a challenging new multimodal benchmark. The best-performing model achieves only 35% on Common-O, and on Common-O Complex, which consists of more complex scenes, the best model achieves only 1%.

🧵2/3
We explain how good delimiters steer attention heads to key input tokens and offer practical recommendations for prompts and delimiter choices to get the best performance from your LLM. tl;dr: use “!” or “\n”.
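To make the recommendation concrete, here is a minimal sketch of varying only the delimiter that joins few-shot examples in a prompt. The helper name `format_prompt` and the toy examples are illustrative, not from the paper's code.

```python
# Illustrative sketch: the same few-shot prompt, joined with different
# delimiters between in-context examples. Everything else is held fixed.
EXAMPLES = [
    "Q: 2 + 2 = ?\nA: 4",
    "Q: What is the capital of France?\nA: Paris",
]
QUESTION = "Q: 3 * 3 = ?\nA:"

def format_prompt(examples, question, delimiter):
    """Join few-shot examples and the final query with a chosen delimiter."""
    return delimiter.join(examples + [question])

# The posts report MMLU accuracy can swing by +/- 23% on this choice alone;
# "!" and "\n" were among the best-performing delimiters.
for delim in ["\n", "!", " ", ";"]:
    prompt = format_prompt(EXAMPLES, QUESTION, delim)
```

The only thing that changes across runs is the delimiter string, which is what makes the reported sensitivity surprising.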
October 9, 2025 at 2:32 PM
- MMLU performance can vary by +/- 23% depending on the choice of delimiter across leading open model families (Llama, Qwen, and Gemma).
- Closed models, such as GPT-4o, are also brittle to the choice of delimiter.

🧵
We also find better models are not necessarily better at abstention, suggesting abstention is an open research question.

w/ @polkirichenko.bsky.social Sam Bell Kamalika Chaudhuri

Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...

bsky.app/profile/polk...

🧵2/2
June 17, 2025 at 6:32 PM
We found MLM-U training can even outperform transformers trained with additional supervision from A* search traces, showing the promise of alternative learning objectives.

Learn more on our site and code at facebookresearch.github.io/maze_navigat...
MLM-U
facebookresearch.github.io
December 11, 2024 at 6:42 PM
Recently, we also applied the same MLM-U objective to maze navigation. We find that, when training parameter-matched transformers on identical data, MLM-U without any tweaks outperforms standard next-token training across all maze grid sizes (up to 30x30).
We find MLM-U training improves knowledge retrieval on Wikipedia-based questions: a much smaller 100M-parameter transformer trained from scratch even outperforms a pretrained 7B Mistral model!

Come by our NeurIPS poster Exhibit Halls A-C #3204 11am PST Thursday to learn more.
December 11, 2024 at 6:36 PM
We show that training with a factorization-agnostic objective, MLM-U (a variable-ratio BERT-style loss with links to discrete diffusion), which predicts multiple tokens ahead and back, can significantly mitigate the reversal curse!
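As a rough intuition for the objective, here is a minimal sketch of variable-ratio masking: sample a masking ratio uniformly, mask that fraction of positions, and train the model to predict every masked token from the surrounding context in both directions, rather than only the next token. This is an assumption-laden illustration, not the paper's implementation; the function name `mlm_u_mask` is hypothetical.

```python
import random

def mlm_u_mask(tokens, mask_token="[MASK]", rng=None):
    """Sketch of variable-ratio masking (not the paper's code).

    Samples a masking ratio uniformly in (0, 1), masks that fraction of
    positions, and returns the corrupted input plus the targets the model
    must recover from bidirectional context.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(0.0, 1.0)
    n_mask = max(1, int(ratio * len(tokens)))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [mask_token if i in masked_positions else t
              for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_positions}
    return inputs, targets

inputs, targets = mlm_u_mask(
    ["Paris", "is", "the", "capital", "of", "France"],
    rng=random.Random(0),
)
```

Because the masked span varies in size and position, the model cannot rely on a single left-to-right factorization of the sequence, which is the intuition behind mitigating the reversal curse.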
Problem: Language models struggle with the “reversal curse”: an inability to answer reformulations of a question. We show this stems from the standard next-token learning objective, in what we call “the factorization curse.”