Pierre Chambon
@pierrechambon.bsky.social
PhD at FAIR (Meta) and INRIA
Former researcher at Stanford University
Llama 4 results out on ✨BigO(Bench)✨!

Llama 4 Maverick is top 4 on all@1 for Time Complexity Generation and top 2 🥈 on coeffFull for Time Complexity Ranking (beating R1, despite not using any reasoning tokens).

The model performs less well on Space Complexity.

👇All links below👇
April 16, 2025 at 3:05 PM
✨BigO(Bench)✨ Leaderboard Update!

3 models added to our benchmark:
🏆 nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
🧑‍💻 agentica-org/DeepCoder-14B-Preview
🤲 all-hands/openhands-lm-32b-v0.1

Thanks @vllm_project and @huggingface for quickly supporting inference!

👇All links below👇
April 10, 2025 at 4:11 PM
🔥Very happy to introduce BigO(Bench) dataset on @hf.co 🤗

✨3,105 coding problems and 1,190,250 solutions from CodeContests

✨Time/Space Complexity labels and curve coefficients

✨Up to 5k Runtime/Memory Footprint measures for each solution

huggingface.co/datasets/fac...
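
A minimal sketch of pulling the dataset with the Hugging Face `datasets` library; the repo id and split below are placeholders (the link above is truncated), and the field names mentioned in the comments are guesses based on this post, not the actual schema.

```python
# Minimal sketch: load the BigO(Bench) dataset with the `datasets` library.
# The repo id and split are assumptions -- replace them with the full path
# from the (truncated) link in the post.
from datasets import load_dataset

ds = load_dataset("facebook/BigOBench", split="train")  # hypothetical repo id

example = ds[0]
# The post suggests fields along these lines (exact column names unknown):
# problem statement, solution code, time/space complexity labels,
# curve coefficients, and runtime/memory footprint measures.
print(example.keys())
```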
April 3, 2025 at 2:46 PM
New leaderboard for ✨BigO(Bench)✨!

🥇Qwen QwQ new SOTA on Complexity Generation/Ranking
🥈DeepSeek-V3-0324 on par with reasoning models!
🥉Gemma3 strong on Complexity Prediction

💻Github: github.com/facebookresearch/bigobench
🏆Leaderboard: facebookresearch.github.io/BigOBench/leaderboard.html

🧵1/6
GitHub - facebookresearch/BigOBench: BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated code.
March 27, 2025 at 3:24 PM
Does your LLM truly comprehend the complexity of the code it generates? 🥰

Introducing our new non-saturated (for at least the coming week? 😉) benchmark:

✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?

Check out the details below! 👇
March 20, 2025 at 4:48 PM
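
For intuition, here is a toy sketch of the general idea behind measuring complexity empirically: time a function on inputs of growing size and pick the candidate growth curve that best fits the measurements. This is only an illustration under simplified assumptions, not BigO(Bench)'s actual complexity framework.

```python
# Toy illustration of empirical complexity measurement: time a target
# function on growing input sizes and keep the best-fitting candidate curve.
import math
import time

def pairwise_count(xs):
    # Deliberately quadratic target: inspects every ordered pair.
    return sum(1 for a in xs for b in xs if a < b)

CANDIDATES = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n ** 2,
}

def measure(fn, sizes):
    # Run fn once per input size and record wall-clock time.
    timings = []
    for n in sizes:
        data = list(range(n))
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    return timings

def best_fit(sizes, timings):
    # Scale each candidate curve to the last measurement and keep the one
    # with the smallest squared error against all measurements.
    best, best_err = None, float("inf")
    for name, curve in CANDIDATES.items():
        preds = [curve(n) for n in sizes]
        scale = timings[-1] / preds[-1]
        err = sum((t - scale * p) ** 2 for t, p in zip(timings, preds))
        if err < best_err:
            best, best_err = name, err
    return best

sizes = [200, 400, 800, 1600]
print(best_fit(sizes, measure(pairwise_count, sizes)))  # expected: "O(n^2)"
```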