Pierre Chambon
@pierrechambon.bsky.social
PhD at FAIR (Meta) and INRIA
Former researcher at Stanford University
Llama 4 results out on ✨BigO(Bench)✨!
Llama 4 Maverick is top 4 on all@1 for Time Complexity Generation and top 2 🥈 on coeffFull for Time Complexity Ranking (beating R1, despite not using any reasoning tokens).
The model performs less well on Space Complexity.
👇All links below👇
April 16, 2025 at 3:05 PM
✨BigO(Bench)✨ Leaderboard Update!
3 models added to our benchmark:
🏆 nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
🧑‍💻 agentica-org/DeepCoder-14B-Preview
🤲 all-hands/openhands-lm-32b-v0.1
Thanks @vllm_project and @huggingface for quickly supporting inference!
👇All links below👇
April 10, 2025 at 4:11 PM
🔥Very happy to introduce the BigO(Bench) dataset on @hf.co 🤗
✨3,105 coding problems and 1,190,250 solutions from CodeContests
✨Time/Space Complexity labels and curve coefficients
✨Up to 5k Runtime/Memory Footprint measurements for each solution
huggingface.co/datasets/fac...
April 3, 2025 at 2:46 PM
New leaderboard for ✨BigO(Bench)✨!
🥇Qwen QwQ new SOTA on Complexity Generation/Ranking
🥈DeepseekV3-0324 on par with reasoning models!
🥉Gemma3 strong on Complexity Prediction
💻GitHub: github.com/facebookresearch/bigobench
🏆Leaderboard: facebookresearch.github.io/BigOBench/leaderboard.html
🧵1/6
GitHub - facebookresearch/BigOBench: BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated code.
March 27, 2025 at 3:24 PM
Does your LLM truly comprehend the complexity of the code it generates? 🥰
Introducing our new non-saturated (for at least the coming week? 😉) benchmark:
✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?
Check out the details below! 👇
March 20, 2025 at 4:48 PM