Former researcher at Stanford University
Llama 4 Maverick ranks top 4 on all@1 for Time Complexity Generation and top 2 🥈 on coeffFull for Time Complexity Ranking (beating R1 despite not using any reasoning tokens).
The model performs less well on Space Complexity.
👇All links below👇
3 models added to our benchmark:
🏆 nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
🧑‍💻 agentica-org/DeepCoder-14B-Preview
🤲 all-hands/openhands-lm-32b-v0.1
Thanks @vllm_project and @huggingface for quickly supporting inference!
👇All links below👇
✨3,105 coding problems and 1,190,250 solutions from CodeContests
✨Time/Space Complexity labels and curve coefficients
✨Up to 5k Runtime/Memory Footprint measures for each solution
huggingface.co/datasets/fac...
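As a rough illustration of how a complexity label and a curve coefficient could be obtained from runtime measures, here is a minimal sketch. It assumes a single-coefficient model runtime ≈ c·f(n) fitted by least squares against a small, hypothetical set of candidate classes. `CANDIDATES` and `fit_complexity` are hypothetical names, and this is not BigO(Bench)'s actual complexity framework.

```python
import math

# Hypothetical candidate classes; the real framework's set and fitting method may differ.
CANDIDATES = {
    "O(1)": lambda n: 1.0,
    "O(log n)": lambda n: math.log(n),
    "O(n)": lambda n: float(n),
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: float(n) ** 2,
}


def fit_complexity(sizes, runtimes):
    """Return (label, coefficient) of the curve c * f(n) with the lowest squared error."""
    best = None
    for label, f in CANDIDATES.items():
        xs = [f(n) for n in sizes]
        denom = sum(x * x for x in xs)
        if denom == 0:
            continue  # degenerate curve for these sizes (e.g. log n with n == 1 only)
        # Closed-form least squares for a single coefficient, no intercept.
        c = sum(x * t for x, t in zip(xs, runtimes)) / denom
        err = sum((t - c * x) ** 2 for x, t in zip(xs, runtimes))
        if best is None or err < best[2]:
            best = (label, c, err)
    return best[0], best[1]


if __name__ == "__main__":
    sizes = [10, 20, 40, 80, 160]
    runtimes = [3 * n * n for n in sizes]  # synthetic quadratic measurements
    print(fit_complexity(sizes, runtimes))
```

Real measurements are noisy and often include constant overheads, so a practical fitter would likely need an intercept term, repeated runs, or outlier filtering.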
🥇Qwen QwQ new SOTA on Complexity Generation/Ranking
🥈DeepseekV3-0324 on par with reasoning models!
🥉Gemma3 strong on Complexity Prediction
💻Github: github.com/facebookresearch/bigobench
🏆Leaderboard: facebookresearch.github.io/BigOBench/leaderboard.html
🧵1/6
Introducing our new non-saturated (for at least the coming week? 😉) benchmark:
✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?
Check out the details below!👇