Pierre Chambon
@pierrechambon.bsky.social
PhD at FAIR (Meta) and INRIA
Former researcher at Stanford University
Llama 4 results out on ✨BigO(Bench)✨!

Llama 4 Maverick is top 4 on all@1 for Time Complexity Generation and top 2 🥈 on coeffFull for Time Complexity Ranking (beating R1, despite not using any reasoning tokens).

The model performs less well on Space Complexity.

👇All links below👇
April 16, 2025 at 3:05 PM
✨BigO(Bench)✨ Leaderboard Update!

3 models added to our benchmark:
🏆 nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
🧑‍💻 agentica-org/DeepCoder-14B-Preview
🤲 all-hands/openhands-lm-32b-v0.1

Thanks @vllm_project and @huggingface for quickly supporting inference!

👇All links below👇
April 10, 2025 at 4:11 PM
🔥Very happy to introduce BigO(Bench) dataset on @hf.co 🤗

✨3,105 coding problems and 1,190,250 solutions from CodeContests

✨Time/Space Complexity labels and curve coefficients

✨Up to 5k Runtime/Memory Footprint measures for each solution

huggingface.co/datasets/fac...
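
A minimal sketch of pulling the dataset with the Hugging Face `datasets` library; the repo id and split below are placeholders (the link above is truncated), and the field names mentioned in the comments are guesses based on this post, not the actual schema.

```python
# Minimal sketch: load the BigO(Bench) dataset with the `datasets` library.
# The repo id and split are assumptions -- replace them with the full path
# from the (truncated) link in the post.
from datasets import load_dataset

ds = load_dataset("facebook/BigOBench", split="train")  # hypothetical repo id

example = ds[0]
# The post suggests fields along these lines (exact column names unknown):
# problem statement, solution code, time/space complexity labels,
# curve coefficients, and runtime/memory footprint measures.
print(example.keys())
```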
April 3, 2025 at 2:46 PM
New leaderboard for ✨BigO(Bench)✨!

🥇Qwen QwQ new SOTA on Complexity Generation/Ranking
🥈DeepSeek-V3-0324 on par with reasoning models!
🥉Gemma3 strong on Complexity Prediction

💻Github: github.com/facebookresearch/bigobench
🏆Leaderboard: facebookresearch.github.io/BigOBench/leaderboard.html

🧵1/6
GitHub - facebookresearch/BigOBench: BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated code.
March 27, 2025 at 3:24 PM
Does your LLM truly comprehend the complexity of the code it generates? 🥰

Introducing our new non-saturated (for at least the coming week? 😉) benchmark:

✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?

Check out the details below! 👇
March 20, 2025 at 4:48 PM
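
For intuition, here is a toy sketch of the general idea behind measuring complexity empirically: time a function on inputs of growing size and pick the candidate growth curve that best fits the measurements. This is only an illustration under simplified assumptions, not BigO(Bench)'s actual complexity framework.

```python
# Toy illustration of empirical complexity measurement: time a target
# function on growing input sizes and keep the best-fitting candidate curve.
import math
import time

def pairwise_count(xs):
    # Deliberately quadratic target: inspects every ordered pair.
    return sum(1 for a in xs for b in xs if a < b)

CANDIDATES = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n ** 2,
}

def measure(fn, sizes):
    # Run fn once per input size and record wall-clock time.
    timings = []
    for n in sizes:
        data = list(range(n))
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    return timings

def best_fit(sizes, timings):
    # Scale each candidate curve to the last measurement and keep the one
    # with the smallest squared error against all measurements.
    best, best_err = None, float("inf")
    for name, curve in CANDIDATES.items():
        preds = [curve(n) for n in sizes]
        scale = timings[-1] / preds[-1]
        err = sum((t - scale * p) ** 2 for t, p in zip(timings, preds))
        if err < best_err:
            best, best_err = name, err
    return best

sizes = [200, 400, 800, 1600]
print(best_fit(sizes, measure(pairwise_count, sizes)))  # expected: "O(n^2)"
```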