Lightnews — Scholar-powered news

Models often fail to:
1. Respect ownership rules
2. Infer type information
3. Follow idiomatic Rust interfaces
4. Preserve correct lifetimes
In the paper, we provide a taxonomy of common LLM mistakes.
🧵[5/6]

April 23, 2025 at 5:00 PM
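
To make the failure categories above concrete, here is a minimal sketch of what respecting ownership, lifetimes, and idiomatic interfaces can look like in Rust. It is a hypothetical illustration and is not taken from CRUST-Bench or the paper: the names `longest` and `Stack` are assumptions chosen for the example. A C version would typically hand out raw pointers and leave validity to the caller.

```rust
// Hypothetical illustration; not code from CRUST-Bench.

/// Returns the longer of two string slices (the first one on a tie).
/// The explicit lifetime `'a` tells the compiler that the result borrows
/// from the inputs and must not outlive either of them.
pub fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

/// A stack that owns its elements, in place of a C-style `int *` plus length.
pub struct Stack {
    items: Vec<i32>,
}

impl Stack {
    pub fn new() -> Self {
        Stack { items: Vec::new() }
    }

    /// Mutating operations take `&mut self`, so exclusive access is checked
    /// at compile time.
    pub fn push(&mut self, value: i32) {
        self.items.push(value);
    }

    /// `Option` replaces the sentinel return value a C API might use
    /// to signal an empty stack.
    pub fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }

    /// Read-only access hands out a shared borrow instead of a raw pointer.
    pub fn peek(&self) -> Option<&i32> {
        self.items.last()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

A translation that returned a raw pointer from `peek`, or dropped the lifetime annotation on `longest`, would either fail to compile or require `unsafe`. Those outcomes correspond to the ownership and lifetime categories in the list.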

Anirudh Khatry

@anirudhkhatry.bsky.social

We evaluate state-of-the-art closed-source LLMs (like o1, Claude-3.7, and Gemini-1.5-Pro), open-source models like QwQ-32B and virtuoso-32B, and the SWE-Agent on CRUST-Bench.
Even the best model—OpenAI's o1—passes only 15/100 tasks in a single-shot setting.
🧵[4/6]

April 23, 2025 at 5:00 PM

Anirudh Khatry

@anirudhkhatry.bsky.social

Our benchmark is the first to provide:
1. Rust tests
2. Rust interfaces, which are necessary for the transpiled code to work with the tests
3. A sizable number of real-scale transpilation problems.
🧵[3/6]

April 23, 2025 at 5:00 PM
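
As a rough sketch of why a test needs an interface to compile against: the test can only call names and signatures that are fixed in advance, so the transpiled code has to implement exactly those. The example below is hypothetical and not drawn from the benchmark. `BoundedQueue` and its methods are assumed names, and the method bodies stand in for what a transpiler would be expected to produce.

```rust
// Hypothetical illustration; not code from CRUST-Bench.
use std::collections::VecDeque;

/// Interface: a fixed-capacity FIFO queue of bytes.
pub struct BoundedQueue {
    data: VecDeque<u8>,
    capacity: usize,
}

impl BoundedQueue {
    pub fn with_capacity(capacity: usize) -> Self {
        BoundedQueue {
            data: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Appends a byte; returns `false` when the queue is already full.
    pub fn push(&mut self, byte: u8) -> bool {
        if self.data.len() == self.capacity {
            return false;
        }
        self.data.push_back(byte);
        true
    }

    /// Removes and returns the oldest byte, or `None` if the queue is empty.
    pub fn pop(&mut self) -> Option<u8> {
        self.data.pop_front()
    }
}

/// A test written only against the public signatures above.
#[test]
fn preserves_fifo_order_and_capacity() {
    let mut q = BoundedQueue::with_capacity(2);
    assert!(q.push(1));
    assert!(q.push(2));
    assert!(!q.push(3)); // full: the third push is rejected
    assert_eq!(q.pop(), Some(1));
    assert_eq!(q.pop(), Some(2));
    assert_eq!(q.pop(), None);
}
```

If a translation changed a signature, for example by returning a raw pointer from `pop`, the test would no longer compile. That is the sense in which the interface is necessary for the transpiled code to work with the tests.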

Anirudh Khatry

@anirudhkhatry.bsky.social

Transpiling C to Rust helps modernize legacy code with memory safety guarantees. CRUST-Bench evaluates whether transpilation methods yield safe, idiomatic Rust, using handcrafted interfaces and tests to ensure safety and validate correctness.
🧵[2/6]

April 23, 2025 at 5:00 PM
