Anirudh Khatry
@anirudhkhatry.bsky.social
CS PhD @utaustin.bsky.social
Congratulations Kanishka!
June 2, 2025 at 3:24 PM
📄 Read the full paper:
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
arxiv.org/abs/2504.15254
Dataset: github.com/anirudhkhatr...
w/ @robertzhang.bsky.social, Jia Pan, @zetten.bsky.social, @jqchen.bsky.social, @gregdnlp.bsky.social, @idillig.bsky.social.
🧵[6/6]
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether ...
arxiv.org
April 23, 2025 at 5:00 PM
Models often fail to:
1. Respect ownership rules
2. Infer type information
3. Follow idiomatic Rust interfaces
4. Preserve correct lifetimes
In the paper, we provide a taxonomy of common LLM mistakes.
🧵[5/6]
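To make two of these failure modes concrete, here is a minimal sketch of what ownership and lifetime information look like once they are spelled out in safe Rust. The functions `longest` and `push_all` are hypothetical illustrations, not taken from CRUST-Bench or the paper.

```rust
// Hypothetical example, not from the CRUST-Bench dataset.
//
// C original, for reference:
//   const char *longest(const char *a, const char *b);
// The C signature does not say which input the returned pointer aliases.

/// Returns the longer of two string slices (the first one on a tie).
/// The lifetime `'a` states that the result borrows from the inputs,
/// a fact the C signature leaves implicit.
pub fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}

/// A C-style "append into caller's buffer" routine. In safe Rust the
/// destination is an exclusive (`&mut`) borrow and the source a shared
/// borrow, so the two cannot alias.
pub fn push_all(dst: &mut Vec<i32>, src: &[i32]) -> usize {
    dst.extend_from_slice(src);
    dst.len()
}
```

A translation that drops the lifetime annotation on `longest`, or that tries to hold a shared borrow of `dst` while appending to it, is rejected by the borrow checker.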
April 23, 2025 at 5:00 PM
We evaluate state-of-the-art closed-source LLMs (like o1, Claude-3.7, and Gemini-1.5-Pro), open-source models like QwQ-32B and virtuoso-32B, and the SWE-Agent on CRUST-Bench.
Even the best model—OpenAI's o1—passes only 15/100 tasks in a single-shot setting.
🧵[4/6]
April 23, 2025 at 5:00 PM
Our benchmark is the first to provide:
1. Rust tests
2. Rust interfaces, which are necessary for the transpiled code to work with the tests
3. A sizable number of real-scale transpilation problems.
🧵[3/6]
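As a rough sketch of how an interface and a test fit together, assume a task whose interface is a set of Rust type and method signatures and whose tests call only those signatures. The `Stack` type below is a hypothetical stand-in, not an actual CRUST-Bench task, and the method bodies stand in for what a transpiler would have to produce.

```rust
// Hypothetical task, not from the CRUST-Bench dataset.
// The signatures play the role of the interface; the bodies are what a
// transpiler would fill in from the C source.
pub struct Stack {
    items: Vec<i32>,
}

impl Stack {
    /// Creates an empty stack.
    pub fn new() -> Self {
        Stack { items: Vec::new() }
    }

    /// Pushes a value on top of the stack.
    pub fn push(&mut self, value: i32) {
        self.items.push(value);
    }

    /// Removes and returns the top value, or `None` if the stack is empty
    /// (where C code might return a sentinel or read out of bounds).
    pub fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }

    /// Number of stored values.
    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Whether the stack holds no values.
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test depends only on the interface, so any implementation that
    // matches the signatures can be checked against it.
    #[test]
    fn push_then_pop_is_lifo() {
        let mut s = Stack::new();
        s.push(1);
        s.push(2);
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), Some(1));
        assert_eq!(s.pop(), None);
    }
}
```

Fixing the signatures up front is what lets a single test file judge any candidate translation: code that compiles against a different API cannot be tested at all.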
April 23, 2025 at 5:00 PM
Transpiling C to Rust helps modernize legacy code with memory safety guarantees. CRUST-Bench evaluates whether transpilation methods yield safe, idiomatic Rust, using handcrafted interfaces and tests to ensure safety and validate correctness.
🧵[2/6]
April 23, 2025 at 5:00 PM