Archiki Prasad
@archiki.bsky.social
Ph.D. Student at UNC NLP | Apple Scholar in AI/ML Ph.D. Fellowship | Prev: FAIR at Meta, AI2, Adobe (Intern) | Interests: #NLP, #ML | https://archiki.github.io/
Thanks to my coauthors: @hwang98.bsky.social @esteng.bsky.social @mohitbansal.bsky.social
for the fun collaboration!
@unccs.bsky.social
Paper: arxiv.org/abs/2504.13079
Data and Code: github.com/HanNight/RAM...
Retrieval-Augmented Generation with Conflicting Evidence
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. However, in practice, these systems often need to handle...
April 18, 2025 at 5:10 PM
Can RAG systems handle imbalanced evidence or increasing misinformation?
➡️ As document support becomes imbalanced, baselines ignore under-supported correct answers but MADAM-RAG maintains stable performance
➡️ As misinformation 📈, baselines degrade sharply (−46%) but MADAM-RAG remains more robust
April 18, 2025 at 5:06 PM
How important are multi-round debate and aggregation in MADAM-RAG?
Increasing debate rounds in MADAM-RAG improves performance by allowing agents to refine their answers via debate.
The aggregator provides even greater gains, especially in early rounds, by aligning conflicting views & suppressing misinfo.
April 18, 2025 at 5:06 PM
We evaluate on 3 datasets: FaithEval (suppression of misinformation), AmbigDocs (disambiguation across sources), RAMDocs (our dataset w/ different types of conflict).
MADAM-RAG consistently outperforms concatenated-prompt and Astute RAG baselines across all three datasets and model backbones.
April 18, 2025 at 5:06 PM
We propose MADAM-RAG, a structured, multi-agent framework designed to handle inter-doc conflicts, misinformation, & noise in retrieved content, comprising:
1️⃣ Independent LLM agents, each generating an intermediate response conditioned on a single doc
2️⃣ Centralized aggregator
3️⃣ Iterative multi-round debate
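To make the loop concrete, here is a minimal sketch of how these three pieces could fit together, assuming a generic llm(prompt) completion call; the prompts and function names are illustrative, not the paper's actual implementation:

```python
# Minimal sketch of a MADAM-RAG-style loop: one agent per retrieved document,
# a centralized aggregator, and iterative multi-round debate.
# `llm(prompt) -> str` is a hypothetical stand-in for any chat/completion call.

def madam_rag(query, documents, llm, num_rounds=3):
    summary = None
    for _ in range(num_rounds):
        # Each agent answers from its single document; after the first round it
        # also sees the aggregator's summary and may revise its answer (debate).
        answers = [
            llm(
                f"Question: {query}\nDocument: {doc}\n"
                + (f"Aggregated summary so far: {summary}\n" if summary else "")
                + "Answer using only this document, revising if the summary warrants it."
            )
            for doc in documents
        ]
        # Centralized aggregator: reconcile the per-document answers, keeping every
        # well-supported answer and discarding misinformation and noise.
        summary = llm(
            f"Question: {query}\nAgent answers: {answers}\n"
            "Combine these into one response: present all genuinely supported "
            "answers and drop unsupported or conflicting claims."
        )
    return summary
```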
April 18, 2025 at 5:06 PM
📂RAMDocs is designed to reflect the complexities of real-world retrieval. It includes:
➡️ Ambiguous queries w/ multiple valid answers
➡️ Imbalanced document support (some answers backed by many sources, others by fewer)
➡️ Docs w/ misinformation (plausible but wrong claims) or noisy/irrelevant content
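To picture what one example looks like, here is a hypothetical RAMDocs-style instance sketched from the description above; the field names and the specific question are illustrative, not the released schema:

```python
# Hypothetical RAMDocs-style example: an ambiguous question with two valid
# answers, imbalanced document support, plus misinformation and noise.
example = {
    "question": "Who directed Nosferatu?",  # ambiguous: multiple valid answers
    "valid_answers": ["F. W. Murnau", "Robert Eggers"],
    "documents": [
        {"text": "... Murnau's 1922 silent classic Nosferatu ...",     "supports": "F. W. Murnau"},
        {"text": "... F. W. Murnau directed the vampire film ...",     "supports": "F. W. Murnau"},
        {"text": "... Eggers's 2024 remake of Nosferatu ...",          "supports": "Robert Eggers"},  # under-supported answer
        {"text": "... Nosferatu was directed by Alfred Hitchcock ...", "supports": None},             # misinformation
        {"text": "... vampire folklore across Eastern Europe ...",     "supports": None},             # noisy / irrelevant
    ],
}
```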
April 18, 2025 at 5:06 PM
Thanks Jaemin, learned so much from you as well!
March 27, 2025 at 8:24 PM
Thanks to my amazing co-authors @esteng.bsky.social (co-lead), @cyjustinchen.bsky.social, @codezakh.bsky.social, @mohitbansal.bsky.social
@unccs.bsky.social
Paper: arxiv.org/abs/2502.01619
Code+Datasets: github.com/archiki/UTGe...
Learning to Generate Unit Tests for Automated Debugging
Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to a large language model (LLM) as it iteratively debugs faulty code, motivating automated test g...
February 4, 2025 at 7:10 PM
Lastly, we show that both test-time scaling and backtracking are crucial for UTDebug, and scaling the number of generated UTs also consistently improves code accuracy.
February 4, 2025 at 7:10 PM
Combining UTGen with UTDebug 🤝, we consistently outperform no UT feedback, randomly sampled UTs, and prompted targeted UTs across 3 models & datasets.
For partially correct code with subtle errors (our MBPP+Fix hard split) debugging with UTGen improves over baselines by >12.35% on Qwen 2.5!
February 4, 2025 at 7:10 PM
RQ3: We also propose ✨UTDebug ✨ with two key modifications:
1⃣ Test-time scaling (self-consistency over multiple samples) to increase output accuracy
2⃣ Validation & Backtracking: generate multiple UTs for validation, accept an edit only when the overall pass rate increases, & backtrack otherwise
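A rough sketch of this debugging loop, assuming hypothetical helpers generate_uts (e.g., a UTGen-style test generator), propose_fix (the LLM debugger), and pass_rate (a test runner); this illustrates the accept/backtrack rule, not the released implementation:

```python
# Sketch of UTDebug-style debugging with validation and backtracking.
# generate_uts / propose_fix / pass_rate are hypothetical stand-ins for a UT
# generator, an LLM debugger, and a sandboxed test runner, respectively.
# (Self-consistency over multiple sampled outputs would live inside generate_uts.)

def debug_with_backtracking(code, problem, generate_uts, propose_fix, pass_rate,
                            max_rounds=5, num_tests=10):
    unit_tests = generate_uts(problem, num_tests=num_tests)  # multiple UTs for validation
    best_code = code
    best_rate = pass_rate(best_code, unit_tests)
    for _ in range(max_rounds):
        if best_rate == 1.0:  # all generated UTs pass; stop early
            break
        candidate = propose_fix(best_code, problem, unit_tests)
        candidate_rate = pass_rate(candidate, unit_tests)
        # Accept the edit only if the overall pass rate improves;
        # otherwise backtrack to the previous version of the code.
        if candidate_rate > best_rate:
            best_code, best_rate = candidate, candidate_rate
    return best_code
```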
February 4, 2025 at 7:10 PM