Manuel Gomez Rodriguez
@autreche.bsky.social
Human-Centric Machine Learning at the Max Planck Institute for Software Systems
Reposted by Manuel Gomez Rodriguez
Heading to Rio de Janeiro 🇧🇷 for UAI 2025 (@auai.org) to present our tutorial with @tobigerstenberg.bsky.social and @autreche.bsky.social on "Counterfactuals in Minds and Machines" on Monday. Looking forward to this! If you are in Rio, let's meet!
July 18, 2025 at 12:38 PM
Check out an implementation of our model on several LLMs from the Llama family at github.com/Networks-Lea....

This has been a joint effort with multiple members of my group: Nina Corvelo Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco & Suhas Thejaswi.
GitHub - Networks-Learning/coupled-llm-evaluation: Code for "Evaluation of Large Language Models via Coupled Token Generation", Arxiv 2025.
Code for "Evaluation of Large Language Models via Coupled Token Generation", Arxiv 2025. - Networks-Learning/coupled-llm-evaluation
github.com
February 5, 2025 at 8:33 AM
This suggests that the apparent advantage of an LLM over others under existing evaluation protocols may not be genuine but rather confounded by the randomness inherent to the generation process. Our model is easy to implement and does not require any fine-tuning or prompt engineering 5/
February 5, 2025 at 8:33 AM
In evaluations based on (human) pairwise comparisons, we show that coupled and standard autoregressive generation can, surprisingly, lead to different rankings when comparing more than two LLMs, even with an infinite number of samples 4/
February 5, 2025 at 8:33 AM
In evaluations on benchmark datasets, we show that coupled autoregressive generation leads to the same conclusions as standard autoregressive generation while provably using fewer samples. For example, on MMLU, coupled autoregressive generation requires up to 40% fewer samples 3/
February 5, 2025 at 8:33 AM
Our key idea is to couple the autoregressive generation processes of the LLMs under comparison, particularly their samplers, by having them share the same source of randomness. Loosely speaking, coupled autoregressive generation ensures that no LLM has better luck than the others 2/
February 5, 2025 at 8:33 AM
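
A minimal sketch of what coupled sampling could look like, assuming inverse-CDF sampling with one shared uniform draw per token step (one standard way to couple samplers). The interface below, with models exposed as callables returning next-token probability vectors, is illustrative and not the actual API of the repository above:

import numpy as np

def sample_coupled(prob_dists, u):
    """Sample one token from each model's next-token distribution using
    the SAME uniform draw u, via inverse-CDF sampling."""
    tokens = []
    for probs in prob_dists:
        cdf = np.cumsum(probs)
        tokens.append(int(np.searchsorted(cdf, u)))
    return tokens

def coupled_generation(models, prompt, max_new_tokens, seed=0):
    """Generate continuations for several models that share one source of randomness.
    `models` is a list of callables mapping a token sequence to a next-token
    probability vector (hypothetical interface for illustration only)."""
    rng = np.random.default_rng(seed)
    sequences = [list(prompt) for _ in models]
    for _ in range(max_new_tokens):
        u = rng.uniform()  # the shared source of randomness at this step
        dists = [m(seq) for m, seq in zip(models, sequences)]
        next_tokens = sample_coupled(dists, u)
        for seq, t in zip(sequences, next_tokens):
            seq.append(t)
    return sequences

Because every model consumes the same u at every step, differences between the generated outputs reflect differences between the models' distributions rather than differences in sampling luck.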