Lightnews — Scholar-powered news

Luke Marris

@lukemarris.bsky.social

690 followers · 170 following · 24 posts

Research Engineer at Google DeepMind.
Interests in game theory, reinforcement learning, and deep learning.

Website: https://www.lukemarris.info/
Google Scholar: https://scholar.google.com/citations?user=dvTeSX4AAAAJ

Reposted by Luke Marris

Marc Lanctot

@sharky6000.bsky.social

Hello everyone 👋 Good news!

🚨 Our Game Theory & Multiagent Systems team at Google DeepMind is hiring! 🚨

... and we have not one, but two open positions! One Research Scientist role and one Research Engineer role. 😁

Please repost and tell anyone who might be interested!

Details in thread below 👇

September 29, 2025 at 12:36 PM

Luke Marris

@lukemarris.bsky.social

Our team is hiring REs (job-boards.greenhouse.io/deepmind/job...) and RSs (job-boards.greenhouse.io/deepmind/job...). Please apply if you are interested in game theory / multiagent.

Research Engineer, Game Theory & Multi-Agent Systems

London, UK

job-boards.greenhouse.io

September 29, 2025 at 8:29 AM

Reposted by Luke Marris

Siqi Liu (刘思奇)

@liusiqi.bsky.social

Frontier models are often compared on crowdsourced user prompts - user prompts can be low-quality, biased and redundant, making "performance on average" hard to trust.

Come find us at #ICLR2025 to discuss game-theoretic evaluation (shorturl.at/0QtBj)! See you in Singapore!

Re-evaluating Open-Ended Evaluation of Large Language Models

A case study using the livebench.ai leaderboard.

shorturl.at

April 18, 2025 at 4:34 PM

Luke Marris

@lukemarris.bsky.social

[🧵1/N] Thrilled to share our work "Re-evaluating Open-Ended Evaluation of Large Language Models"! 🚀 Popular LLM leaderboards (think Elo/Chatbot Arena) are useful, but are they telling the whole story? We find issues w/ redundancy & bias. 🤔
Paper @ ICLR 2025: arxiv.org/abs/2502.20170 #LLM #ICLR2025

April 17, 2025 at 4:12 PM

Reposted by Luke Marris

Marc Lanctot

@sharky6000.bsky.social

Working at the intersection of social choice and learning algorithms?

Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.

Submission deadline: May 9th.

I attended last year at AAMAS and loved it! 👍

sites.google.com/corp/view/sc...

SCaLA-25

A workshop connecting research topics in social choice and learning algorithms.

sites.google.com

March 26, 2025 at 8:18 PM

Reposted by Luke Marris

Jeff Dean

@jeffdean.bsky.social

🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.

Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇

March 25, 2025 at 5:25 PM

Reposted by Luke Marris

Marc Lanctot

@sharky6000.bsky.social

Looking for a principled evaluation method for ranking of *general* agents or models, i.e. that get evaluated across a myriad of different tasks?

I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N

February 24, 2025 at 3:25 PM

Luke Marris

@lukemarris.bsky.social

[🧵1/N] Please check out our new paper (arxiv.org/abs/2502.11645) on game-theoretic evaluation. It is the first method that results in clone-invariant ratings in N-player, general-sum interactions. Co-authors: @liusiqi.bsky.social , Ian Gemp, Georgios Piliouras, @sharky6000.bsky.social 🎉

Deviation Ratings: A General, Clone-Invariant Rating Method

Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These...

arxiv.org

February 18, 2025 at 10:49 AM
