Luke Marris
@lukemarris.bsky.social
Research Engineer at Google DeepMind.
Interests in game theory, reinforcement learning, and deep learning.
Website: https://www.lukemarris.info/
Google Scholar: https://scholar.google.com/citations?user=dvTeSX4AAAAJ
Reposted by Luke Marris
Hello everyone 👋 Good news!
🚨 Our Game Theory & Multiagent Systems team at Google DeepMind is hiring! 🚨
.. and we have not one, but two open positions! One Research Scientist role and one Research Engineer role. 😁
Please repost and tell anyone who might be interested!
Details in thread below 👇
September 29, 2025 at 12:36 PM
Our team is hiring REs (job-boards.greenhouse.io/deepmind/job...) and RSs (job-boards.greenhouse.io/deepmind/job...). Please apply if you are interested in game theory / multiagent.
Research Engineer, Game Theory & Multi-Agent Systems
London, UK
job-boards.greenhouse.io
September 29, 2025 at 8:29 AM
Reposted by Luke Marris
Frontier models are often compared on crowdsourced user prompts - user prompts can be low-quality, biased and redundant, making "performance on average" hard to trust.
Come find us at #ICLR2025 to discuss game-theoretic evaluation (shorturl.at/0QtBj)! See you in Singapore!
Re-evaluating Open-Ended Evaluation of Large Language Models
A case study using the livebench.ai leaderboard.
shorturl.at
April 18, 2025 at 4:34 PM
[🧵1/N] Thrilled to share our work "Re-evaluating Open-Ended Evaluation of Large Language Models"! 🚀 Popular LLM leaderboards (think Elo/Chatbot Arena) are useful, but are they telling the whole story? We find issues w/ redundancy & bias. 🤔
Paper @ ICLR 2025: arxiv.org/abs/2502.20170 #LLM #ICLR2025
April 17, 2025 at 4:12 PM
Reposted by Luke Marris
Working at the intersection of social choice and learning algorithms?
Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.
Submission deadline: May 9th.
I attended last year at AAMAS and loved it! 👍
sites.google.com/corp/view/sc...
SCaLA-25
A workshop connecting research topics in social choice and learning algorithms.
sites.google.com
March 26, 2025 at 8:18 PM
Reposted by Luke Marris
🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.
Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on the LM Arena leaderboard. 🥇
March 25, 2025 at 5:25 PM
Reposted by Luke Marris
Looking for a principled evaluation method for ranking of *general* agents or models, i.e. that get evaluated across a myriad of different tasks?
I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N
February 24, 2025 at 3:25 PM
[🧵1/N] Please check out our new paper (arxiv.org/abs/2502.11645) on game-theoretic evaluation. It is the first method that results in clone-invariant ratings in N-player, general-sum interactions. Co-authors: @liusiqi.bsky.social, Ian Gemp, Georgios Piliouras, @sharky6000.bsky.social 🎉
Deviation Ratings: A General, Clone-Invariant Rating Method
Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These...
arxiv.org
February 18, 2025 at 10:49 AM