Gillian Hadfield
@ghadfield.bsky.social
Economist and legal scholar turned AI researcher focused on AI alignment and governance. Prof of government and policy and computer science at Johns Hopkins where I run the Normativity Lab. Recruiting CS postdocs and PhD students. gillianhadfield.org
Future work should focus on developing smarter debate protocols that weight expertise, discourage blind agreement, and reward critical verification of reasoning. We need to move beyond the naive assumption that 'more talk = better outcomes.' (10/10) arxiv.org/abs/2509.05396
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. The prior work has exclusively…
arxiv.org
September 23, 2025 at 5:06 PM
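One concrete version of "weighting expertise" in the aggregation step, sketched here as a hypothetical illustration rather than a protocol from the paper, is a weighted majority vote in which each agent's answer counts in proportion to an expertise score (e.g. held-out accuracy on the task type):

```python
# Hypothetical expertise-weighted vote: instead of one-agent-one-vote,
# each agent's answer is weighted by an expertise score, so weak agents
# cannot outvote a single strong one. The scores below are made up.
from collections import defaultdict

def weighted_vote(answers, weights):
    """answers: list of agent answers; weights: matching expertise scores."""
    totals = defaultdict(float)
    for ans, w in zip(answers, weights):
        totals[ans] += w
    return max(totals, key=totals.get)

# Two weak agents (weight 0.3 each) disagree with one strong agent (0.9):
print(weighted_vote(["A", "A", "B"], [0.3, 0.3, 0.9]))  # "B" wins, 0.9 > 0.6
```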
We suspect RLHF training creates sycophantic behavior: models trained to be agreeable may prioritize consensus over critical evaluation. This suggests current alignment techniques might undermine collaborative reasoning.
September 23, 2025 at 5:06 PM
Stronger agents were more likely to change from correct to incorrect answers in response to weaker agents' reasoning than vice versa. Models tended to favor agreement over critical evaluation, creating an echo chamber instead of an actual debate.
September 23, 2025 at 5:06 PM
However, we still observed performance gains on math problems under most conditions, suggesting debate effectiveness depends heavily on the type of reasoning required.
September 23, 2025 at 5:06 PM
The impact varies significantly by task type. On CommonSenseQA—a dataset we newly examined—debate reduced performance across ALL experimental conditions.
September 23, 2025 at 5:06 PM
Even when stronger models outnumbered weaker ones, group accuracy decreased over successive debate rounds. Introducing weaker models into debates produced results worse than when agents hadn't engaged in discussion at all.
September 23, 2025 at 5:06 PM
We tested debate effectiveness across three tasks (CommonSenseQA, MMLU, GSM8K) using three different models (GPT-4o-mini, LLaMA-3.1-8B, Mistral-7B) in various configurations.
September 23, 2025 at 5:06 PM
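The debate setup these experiments test follows the common multi-agent debate loop: each agent answers independently, then over several rounds revises its answer after seeing its peers' latest answers. The sketch below is a toy stand-in, not the paper's code; `Agent.revise` replaces a real LLM call (e.g. to GPT-4o-mini) with a simple conformity rule, just to make the echo-chamber failure mode concrete:

```python
# Minimal sketch of a multi-agent debate round-loop. A `conforms` agent
# adopts the majority peer answer each round -- a toy stand-in for the
# sycophantic behavior described in the thread.
from collections import Counter

class Agent:
    def __init__(self, name, initial_answer, conforms=True):
        self.name = name
        self.current = initial_answer
        self.conforms = conforms

    def revise(self, peer_answers):
        # Conforming agents favor agreement over critical evaluation.
        if self.conforms and peer_answers:
            self.current = Counter(peer_answers).most_common(1)[0][0]
        return self.current

def debate(agents, rounds=3):
    for _ in range(rounds):
        answers = [a.current for a in agents]
        for i, agent in enumerate(agents):
            agent.revise(answers[:i] + answers[i + 1:])
    # Final group answer by simple majority vote.
    return Counter(a.current for a in agents).most_common(1)[0][0]

# One strong agent starts correct ("B"); two stubborn weak agents start
# wrong ("A"). The conforming strong agent abandons its correct answer.
agents = [Agent("strong", "B"),
          Agent("weak1", "A", conforms=False),
          Agent("weak2", "A", conforms=False)]
print(debate(agents))  # → "A"
```

Under these assumptions the group converges on the weak agents' wrong answer, mirroring the finding that stronger agents switch from correct to incorrect more often than the reverse.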
We found that multi-agent debate among large language models can sometimes harm performance rather than improve it, contradicting the assumption that more discussion leads to better outcomes.
September 23, 2025 at 5:06 PM
My lab members Harsh Satija and Andrea Wynn and I have a new preprint examining AI multi-agent debate among diverse models, based on our ICML MAS 2025 workshop.
September 23, 2025 at 5:06 PM
These roles will shape the conversation on AI and provide the opportunity for rich, interdisciplinary collaboration with colleagues and researchers in the Department of Computer Science and the School of Government and Policy.
Please spread the word in your network! 5/5
gillianhadfield.org/jobs/
Jobs
I have postdoc and staff openings for our lab at the Johns Hopkins University in either Baltimore, MD or Washington, DC. Postdoctoral Fellow: We are hiring an interdisciplinary scholar with a track re…
gillianhadfield.org
June 16, 2025 at 6:18 PM
We're recruiting a Postdoctoral Fellow with a track record in computational modeling of AI systems and autonomous AI agent dynamics, and experience with ML systems, to investigate the foundations of human normativity and how to build AI systems aligned with human values. 4/5
June 16, 2025 at 6:17 PM
We're hiring an AI Communications Associate to craft and execute a multi-channel strategy that turns leading computer science and public policy research into accessible content for a broad audience of stakeholders. 3/5
June 16, 2025 at 6:16 PM
We're hiring an AI Policy Researcher to conduct in-depth research into the technical and policy challenges in AI alignment, safety, and governance, and to produce high-quality research reports, white papers, and policy recommendations. 2/5
June 16, 2025 at 6:15 PM
Our report is now out, chock-a-block with new ideas including insurance partnerships, government oversight of private regulators, building a robust ecosystem, and fostering trust and investment. Check it out here: srinstitute.utoronto.ca/news/co-desi...
Can a market-based regulatory framework help govern AI? New report weighs in — Schwartz Reisman Institute
In April 2024, the Schwartz Reisman Institute for Technology and Society (SRI) hosted a workshop that brought together 33 high-level experts to explore the viability of regulatory markets. Over the co...
srinstitute.utoronto.ca
June 12, 2025 at 12:34 AM
destabilize or harm our communities, economies, or politics. Together with @djjrjr.bsky.social and @torontosri.bsky.social we held a design workshop last year with a stunning group of experts from AI labs, regulatory technology startups, enterprise clients, civil society, academia, and government. 2/3
June 12, 2025 at 12:33 AM