Sian Gooding
@siangooding.bsky.social
Senior Research Scientist @GoogleDeepMind working on Autonomous Assistants ✍️🤖
Sorted, thanks!
April 2, 2025 at 10:07 PM
You’ll collaborate with a kind, curious, research-driven team—including the brilliant @joao.omg.lol & @martinklissarov.bsky.social —and get to shape work at the frontier of multi-agent learning.

If that sounds like you, apply!

DM me if you're curious or have questions
April 2, 2025 at 9:57 AM
Some big questions we’re thinking about:
1⃣ How do communication protocols emerge?
2⃣ What inductive biases help coordination?
3⃣ How can language improve generalisation and transfer?
April 2, 2025 at 9:57 AM
We’re interested in:
🤖🤖 Multi-agent RL
🔠 Emergent language
🎲 Communication games
🧠 Social & cognitive modelling
📈 Scaling laws for coordination
April 2, 2025 at 9:57 AM
The project explores how agents can learn to communicate and coordinate in complex, open-ended environments—through emergent protocols, not hand-coded rules.
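The idea of protocols emerging from reward alone, rather than hand-coded rules, can be sketched with a minimal Lewis signaling game trained with REINFORCE. Everything here (the tabular policies, environment sizes, learning rate, and episode count) is an illustrative assumption, not the project's actual setup:

```python
import numpy as np

def train_signaling_game(n_states=3, n_msgs=3, episodes=5000, lr=0.5, seed=0):
    """Train a speaker and a listener with REINFORCE on a Lewis signaling game.

    Speaker: state -> message; Listener: message -> action.
    Reward is 1 only when the listener's action matches the speaker's state,
    so a shared protocol must emerge for the pair to score above chance.
    """
    rng = np.random.default_rng(seed)
    speaker = np.zeros((n_states, n_msgs))   # speaker policy logits
    listener = np.zeros((n_msgs, n_states))  # listener policy logits
    baseline = 0.0                           # running reward baseline

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    for _ in range(episodes):
        s = rng.integers(n_states)
        p_m = softmax(speaker[s])
        m = rng.choice(n_msgs, p=p_m)
        p_a = softmax(listener[m])
        a = rng.choice(n_states, p=p_a)
        r = float(a == s)
        adv = r - baseline
        baseline += 0.01 * (r - baseline)
        # REINFORCE update: one-hot of the sampled choice minus its probabilities.
        grad_s = -p_m
        grad_s[m] += 1.0
        speaker[s] += lr * adv * grad_s
        grad_l = -p_a
        grad_l[a] += 1.0
        listener[m] += lr * adv * grad_l

    # Greedy evaluation: fraction of states the learned protocol decodes correctly.
    msgs = speaker.argmax(axis=1)
    acts = listener.argmax(axis=1)
    return np.mean(acts[msgs] == np.arange(n_states))
```

With these settings the pair typically converges well above the 1/3 chance level, though partial "pooling" protocols (two states sharing a message) are a known local optimum of games like this.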
April 2, 2025 at 9:57 AM
April 2, 2025 at 9:51 AM
Our work highlights the need for LLMs to improve in areas like action selection, self-evaluation + goal alignment to perform robustly in open-ended tasks

The implications extend beyond writing assistance to autonomous LLM workflows in open-ended use cases more generally
April 2, 2025 at 9:51 AM
Finding: LLMs can lose track of the original goal during iterative refinement, leading to "semantic drift" - a divergence from the author's intent. This is a key challenge for autonomous revision. ✍️
April 2, 2025 at 9:51 AM
Finding: LLMs struggle to reliably filter their own suggestions. They need better self-evaluation to work effectively in autonomous revision workflows. ⚖️
April 2, 2025 at 9:51 AM
Finding: Gemini 1.5 Pro produced the highest quality editing suggestions, according to human evaluators, outperforming Claude 3.5 Sonnet and GPT-4o 🦾
April 2, 2025 at 9:51 AM
Finding: LLMs tend to favour adding content, whereas human editors remove or restructure more. This suggests LLMs are sycophantic, reinforcing existing text rather than critically evaluating it. ➕
April 2, 2025 at 9:51 AM
Why? There are many possible solutions and no single 'right' answer. Success is difficult to gauge!

We examine how LLMs generate + select text revisions, comparing their actions to those of human editors. We focus on action diversity, alignment with human preferences, and iterative improvement
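A generate-then-select revision loop of this kind can be sketched as follows. The `propose` and `score` callables are hypothetical stand-ins for LLM calls, and the `difflib` ratio is only a crude proxy for the semantic-drift check (embedding similarity would be more realistic); none of this is the paper's actual pipeline:

```python
from difflib import SequenceMatcher

def revise_iteratively(draft, propose, score, rounds=3, drift_floor=0.3):
    """Sketch of an autonomous revision loop (hypothetical interfaces).

    propose(text) returns candidate revisions; score(text) rates a text.
    Candidates that stray too far from the original draft are filtered out
    before selection, guarding against drift away from the author's intent.
    """
    current = draft
    for _ in range(rounds):
        candidates = propose(current)
        # Keep only candidates that remain close enough to the original draft.
        kept = [c for c in candidates
                if SequenceMatcher(None, draft, c).ratio() >= drift_floor]
        if not kept:
            break  # every candidate drifted; stop rather than diverge
        best = max(kept, key=score)
        if score(best) <= score(current):
            break  # no candidate improves on the current text
        current = best
    return current
```

The self-evaluation step (`score`) is exactly where the findings above bite: if the model cannot reliably rank its own suggestions, the loop selects poor revisions or stops too early.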
April 2, 2025 at 9:51 AM
Our paper explores this by analysing LLMs as autonomous co-writers. Work done with Lucia Lopez Rivilla and @egrefen.bsky.social 🫶

Open-ended tasks like writing are a real challenge for LLMs (even powerful ones like Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o).
April 2, 2025 at 9:51 AM