Sian Gooding
@siangooding.bsky.social
Senior Research Scientist @GoogleDeepMind working on Autonomous Assistants ✍️🤖
Sorted, thanks!
April 2, 2025 at 10:07 PM
You’ll collaborate with a kind, curious, research-driven team—including the brilliant @joao.omg.lol & @martinklissarov.bsky.social —and get to shape work at the frontier of multi-agent learning.

If that sounds like you, apply!

DM me if you're curious or have questions
April 2, 2025 at 9:57 AM
Some big questions we’re thinking about:
1⃣ How do communication protocols emerge?
2⃣ What inductive biases help coordination?
3⃣ How can language improve generalisation and transfer?
April 2, 2025 at 9:57 AM
We’re interested in:
🤖🤖 Multi-agent RL
🔠 Emergent language
🎲 Communication games
🧠 Social & cognitive modelling
📈 Scaling laws for coordination
April 2, 2025 at 9:57 AM
The project explores how agents can learn to communicate and coordinate in complex, open-ended environments—through emergent protocols, not hand-coded rules.
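The idea of protocols emerging from reward alone, rather than hand-coded rules, can be sketched with a minimal Lewis signaling game trained with REINFORCE. Everything here (the tabular policies, environment sizes, learning rate, and episode count) is an illustrative assumption, not the project's actual setup:

```python
import numpy as np

def train_signaling_game(n_states=3, n_msgs=3, episodes=5000, lr=0.5, seed=0):
    """Train a speaker and a listener with REINFORCE on a Lewis signaling game.

    Speaker: state -> message; Listener: message -> action.
    Reward is 1 only when the listener's action matches the speaker's state,
    so a shared protocol must emerge for the pair to score above chance.
    """
    rng = np.random.default_rng(seed)
    speaker = np.zeros((n_states, n_msgs))   # speaker policy logits
    listener = np.zeros((n_msgs, n_states))  # listener policy logits
    baseline = 0.0                           # running reward baseline

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    for _ in range(episodes):
        s = rng.integers(n_states)
        p_m = softmax(speaker[s])
        m = rng.choice(n_msgs, p=p_m)
        p_a = softmax(listener[m])
        a = rng.choice(n_states, p=p_a)
        r = float(a == s)
        adv = r - baseline
        baseline += 0.01 * (r - baseline)
        # REINFORCE update: one-hot of the sampled choice minus its probabilities.
        grad_s = -p_m
        grad_s[m] += 1.0
        speaker[s] += lr * adv * grad_s
        grad_l = -p_a
        grad_l[a] += 1.0
        listener[m] += lr * adv * grad_l

    # Greedy evaluation: fraction of states the learned protocol decodes correctly.
    msgs = speaker.argmax(axis=1)
    acts = listener.argmax(axis=1)
    return np.mean(acts[msgs] == np.arange(n_states))
```

With these settings the pair typically converges well above the 1/3 chance level, though partial "pooling" protocols (two states sharing a message) are a known local optimum of games like this.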
April 2, 2025 at 9:57 AM
April 2, 2025 at 9:51 AM
Our work highlights the need for LLMs to improve in areas like action selection, self-evaluation + goal alignment to perform robustly in open-ended tasks

The implications extend beyond writing assistance to autonomous LLM workflows in open-ended use cases more generally
April 2, 2025 at 9:51 AM
Finding: LLMs can lose track of the original goal during iterative refinement, leading to "semantic drift" - a divergence from the author's intent. This is a key challenge for autonomous revision. ✍️
April 2, 2025 at 9:51 AM
Finding: LLMs struggle to reliably filter their own suggestions. They need better self-evaluation to work effectively in autonomous revision workflows. ⚖️
April 2, 2025 at 9:51 AM
Finding: Gemini 1.5 Pro produced the highest quality editing suggestions, according to human evaluators, outperforming Claude 3.5 Sonnet and GPT-4o 🦾
April 2, 2025 at 9:51 AM
Finding: LLMs tend to favour adding content, whereas human editors remove or restructure more. This suggests LLMs are sycophantic, reinforcing existing text rather than critically evaluating it. ➕
April 2, 2025 at 9:51 AM
Why? There are many possible solutions and no single 'right' answer. Success is difficult to gauge!

We examine how LLMs generate + select text revisions, comparing their actions to those of human editors. We focus on action diversity, alignment with human preferences, and iterative improvement
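A generate-then-select revision loop of this kind can be sketched as follows. The `propose` and `score` callables are hypothetical stand-ins for LLM calls, and the `difflib` ratio is only a crude proxy for the semantic-drift check (embedding similarity would be more realistic); none of this is the paper's actual pipeline:

```python
from difflib import SequenceMatcher

def revise_iteratively(draft, propose, score, rounds=3, drift_floor=0.3):
    """Sketch of an autonomous revision loop (hypothetical interfaces).

    propose(text) returns candidate revisions; score(text) rates a text.
    Candidates that stray too far from the original draft are filtered out
    before selection, guarding against drift away from the author's intent.
    """
    current = draft
    for _ in range(rounds):
        candidates = propose(current)
        # Keep only candidates that remain close enough to the original draft.
        kept = [c for c in candidates
                if SequenceMatcher(None, draft, c).ratio() >= drift_floor]
        if not kept:
            break  # every candidate drifted; stop rather than diverge
        best = max(kept, key=score)
        if score(best) <= score(current):
            break  # no candidate improves on the current text
        current = best
    return current
```

The self-evaluation step (`score`) is exactly where the findings above bite: if the model cannot reliably rank its own suggestions, the loop selects poor revisions or stops too early.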
April 2, 2025 at 9:51 AM
Our paper explores this by analysing LLMs as autonomous co-writers. Work done with Lucia Lopez Rivilla and @egrefen.bsky.social 🫶

Open-ended tasks like writing are a real challenge for LLMs (even powerful ones like Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o).
April 2, 2025 at 9:51 AM