Lightnews — Scholar-powered news

Arthur Conmy

@arthurconmy.bsky.social

1.1K followers 470 following 9 posts

Aspiring 10x reverse engineer at Google DeepMind


Arthur Conmy

@arthurconmy.bsky.social

Been really enjoying unfaithful CoT research with collaborators recently. Two observations:

1) It quickly becomes clear that models are sneaking in reasoning without verbalising where it comes from (e.g. constructing an equation that gets the correct answer, but is defined out of thin air)

January 2, 2025 at 8:48 PM

Reposted by Arthur Conmy

Tyler Chang

@tylerachang.bsky.social

We scaled training data attribution (TDA) methods ~1000x to find influential pretraining examples for thousands of queries in an 8B-parameter LLM over the entire 160B-token C4 corpus!
medium.com/people-ai-re...

December 13, 2024 at 6:57 PM

Reposted by Arthur Conmy

Can

@canrager.bsky.social

Sparse Autoencoders (SAEs) are popular, with 10+ new approaches proposed in the last year. How do we know if we are making progress? The field has relied on imperfect proxy metrics.

We are releasing SAE Bench, a suite of 8 SAE evaluations!

Project co-led with Adam Karvonen.

December 11, 2024 at 6:07 AM

Arthur Conmy

@arthurconmy.bsky.social

Currently, most of our SAE evaluations don't fully capture what we want, and are very expensive.

This work provides a battery of automated metrics that should help researchers understand their SAEs' strengths and weaknesses better: www.neuronpedia.org/sae-bench/info

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders - Dec 2024


December 10, 2024 at 7:44 PM

Arthur Conmy

@arthurconmy.bsky.social

Please report @deep-mind.bsky.social: it is not the real DeepMind, which would obviously use the google.com or deepmind.google domains rather than one they bought off GoDaddy 3 days ago: uk.godaddy.com/whois/result...

November 26, 2024 at 12:43 AM

Arthur Conmy

@arthurconmy.bsky.social

I'm very bullish on automated research engineering soon, but even I was surprised that AI agents are twice as good as humans with 5+ years of experience or from a top AGI or safety lab at doing tasks in 2 hours. Paper: metr.org/AI_R_D_Evalu...


November 22, 2024 at 10:21 PM
