1) It quickly becomes clear that models are sneaking in reasoning without verbalising where it comes from (e.g. writing an equation that gets the correct answer but is defined out of thin air)
medium.com/people-ai-re...
We are releasing SAE Bench, a suite of 8 SAE evaluations!
Project co-led with Adam Karvonen.
This work provides a battery of automated metrics that should help researchers understand their SAEs' strengths and weaknesses better: www.neuronpedia.org/sae-bench/info