https://www.claudiashi.com/
We also developed a cool package for circuit testing: github.com/blei-lab/cir...
Find us at the NeurIPS Thursday poster session or at the bestest dim sum restaurant in Vancouver!
We apply our tests to six benchmark circuits from the literature: two synthetic circuits, two semi-synthetic circuits (circuits discovered on toy transformer models), and two circuits in the wild (circuits discovered on transformer models such as GPT-2).
Sufficiency Test: How faithful is faithful enough?
Partial Necessity Test: How much knockdown effect is significant?
Independence Test: With the circuit removed, the model's output is independent of the circuit's output
Minimality Test: All edges in the circuit are necessary for the task
Equivalence Test: The circuit and the original model have the same chance of outperforming each other
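One simple way to read the equivalence idea is as a paired sign test: score the circuit and the full model on the same inputs, count how often each one wins, and ask whether the win rate is consistent with 50/50. The sketch below is only an illustration under that assumption, not necessarily the procedure used in the paper or the package; the function name `sign_test_p_value` and the score lists are hypothetical.

```python
from math import comb


def sign_test_p_value(circuit_scores, model_scores):
    """Two-sided exact sign test of H0: P(circuit beats model) = 0.5.

    Hypothetical helper for illustration. Inputs are paired per-example
    scores (higher is better). Ties are dropped before testing.
    """
    pairs = list(zip(circuit_scores, model_scores))
    wins = sum(c > m for c, m in pairs)    # circuit outperforms model
    losses = sum(c < m for c, m in pairs)  # model outperforms circuit
    n = wins + losses
    if n == 0:
        return 1.0  # all ties: no evidence against equivalence

    # Probability of a split at least as lopsided as the observed one
    # under Binomial(n, 0.5), doubled for a two-sided test.
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Circuit wins on every one of four paired examples.
p_lopsided = sign_test_p_value([1, 2, 3, 4], [0, 1, 2, 3])
# Circuit and model each win twice.
p_balanced = sign_test_p_value([1, 0, 1, 0], [0, 1, 0, 1])
```

A small p-value here would count as evidence against "same chance of outperforming each other"; a large one only means the data do not contradict it.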
1️⃣ Mechanism Preservation: The circuit should preserve the model's behavior
2️⃣ Localization: Removing the circuit disables the task
3️⃣ Minimality: The circuit contains no redundant parts