Kyle O’Brien
kyletokens.bsky.social
studying the minds on our computers | https://kyobrien.io
Author here! Data filtering is resistant to tampering, but not fully robust. We expect that a high-resource attacker can still teach the model the filtered knowledge. Our work is a significant improvement over the baselines, but far more work is needed.
August 15, 2025 at 2:20 PM