Favorite papers at http://aisafetyfrontier.substack.com. Opinions my own.
Read the full blog post: alignment.anthropic.com/2025/automat...
1) Experiment sandbagging: deliberately “acting dumb” on safety-critical ML research;
2) Research decision steering: manipulating arguments to favor certain ML solutions.
Both could slow down safety-critical AI research.