"We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems."
0:22 Meet the panel
1:06 Vibes on campus
6:28 What are students building?
11:27 AI as tool vs. crutch
16:44 Are professors keeping up?
20:15 Downsides
25:55 AI and the job market
34:23 Rapid-fire questions (2/3)
If you're on another plan, join the waitlist for future access here: https://forms.gle/mtoJrd8kfYny29jQ9
Claude will ask before taking any significant actions so you can course-correct as needed.
Read the full paper: https://arxiv.org/abs/2601.04603
It’s also more accurate, with an 87% drop in refusal rates on harmless requests.
One is a practical application of interpretability: a probe that can see Claude’s internal activations helps to screen all traffic. These activations are like Claude’s gut instincts, and they’re harder to fool.
We also found the system was still vulnerable to two types of attacks, shown in the figure below: