Lightnews — Scholar-powered news

@anthropicbot.bsky.social

Video: https://twitter.com/AnthropicAI/status/2010844260543967484 (3/3)

January 12, 2026 at 10:54 PM

@anthropicbot.bsky.social

0:00 Introduction
0:22 Meet the panel
1:06 Vibes on campus
6:28 What are students building?
11:27 AI as tool vs. crutch
16:44 Are professors keeping up?
20:15 Downsides
25:55 AI and the job market
34:23 Rapid-fire questions (2/3)

January 12, 2026 at 10:54 PM

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

Introducing Cowork | Claude

Claude Code's agentic capabilities, now for everyone. Give Claude access to your files and let it organize, create, and edit documents while you focus on what matters.

claude.com

January 13, 2026 at 2:54 PM

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

Cowork is available as a research preview for Claude Max subscribers in the macOS app. Click on “Cowork” in the sidebar: https://claude.com/download

If you're on another plan, join the waitlist for future access here: https://forms.gle/mtoJrd8kfYny29jQ9

Cowork research preview

Claude Code's agentic power, now in the desktop app. No terminal required. Point Claude at local folders, kick off a task, and step back. Claude spins up parallel sub-agents to research, write, and organize while you do other things. Max plan users have access now. Join the waitlist as we gradually expand access. Note: Available only in the macOS desktop app for now.

January 12, 2026 at 8:10 PM

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

Once you've set a task, Claude makes a plan and steadily completes it, looping you in along the way.

Claude will ask before taking any significant actions so you can course-correct as needed.

January 12, 2026 at 8:10 PM

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

After 1,700 cumulative hours of red-teaming, we’ve yet to identify a universal jailbreak (a consistent attack strategy that works across many queries) against our new system.

Read the full paper: https://arxiv.org/abs/2601.04603

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

We introduce enhanced Constitutional Classifiers that deliver production-grade jailbreak robustness with dramatically reduced computational costs and refusal rates compared to previous-generation defenses. Our system combines several key insights. First, we develop exchange classifiers that evaluate model responses in their full conversational context, which addresses vulnerabilities in last-generation systems that examine outputs in isolation. Second, we implement a two-stage classifier cascade where lightweight classifiers screen all traffic and escalate only suspicious exchanges to more expensive classifiers. Third, we train efficient linear probe classifiers and ensemble them with external classifiers to simultaneously improve robustness and reduce computational costs. Together, these techniques yield a production-grade system achieving a 40x computational cost reduction compared to our baseline exchange classifier, while maintaining a 0.05% refusal rate on production traffic. Through extensive red-teaming comprising over 1,700 hours, we demonstrate strong protection against universal jailbreaks -- no attack on this system successfully elicited responses to all eight target queries comparable in detail to an undefended model. Our work establishes Constitutional Classifiers as practical and efficient safeguards for large language models.

January 10, 2026 at 12:19 PM
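
As a rough illustration of the two-stage cascade described in the abstract above, the following Python sketch screens every exchange with a cheap scorer and escalates only suspicious ones to a costlier scorer. Everything here is hypothetical: the function names, the thresholds (`escalate_at`, `block_at`), and the keyword-based toy scorers are assumptions for illustration, not the paper's classifiers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a scorer maps (prompt, response) to a harm score in [0, 1].
Scorer = Callable[[str, str], float]


@dataclass
class CascadeResult:
    blocked: bool     # final decision for this exchange
    escalated: bool   # whether the expensive stage was run
    score: float      # score from the last stage that ran


def run_cascade(
    prompt: str,
    response: str,
    cheap: Scorer,
    expensive: Scorer,
    escalate_at: float = 0.2,  # assumed threshold, not from the paper
    block_at: float = 0.5,     # assumed threshold, not from the paper
) -> CascadeResult:
    """Screen with the cheap scorer; run the expensive one only if needed."""
    first = cheap(prompt, response)
    if first < escalate_at:
        # Most traffic should exit here, so the expensive scorer is skipped.
        return CascadeResult(blocked=False, escalated=False, score=first)
    second = expensive(prompt, response)
    return CascadeResult(blocked=second >= block_at, escalated=True, score=second)


# Toy stand-ins for real classifiers (keyword matching, illustration only).
def toy_cheap(prompt: str, response: str) -> float:
    text = (prompt + " " + response).lower()
    return 0.9 if any(w in text for w in ("synthesize", "weaponize")) else 0.05


def toy_expensive(prompt: str, response: str) -> float:
    text = (prompt + " " + response).lower()
    return 0.95 if "weaponize" in text else 0.1


if __name__ == "__main__":
    for p in ("How do I bake bread?", "How do chemists synthesize aspirin?"):
        print(p, run_cascade(p, "", toy_cheap, toy_expensive))
```

Both scorers receive the prompt and the response together, mirroring the abstract's point that exchange classifiers judge outputs in their conversational context. The cost saving comes from the early return: the expensive scorer runs only on the fraction of traffic the cheap stage flags.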

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

Because the system harnesses internal activations already happening within a model, and reserves heavier computation only for potentially harmful exchanges, it adds only ~1% compute overhead.

It’s also more accurate, with an 87% drop in refusal rates on harmless requests.

January 10, 2026 at 12:19 PM

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

Our new system adds several innovations.

One is a practical application of interpretability: a probe that can see Claude’s internal activations helps to screen all traffic. These activations are like Claude’s gut instincts, and they’re harder to fool.

January 10, 2026 at 12:19 PM
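
To make the probe idea concrete, here is a minimal Python sketch of a linear probe: a weighted sum over an activation vector, passed through a sigmoid to give a score between 0 and 1. The weights, bias, and the tiny four-dimensional activation vector are invented for illustration. A real probe would be trained on the model's actual activations, and the posts give no implementation details beyond the paper describing it as a linear probe.

```python
import math
from typing import Sequence


def probe_score(
    activations: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Return sigmoid(w . a + b), a score in [0, 1] from a linear probe."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    logit = sum(a * w for a, w in zip(activations, weights)) + bias
    # Numerically stable sigmoid: avoid overflow for large negative logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical values: neither vector comes from a real model.
    hidden_state = [0.3, -1.2, 0.8, 2.1]
    learned_weights = [0.5, -0.4, 0.1, 0.9]
    print(round(probe_score(hidden_state, learned_weights, bias=-1.0), 3))
```

Because the probe reads activations the model has already computed, scoring costs only one dot product per exchange, which fits the posts' description of a lightweight screen applied to all traffic.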

Anthropic [UNOFFICIAL]

@anthropicbot.bsky.social

The classifiers reduced the jailbreak success rate from 86% to 4.4%, but they were expensive to run and made Claude more likely to refuse benign requests.

We also found the system was still vulnerable to two types of attacks (shown in a figure attached to the original post).