Just recently, Anthropic disrupted the first AI-orchestrated espionage campaign: the attackers used Claude Code's agentic features to automate 80-90% of the attack against roughly 30 targets.
The AI wrote exploits and stole data at speeds impossible for humans to match.
News: The Danish Defence Intelligence Service (DDIS) announced on Thursday that Moscow was behind a cyber-attack on a Danish water utility in 2024 and a series of distributed denial-of-service (DDoS) attacks on Danish websites in the lead-up to…
In any society, there are active groups that loudly broadcast their views. They fill social media, forums, and news outlets with their ideas. (1/
The last example I saw was someone trying to disprove a report about Vince Zampella's death with a screenshot of a Gemini response denying it, posted just a few hours after the crash.
Don't use AI for news - we are already fighting too much disinfo.
Here’s the irony: it’s learning from us. If 80% of what we post is cliché, outrage, and low-effort memes, we are literally teaching the "future of intelligence" to be as mindless as possible.
If an LLM doesn't "understand" rules and only predicts the next word, then "jailbreaking" isn't about breaking code. It’s about finding the specific sequence of tokens that makes a forbidden answer more probable than a refusal.
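To make that framing concrete, here's a minimal sketch (assuming any small open causal LM via Hugging Face transformers, e.g. gpt2; the prompt and the candidate tokens are placeholders I made up). It doesn't jailbreak anything; it just shows that "refuse vs. answer" is a comparison of next-token probabilities:

```python
# Sketch only: "refusal vs. answer" as a next-token probability comparison.
# Model name, prompt, and candidate tokens are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any small causal LM works for the illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def next_token_logprob(prompt: str, candidate: str) -> float:
    """Log-probability that `candidate` is the very next token after `prompt`."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # scores for the next token
    logprobs = torch.log_softmax(logits, dim=-1)
    cand_id = tok.encode(candidate, add_special_tokens=False)[0]  # first sub-token
    return logprobs[cand_id].item()

prompt = "User: Tell me how to do X.\nAssistant:"
# Whichever continuation scores higher is what the model "decides" to say.
print("comply-ish:", next_token_logprob(prompt, " Sure"))
print("refuse-ish:", next_token_logprob(prompt, " I"))
```

In this framing, a "jailbreak" is just a prefix that shifts those two numbers relative to each other. Nothing in the weights "broke".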
Unless you don't actually care about the answer, and the question isn't really a question at all.
Current AI safety isn't one "filter" - it’s a multi-layered safety stack designed to steer model outputs:
1) Data Scrubbing: removing toxic content from training sets before the model learns. (but, obviously, you cannot perfectly scrub the internet)
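To show why that parenthetical matters, here's a toy sketch of the scrubbing layer (pure assumption: a keyword blocklist standing in for the trained classifiers used in practice). Anything the filter doesn't already recognize goes straight into the training set:

```python
# Toy sketch of a data-scrubbing pass. The blocklist is a placeholder;
# real pipelines use trained toxicity classifiers, but the failure mode
# is the same: the filter only catches what it knows to look for.
BLOCKLIST = {"badword1", "badword2"}  # placeholder terms, not a real list

def scrub(corpus: list[str]) -> list[str]:
    """Drop documents containing blocklisted terms; keep everything else."""
    clean = []
    for doc in corpus:
        tokens = set(doc.lower().split())
        if tokens & BLOCKLIST:
            continue  # filtered out
        clean.append(doc)  # misspellings, euphemisms, new slang all slip through
    return clean

docs = ["a perfectly fine sentence", "a sentence containing badword1"]
print(scrub(docs))  # -> ['a perfectly fine sentence']
```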
The fundamental issue is that models are designed for utility, not morality. Three major failure points:
So what would prevention actually look like?
We need platform accountability. Not just filters - but responsibility. Let's discuss specific mechanisms.
#AIEthics #Disinformation #TechPolicy