Dreadnode
@dreadnode.bsky.social
Building AI systems that advance the state of offensive security | https://www.dreadnode.io/
Dreadnode is a proud sponsor of @sentinelone.com's #labscon25!
Heading to Scottsdale this week? Catch @machinavelli.com and Brad Palm's talk, Auto-Poking the Bear—Analytical Tradecraft in the AI Age, on Thursday at 2pm MT.
Or, shoot us a DM to find time to meet up onsite!
September 16, 2025 at 5:57 PM
Incoming: Dreadnode paper drop from Shane Caldwell and the crew.
PentestJudge—Judging Agent Behavior Against Operational Requirements: arxiv.org/abs/2508.02921
Explore how we built an LLM-as-judge system for evaluating the operations of pentesting agents (inspired by PaperBench).
August 6, 2025 at 6:31 PM
Introducing AIRTBench, an AI red teaming benchmark for evaluating language models’ ability to autonomously discover and exploit AI/ML security vulnerabilities.
Read the paper on arXiv: arxiv.org/abs/2506.14682
Open-source dataset and benchmark eval code repo: github.com/dreadnode/AI...
June 18, 2025 at 1:24 PM
Introducing our new blog series: "From Compute to Congress: Decoding AI Policy" by Dreadnode Head of Policy Daria Bahrami | Read the first post here: dreadnode.io/blog/from-co...
May 15, 2025 at 4:50 PM
Are manual or automated attacks more effective when attacking LLMs?
We found that automated approaches achieve significantly higher success rates (69.5%) compared to manual techniques (47.6%).
More insights on LLM attack execution methods here 👉 dreadnode.io/blog/the-aut...
May 8, 2025 at 3:30 PM
May 1, 2025 at 7:50 PM
What's your take on the growing dominance of automated attacks and the implications for AI red teams? Here's ours, based on our analysis of 30 LLM challenges, attempted by 1,674 unique Crucible users, across 214,271 attack attempts: arxiv.org/abs/2504.19855
April 29, 2025 at 4:15 PM
Headed to RSA? Come meet the Dreadnode crew!
Whether you're looking for a private deep dive into our tech or want to hang out and talk offensive AI research, we'd love to connect.
Limited availability; come and get it: calendly.com/tori-dreadno...
#BayArea #SanFrancisco #RSAC2025 #OffensiveAI
April 16, 2025 at 4:12 PM
🌭🔪⚾️🦥🔥🔄🤨🛜
8 new Challenges now live in Crucible: platform.dreadnode.io/crucible
These Challenges might look familiar… they first appeared at DEFCON 30 and were recently refactored for Crucible—enjoy! [Filter>Subject>DEFCON-30]
March 26, 2025 at 8:02 PM
Cheque, check, one-two. We have a new Crucible Challenge for you: Phantom Cheque! Can you evade the cheque scanner and determine the areas of JagaLLM that need to be improved?
Act fast; first three to solve this model extraction Challenge announced Friday: platform.dreadnode.io/crucible/pha...
March 11, 2025 at 8:11 PM
In this week's new Crucible Challenge, find the hidden phrase in the backdoored model using dyana, an open source tool created by Dreadnode's Ads Dawson.
Can you outwit the llamas? platform.dreadnode.io/crucible/dya...
March 4, 2025 at 5:04 PM
Raiders of the Lost AI: Attempt our new Crucible Challenge, Palimpsest! Decode the hidden message in the scroll, find the flag.
First three to solve will be announced Friday, right here.
Get started: crucible.dreadnode.io/challenges/p...
February 18, 2025 at 9:49 PM
Boo! 👻 In our new Crucible Challenge, Popcorn, an LLM firewall is blocking access to a protected SQL table. Can you unmask the secret info?
First-to-solve announced Friday. Get started: crucible.dreadnode.io/challenges/p...
February 11, 2025 at 6:31 PM
Another week, another new Crucible Challenge.
Shoutout to these three for being the first to solve our reasoning model Challenge, DeepTweak!
Get your tweak on: crucible.dreadnode.io/challenges/d...
February 7, 2025 at 8:43 PM
February 6, 2025 at 7:09 PM
NEW Crucible Challenge: DeepTweak, an exploration of reasoning model behavior. Cause enough confusion 😵💫, retrieve the flag.
Think fast; the first three users to solve DeepTweak will be announced Friday!
➡️ https://crucible.dreadnode.io/challenges/deeptweak?utm_source=social&utm_medium=social&u…
February 4, 2025 at 5:36 PM
Congrats to these hosers for being the first three to solve the canadianeh challenge in Crucible! Tune in Tuesday for the next drop 👀
ICYMI, give canadianeh a try: crucible.dreadnode.io/challenges/c...
January 31, 2025 at 7:09 PM
Don't be a hoser, eh. It's aboot time you started taking model security seriously. Head to Crucible to attempt our new Challenge, canadianeh.
Can you be the first to solve it? Check back here Friday.
Happy hacking: https://buff.ly/4gn4hHP
January 28, 2025 at 5:19 PM
Where in the world is Dreadnode? Catch our founders @moohax.bsky.social and Nick Landers at these upcoming AI security events:
💻 NEBULA:FOG:PRIME Hackathon (Saturday, January 25)
🇫🇷 Paris AI Security Forum 2025 (Sunday, February 9)
Shoot us a DM to link up!
January 23, 2025 at 5:48 PM
NEW open source tool from Dreadnode's Simone Margaritelli and @radads.bsky.social: dyana, an eBPF sandbox environment designed to load, run, and profile a wide range of files and provide dynamic testing for AI models.
You know the drill - try it out: github.com/dreadnode/dy...
January 14, 2025 at 5:28 PM
Check out v0.4.0 of robopages! 🤖
New updates from Simone Margaritelli (@evilsocket) include: Support for executing commands on another host via SSH, easier integration into CI workflows, support for shared environment variables, and integrations with 13 new tools.
—> https://buff.ly/3VDDGPd
December 17, 2024 at 5:20 PM
Dreadnode’s U.S. team: *Celebrates Thanksgiving*
Our resident Canadian:
December 2, 2024 at 6:53 PM