Lightnews — Scholar-powered news

Reposted

Dreadnode

@dreadnode.bsky.social

What's your take on the growing dominance of automated attacks and the implications for AI red teams? Here's ours, based on our analysis of 30 LLM challenges attempted by 1,674 unique Crucible users across 214,271 attack attempts: arxiv.org/abs/2504.19855

April 29, 2025 at 4:15 PM

Reposted

Dreadnode

@dreadnode.bsky.social

@datasociety.bsky.social and the AI Risk and Vulnerability Alliance just released “Red-Teaming in the Public Interest,” a report examining how red-teaming methods are being adapted to evaluate genAI.

Read the report, featuring commentary from @moohax.bsky.social: datasociety.net/library/red-...

Red-Teaming in the Public Interest

This report offers a vision for red-teaming in the public interest: a process that goes beyond system-centric testing of already built systems to consider the full range of ways the public can be invo...

datasociety.net

February 13, 2025 at 6:50 PM

Reposted

Dreadnode

@dreadnode.bsky.social

NEW Crucible Challenge: DeepTweak, an exploration of reasoning model behavior. Cause enough confusion 😵‍💫, retrieve the flag.

Think fast: the first three users to solve DeepTweak will be announced Friday!

➡️ https://crucible.dreadnode.io/challenges/deeptweak?utm_source=social&utm_medium=social&u…

February 4, 2025 at 5:36 PM

Reposted

Dreadnode

@dreadnode.bsky.social

New to Rigging:

🔥 Tracing
🛠️ API Tools
💻 HTTP Generator
🐍 Prompts as Tools

→ github.com/dreadnode/ri...

February 6, 2025 at 7:09 PM

moohax.bsky.social

@moohax.bsky.social

The first distillation/extraction attack on OAI was the Stanford Alpaca research. After that, OAI changed its ToS to disallow training on outputs. It can happen to any model provider.

crfm.stanford.edu/2023/03/13/a...

Stanford CRFM

crfm.stanford.edu

January 29, 2025 at 11:15 PM

moohax.bsky.social

@moohax.bsky.social

People learning what alignment means by asking DeepSeek about Taiwan.

January 29, 2025 at 11:14 PM
