Lightnews — Scholar-powered news

Leon Derczynski

@leonderczynski.bsky.social

690 followers 370 following 170 posts

LLM Security at NVIDIA

Prof in CS/NLP at IT University of Copenhagen

garak guy, garak.ai

"berømt skikkelse" ("famous figure")
"like a gazelle"

Copenhagen/Seattle


Pinned

Leon Derczynski @leonderczynski.bsky.social · Dec 2

further recommendations welcome!

go.bsky.app/5PrJYrj

Leon Derczynski

@leonderczynski.bsky.social

😂 arXiv is cute when it pretends to have standards!

November 3, 2025 at 5:53 AM

Leon Derczynski

@leonderczynski.bsky.social

Come to LLMSEC at ACL & hear Niloofar's keynote

"What does it mean for agentic AI to preserve privacy?" - Niloofar Mireshghallah, Meta/CMU

(Friday 1st Aug, 11.00; Austria Center Vienna Hall B)

See you there!

#acl2025 #acl2025nlp

July 28, 2025 at 3:19 PM

Reposted by Leon Derczynski

𝔻𝕖𝕖𝕡 𝕋𝕙𝕠𝕥

@thatsgoodweb.bsky.social

logging on

have the courage to use your own intelligence

July 10, 2025 at 11:35 AM

Leon Derczynski

@leonderczynski.bsky.social

new garak, llm vuln scanner rls (v0.12.0)

* Audio attacks, for multimodal models
* More training data membership inference attacks
* Multilingual attacks can now also use GCP
* Detailed eval summary in one JSONL row/object

+more :)

details: github.com/NVIDIA/garak...

Release v0.12.0 · NVIDIA/garak

What's Changed / New plugins:
* Add audio NIM model and audio probes by @erickgalinkin in #1163
* Leakreplay refactor by @dchiitmalla in #1264
* probes: refactor fact snippet mixin by @leondz in #1187
New...

github.com

July 2, 2025 at 3:32 PM

Leon Derczynski

@leonderczynski.bsky.social

the dying but clinging on battery in the bathroom's Frozen-branded soap dispenser reminds me that it's only 4-5 months til Bublé & Let It Go season. aren't you looking forward

June 27, 2025 at 5:01 AM

Leon Derczynski

@leonderczynski.bsky.social

why do academics send and expect so much weekend email and work. not healthy

June 22, 2025 at 6:10 AM

Leon Derczynski

@leonderczynski.bsky.social

computer scientists encountering the concept of "desirable difficulty"

Bradley Busch @bradleybusch.bsky.social · Jun 18

New research from MIT found that those who used ChatGPT can’t remember any of the content of their essays.

Key takeaway: the product doesn’t suffer, but the process does. And when it comes to essays, the process *is* how they learn.

arxiv.org/pdf/2506.088...

June 19, 2025 at 4:15 PM

Leon Derczynski

@leonderczynski.bsky.social

remembering the time i checked in to my reasonably classy russian business hotel late with my wife, and the staff said "sir, this... girl.. not allowed"

she's a serious professor

we went through to the room, opened the balcony door, and buried a bottle of champagne in the metre of snow

good times

June 18, 2025 at 6:56 AM

Leon Derczynski

@leonderczynski.bsky.social

@jjvincent.bsky.social woah ur really famous! love this attack also. I automate and run it for a living

www.instagram.com/reel/DKz9ezj...

June 14, 2025 at 6:06 AM

Reposted by Leon Derczynski

Willie Agnew

@willie-agnew.bsky.social

Great to see our work uncovering dangerous issues in commercial LLM "therapists" getting some coverage: futurism.com/stanford-the...

June 14, 2025 at 4:01 AM

Leon Derczynski

@leonderczynski.bsky.social

"natwirkung"

"wirk smorter nat horder"

accents dreamed up by the utterly deranged

(what is going on with that 🇺🇸 vowel sheft)

June 8, 2025 at 10:05 AM

Leon Derczynski

@leonderczynski.bsky.social

i need you to understand that "alternate uses" is a terrible test/definition of creativity and has been for some time. it's extremely narrow, very shallow, and misses almost everything we know about creativity

June 3, 2025 at 5:09 AM

Leon Derczynski

@leonderczynski.bsky.social

3² + 4² = 5² ? big if true

May 21, 2025 at 10:24 AM

Leon Derczynski

@leonderczynski.bsky.social

if overleaf being down slows "ai progress", i'm not sure "ai progress" is particularly well defined

May 15, 2025 at 5:17 AM

Leon Derczynski

@leonderczynski.bsky.social

is a dropped copula a dropula

May 14, 2025 at 8:29 AM

Leon Derczynski

@leonderczynski.bsky.social

Here's my "Most Inappropriate Demo" trophy at NVIDIA, 2024. For garak's "atkgen.Tox" probe, an unfettered LLM used to goad other LLMs into being toxic.

[Image: a small plastic cup trophy with a stuck-on label]

March 19, 2025 at 1:30 PM

Reposted by Leon Derczynski

John Bull

@garius.bsky.social

“If she wants to know something specific, but doesn’t want people to notice her asking questions, she should simply make incorrect statements while in the company of experts. Her companions will correct her, especially if they're men.”

- Advice for female agents in WW2, provided during SOE training

March 17, 2025 at 11:52 AM

Reposted by Leon Derczynski

Mike Ginn

@shutupmikeginn.bsky.social

its amazing how chatgpt knows everything about subjects I know nothing about, but is wrong like 40% of the time in things im an expert on. not going to think about this any further

March 8, 2025 at 12:13 AM

Leon Derczynski

@leonderczynski.bsky.social

was about to dump all my practical knowledge and "I've been thinking about" crap on agent security into a blog post but i do not think the web can take yet another one of those. drank wine instead

February 21, 2025 at 8:22 PM

Reposted by Leon Derczynski

Dr Abeba Birhane

@abeba.bsky.social

they are openly advocating for the use of physiognomy in recruitment

make it stop

Human capital (encompassing cognitive skills and personality traits) is critical for labor market success, yet the personality component remains difficult to measure at scale. Leveraging advances in artificial intelligence and comprehensive LinkedIn microdata, we extract the Big 5 personality traits from facial images of 96,000 MBA graduates, and demonstrate that this novel "Photo Big 5" predicts school rank, compensation, job seniority, industry choice, job transitions, and career advancement. Using administrative records from top-tier MBA programs, we find that the Photo Big 5 exhibits only modest correlations with cognitive measures like GPA and standardized test scores, yet offers comparable incremental predictive power for labor outcomes. Unlike traditional survey-based personality measures, the Photo Big 5 is readily accessible and potentially less susceptible to manipulation, making it suitable for wide adoption in academic research and hiring processes. However, its use in labor market screening raises ethical concerns regarding statistical discrimination and individual autonomy.

February 21, 2025 at 5:50 PM

Leon Derczynski

@leonderczynski.bsky.social

things i'm genuinely enjoying rn:

* successfully not reading any news
* getting to do 50h of work in one week (it was enjoyable, usual caveats apply)
* finally a largely healthy family

February 21, 2025 at 6:43 AM

Leon Derczynski

@leonderczynski.bsky.social

it's a weekday where I dont have to take pacific time calls

February 17, 2025 at 5:35 PM

Leon Derczynski

@leonderczynski.bsky.social

my aunt in law has a shetland pony in her freezer for the dogs

February 16, 2025 at 11:43 AM

Leon Derczynski

@leonderczynski.bsky.social

you know the field has changed when the foreign event you were speaking at is on the tv news on the bus home

February 13, 2025 at 9:18 AM

Leon Derczynski

@leonderczynski.bsky.social

Will be representing NVIDIA at the EU AI Summit in Paris. I'll be talking about how we build & help others build safe, secure AI systems.

On 11.2 you can see me at:

* AI Assurance and Testing: Global Perspectives

* Building trustworthy AI: balancing innovation, responsibility, and democratization

February 8, 2025 at 11:35 AM
