Lightnews — Scholar-powered news

Leon Derczynski

@leonderczynski.bsky.social

690 followers 370 following 170 posts

LLM Security at NVIDIA

Prof in CS/NLP at IT University of Copenhagen

garak guy, garak.ai

"berømt skikkelse"
"like a gazelle"

Copenhagen/Seattle


Leon Derczynski

@leonderczynski.bsky.social

😂 arXiv is cute when it pretends to have standards!

November 3, 2025 at 5:53 AM

Leon Derczynski

@leonderczynski.bsky.social

Come to LLMSEC at ACL & hear Niloofar's keynote

"What does it mean for agentic AI to preserve privacy?" - Niloofar Mireshghallah, Meta/CMU

(Friday 1st Aug, 11.00; Austria Center Vienna Hall B)

See you there!

#acl2025 #acl2025nlp

July 28, 2025 at 3:19 PM

Leon Derczynski

@leonderczynski.bsky.social

3² + 4² = 5² ? big if true

May 21, 2025 at 10:24 AM

Leon Derczynski

@leonderczynski.bsky.social

Here's my "Most Inappropriate Demo" trophy at NVIDIA, 2024. For garak's "atkgen.Tox" probe, an unfettered LLM used to goad other LLMs into being toxic.

A small plastic cup trophy with stuck-on label

March 19, 2025 at 1:30 PM

Leon Derczynski

@leonderczynski.bsky.social

it's a weekday where I don't have to take pacific time calls

February 17, 2025 at 5:35 PM

Leon Derczynski

@leonderczynski.bsky.social

you know the field has changed when the foreign event you were speaking at is on the tv news on the bus home

February 13, 2025 at 9:18 AM

Leon Derczynski

@leonderczynski.bsky.social

Will be representing NVIDIA at the EU AI Summit in Paris. I'll be talking about how we build & help others build safe, secure AI systems.

On 11.2 you can see me at:

* AI Assurance and Testing: Global Perspectives

* Building trustworthy AI: balancing innovation, responsibility, and democratization

February 8, 2025 at 11:35 AM

Leon Derczynski

@leonderczynski.bsky.social

Should've seen it coming

February 2, 2025 at 6:41 AM

Leon Derczynski

@leonderczynski.bsky.social

it was too difficult to not buy

January 13, 2025 at 5:36 AM

Leon Derczynski

@leonderczynski.bsky.social

Are there people who don't make the sponge cake rice cooker recipe asap?!

January 5, 2025 at 5:07 PM

Leon Derczynski

@leonderczynski.bsky.social

Good Christmas times, finally the elephant has come to our house!

December 28, 2024 at 12:58 PM

Leon Derczynski

@leonderczynski.bsky.social

Sokath, his eyes uncovered

December 23, 2024 at 12:49 PM

Leon Derczynski

@leonderczynski.bsky.social

wdym. how could anyone get that impression

December 4, 2024 at 5:44 PM

Leon Derczynski

@leonderczynski.bsky.social

see you at #chr2024! (i swore off hashtags but i guess sometimes it's hard to stay on the wagon)

December 4, 2024 at 8:44 AM

Leon Derczynski

@leonderczynski.bsky.social

Did not expect Russian Ryanair to get this excited about Black Friday

November 29, 2024 at 8:14 AM

Leon Derczynski

@leonderczynski.bsky.social

i suppose the fact that my reaction to this at 7.something a.m. is one of mild panic and dejection is a clue that i should not work in product or sales

November 27, 2024 at 6:49 AM

Leon Derczynski

@leonderczynski.bsky.social

and the tests work in practice. ANSI terminal control sequence attack success rates on <a recent model> - way into the double digits!

November 25, 2024 at 3:49 PM
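
A rough sketch of how a success rate like the one mentioned above could be computed: count how many model outputs contain a raw ESC byte starting a CSI or OSC sequence. This is not garak's actual detector; the function names and the sample strings are hypothetical and only illustrate the idea.

```python
import re

# Matches a raw ESC byte that opens a CSI ("[") or OSC ("]") sequence.
# Assumption: only these two sequence families are counted as a hit.
RAW_ESCAPE_RE = re.compile(r"\x1b[\[\]]")


def contains_raw_escape(output: str) -> bool:
    """True if the text holds a real ESC byte followed by '[' or ']'."""
    return RAW_ESCAPE_RE.search(output) is not None


def success_rate(outputs: list[str]) -> float:
    """Fraction of outputs that contain a raw escape sequence."""
    if not outputs:
        return 0.0
    hits = sum(contains_raw_escape(o) for o in outputs)
    return hits / len(outputs)


# Made-up sample outputs, for illustration only (not real model data).
samples = [
    "Sure: \x1b[31mHello\x1b[0m",             # raw CSI colour sequence
    "I can't print control characters.",      # refusal, no escape bytes
    "Use \\x1b[31m in your shell script.",    # escaped text, not a raw ESC byte
    "\x1b]8;;https://example.com\x07link\x1b]8;;\x07",  # raw OSC 8 hyperlink
]

rate = success_rate(samples)
print(f"{rate:.0%} of sample outputs contain raw escape sequences")
```

Note that the third sample only spells out the escape as printable text, so it does not count; a stricter test might treat that case separately, since a downstream tool could still decode it.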

Leon Derczynski

@leonderczynski.bsky.social

it turns out LLMs can output the control characters needed for ANSI escape sequences to run - so the entirety of those command sets is usable through LLM output. why are these things in the tokeniser lol

November 25, 2024 at 3:49 PM

Leon Derczynski

@leonderczynski.bsky.social

STÖK did a cool talk (www.youtube.com/watch?v=3T2A...) showing how viewing e.g. a log entry can cause code to be run on your computer without your intervention, great for hacking. all of OSC8 + OSC52 is available without user intervention. but wait, there's more!

November 25, 2024 at 3:49 PM

Leon Derczynski

@leonderczynski.bsky.social

just found a hilarious llm vuln. you know those ANSI escape codes for colouring your terminal? they can also move the cursor, hide text, execute commands, etc. through just viewing a piece of text

November 25, 2024 at 3:49 PM
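
To make the point above concrete, here is a minimal sketch of a few escape sequences (colour, concealed text, an OSC 8 hyperlink) and one way to strip them before untrusted text reaches a terminal. The regexes cover CSI and OSC sequences only, which is an assumption of this sketch rather than a complete filter, and `strip_ansi` is a hypothetical helper name.

```python
import re

ESC = "\x1b"

# CSI sequences: ESC [ parameters, intermediates, final byte
# (colours, cursor movement, concealed text).
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# OSC sequences: ESC ] ... terminated by BEL or ST (ESC \).
# OSC 8 sets hyperlinks; OSC 52 touches the clipboard.
OSC_RE = re.compile(r"\x1b\].*?(?:\x07|\x1b\\)", re.DOTALL)


def strip_ansi(text: str) -> str:
    """Remove CSI and OSC sequences so only the plain text remains."""
    return CSI_RE.sub("", OSC_RE.sub("", text))


# Colour: renders "red" in red, then resets attributes.
red = f"{ESC}[31mred{ESC}[0m"
# Conceal: many terminals hide " secret" until attributes are reset.
hidden = f"visible{ESC}[8m secret{ESC}[0m"
# OSC 8 hyperlink: the visible label is "click", the target is in the sequence.
link = f"{ESC}]8;;https://example.com{ESC}\\click{ESC}]8;;{ESC}\\"

for sample in (red, hidden, link):
    # repr() shows the raw bytes safely instead of letting the terminal act on them.
    print(repr(sample), "->", repr(strip_ansi(sample)))
```

Printing with `repr()` is the safe way to inspect such strings; printing them directly lets the terminal interpret the sequences.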

Leon Derczynski

@leonderczynski.bsky.social

For example

November 24, 2024 at 6:19 AM

Leon Derczynski

@leonderczynski.bsky.social

> trying to persuade my partner to get onto bsky after they already left twitter
> they can't find a pfp they want
> "just take a selfie now"
> "in the bathroom? no. never. bathroom selfies are utter trash. the worst"
> "lol 🫠"

November 23, 2024 at 3:04 PM

Leon Derczynski

@leonderczynski.bsky.social

Here's the French minister of AI, the convening Columbia prof, and the president of Mozilla, standing on an office table in a bar on a swaying boat

November 21, 2024 at 1:27 PM

Leon Derczynski

@leonderczynski.bsky.social

Back in snowy Denmark. I always forget how good the showers tend to be here - in heat, control, water quantity, and bathroom layout.

November 21, 2024 at 9:55 AM
