Lightnews — Scholar-powered news

Andrea Palmieri 🤌

@andpalmier.com

25 followers 83 following 11 posts

Threat analyst, eternal newbie / Italian 🍕 in 🇨🇭 / AS Roma 💛❤️

🔗 andpalmier.com

Posts Replies Media Videos

Andrea Palmieri 🤌

@andpalmier.com

🤖 LLMs vs LLMs

It shouldn't really come as a big surprise that some methods for attacking LLMs are using LLMs.

Here are two examples:
- PAIR: an approach using an attacker LLM
- IRIS: inducing an LLM to self-jailbreak

⬇️

November 25, 2024 at 7:08 AM

Andrea Palmieri 🤌

@andpalmier.com

📝 #Prompt rewriting: adding a layer of linguistic complexity!

This class of attacks uses encryption, translation, ascii art and even word puzzles to bypass the LLMs' safety checks.

⬇️

November 25, 2024 at 7:08 AM

Andrea Palmieri 🤌

@andpalmier.com

💉 #Promptinjection: embed malicious instructions in the prompt.

According to #OWASP, prompt injection is the most critical security risk for LLM applications.

They break down this class of attacks in 2 categories: direct and indirect. Here is a summary of indirect attacks:

⬇️

November 25, 2024 at 7:08 AM

Andrea Palmieri 🤌

@andpalmier.com

😈 Role-playing: attackers ask the #LLM to act as a specific persona or as part of a scenario.

A common example is the (in?)famous #DAN (Do Anything Now):

This attacks are probably the most common in the real-word, as they often don't require a lot of sophistication.

⬇️

November 25, 2024 at 7:08 AM

Andrea Palmieri 🤌

@andpalmier.com

We interact (and therefore attack) LLMs mainly using language, therefore let's start from there.

I used this dataset github.com/verazuo/jailbreak_llms of #jailbreak #prompt to create this wordcloud.

I believe it gives a sense of "what works" in these attacks!

⬇️

November 25, 2024 at 7:08 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news