🔗 andpalmier.com
It shouldn't really come as a big surprise that some methods for attacking LLMs are using LLMs.
Here are two examples:
- PAIR: an approach using an attacker LLM
- IRIS: inducing an LLM to self-jailbreak
⬇️
It shouldn't really come as a big surprise that some methods for attacking LLMs are using LLMs.
Here are two examples:
- PAIR: an approach using an attacker LLM
- IRIS: inducing an LLM to self-jailbreak
⬇️
This class of attacks uses encryption, translation, ascii art and even word puzzles to bypass the LLMs' safety checks.
⬇️
This class of attacks uses encryption, translation, ascii art and even word puzzles to bypass the LLMs' safety checks.
⬇️
According to #OWASP, prompt injection is the most critical security risk for LLM applications.
They break down this class of attacks in 2 categories: direct and indirect. Here is a summary of indirect attacks:
⬇️
According to #OWASP, prompt injection is the most critical security risk for LLM applications.
They break down this class of attacks in 2 categories: direct and indirect. Here is a summary of indirect attacks:
⬇️
I used this dataset github.com/verazuo/jailbreak_llms of #jailbreak #prompt to create this wordcloud.
I believe it gives a sense of "what works" in these attacks!
⬇️
I used this dataset github.com/verazuo/jailbreak_llms of #jailbreak #prompt to create this wordcloud.
I believe it gives a sense of "what works" in these attacks!
⬇️