📃 Workshop paper: openreview.net/forum?id=eIB... (full paper soon!)
👥 Co-authors: David Huang @davidhuang1.bsky.social, Avi Shah, Alexandre Araujo, David Wagner.
(7/7)
📃 Workshop paper: openreview.net/forum?id=eIB... (full paper soon!)
👥 Co-authors: David Huang @davidhuang1.bsky.social, Avi Shah, Alexandre Araujo, David Wagner.
(7/7)
(6/7)
(6/7)
Our token-level universal transfer attack is somehow stronger than a white-box embedding-level attack!
3️⃣ “Better CoT/reasoning models” like o1 are still far from robust.
(5/7)
Our token-level universal transfer attack is somehow stronger than a white-box embedding-level attack!
3️⃣ “Better CoT/reasoning models” like o1 are still far from robust.
(5/7)
This phenomenon is very unintuitive but confirms that LLM attacks are far from optimal. There's also a clear implication on white-box robustness evaluation.
(4/7)
This phenomenon is very unintuitive but confirms that LLM attacks are far from optimal. There's also a clear implication on white-box robustness evaluation.
(4/7)
Our surprising discovery is some adversarial suffixes (even gibberish ones from vanilla GCG) can jailbreak many different prompts while being optimized on a single prompt.
(3/7)
Our surprising discovery is some adversarial suffixes (even gibberish ones from vanilla GCG) can jailbreak many different prompts while being optimized on a single prompt.
(3/7)
Here are 3 main takeaways:
(2/7)
Here are 3 main takeaways:
(2/7)