Our token-level universal transfer attack is somehow stronger than a white-box embedding-level attack!
3️⃣ “Better CoT/reasoning models” like o1 are still far from robust.
(5/7)
Our token-level universal transfer attack is somehow stronger than a white-box embedding-level attack!
3️⃣ “Better CoT/reasoning models” like o1 are still far from robust.
(5/7)
Here are 3 main takeaways:
(2/7)
Here are 3 main takeaways:
(2/7)
⚔️ We propose IRIS, a simple automated 𝘂𝗻𝗶𝘃𝗲𝗿𝘀𝗮𝗹 𝗮𝗻𝗱 𝘁𝗿𝗮𝗻𝘀𝗳𝗲𝗿𝗿𝗮𝗯𝗹𝗲 𝗷𝗮𝗶𝗹𝗯𝗿𝗲𝗮𝗸 𝘀𝘂𝗳𝗳𝗶𝘅 that works on GPTs, o1, and Circuit Breaker defense! To appear at NeurIPS Safe GenAI Workshop!
(1/7)
⚔️ We propose IRIS, a simple automated 𝘂𝗻𝗶𝘃𝗲𝗿𝘀𝗮𝗹 𝗮𝗻𝗱 𝘁𝗿𝗮𝗻𝘀𝗳𝗲𝗿𝗿𝗮𝗯𝗹𝗲 𝗷𝗮𝗶𝗹𝗯𝗿𝗲𝗮𝗸 𝘀𝘂𝗳𝗳𝗶𝘅 that works on GPTs, o1, and Circuit Breaker defense! To appear at NeurIPS Safe GenAI Workshop!
(1/7)