Matan Ben-Tov
matanbt.bsky.social
PhD student in Computer Science @TAU.
Interested in buzzwords like AI and Security and wherever they meet.
What makes or breaks powerful jailbreak suffixes? 🔓🤖

We find that:
🥷 they work by hijacking the model’s context;
♾ the more universal a suffix is, the stronger its hijacking;
⚔️🛡️ utilizing these insights, it is possible to both enhance and mitigate these attacks.

🧵
June 18, 2025 at 2:06 PM
How much can we gaslight dense retrieval models? ⛽💡

In our recent work (w/ @mahmoods01.bsky.social) we thoroughly explore how susceptible widely used dense embedding-based text retrieval models are to search-optimization attacks via corpus poisoning.

🧵 (1/16)
January 8, 2025 at 7:57 AM