@usmananwar.bsky.social
This was joint work with amazing co-authors: Spencer Frei, Johannes von Oswald, David Krueger and Louis Kirsch.
Check out the paper on arxiv: arxiv.org/abs/2411.05189
Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
November 11, 2024 at 4:20 PM
To conclude: transformers do not learn robust in-context learning algorithms, and we still do not really understand which algorithms GPT-style transformers implement in-context, even for a setting as simple as linear regression. 🥹
November 11, 2024 at 4:20 PM
Similarly, we find that hijacking attacks transfer poorly between GPT and OLS – even though their ‘in-distribution’ behavior matches quite well! Interestingly, the transfer is considerably worse in the GPT → OLS direction… 🤔
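For context, attacking OLS at all requires a differentiable stand-in for the closed-form solution – a minimal sketch (illustrative, not our code; the small ridge term is only for numerical stability):

import torch

def ols_predict(X, y, x_query, ridge=1e-6):
    """Closed-form least-squares fit to the in-context examples (X: N x d, y: N),
    evaluated at the query. Because the solve is differentiable, the same
    gradient-based hijacks used on transformers can be crafted against it, and
    prompts hijacked against one 'model' can be replayed on the other."""
    d = X.shape[1]
    w = torch.linalg.solve(X.T @ X + ridge * torch.eye(d), X.T @ y)
    return x_query @ w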
November 11, 2024 at 4:20 PM
…Probably not. Our adversarial attacks designed for linear transformers implementing gradient descent do poorly on (GPT-style) transformers, indicating that the latter are likely not implementing gradient-based ICL algorithms.
November 11, 2024 at 4:20 PM
Finally, are transformers implementing gradient descent or ordinary least squares (OLS) when solving linear regression tasks in-context, as argued by previous works (arxiv.org/abs/2208.01066, arxiv.org/abs/2211.15661)?
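A minimal sketch of the two candidates on one sampled task, assuming the standard setup from those works (x_i ~ N(0, I), y_i = w·x_i); a trained transformer's query prediction can then be compared against both:

import torch

d, n_context, gd_steps, lr = 8, 16, 50, 0.05

# Sample one in-context linear regression task.
w_true = torch.randn(d)
X = torch.randn(n_context, d)        # in-context inputs
y = X @ w_true                       # noiseless labels
x_query = torch.randn(d)

# Candidate 1: ordinary least squares via the normal equations (closed form).
w_ols = torch.linalg.solve(X.T @ X, X.T @ y)

# Candidate 2: gradient descent on the in-context squared loss, from zero init.
w_gd = torch.zeros(d)
for _ in range(gd_steps):
    w_gd = w_gd - lr * (X.T @ (X @ w_gd - y)) / n_context

print("OLS prediction:", (x_query @ w_ols).item())
print("GD prediction: ", (x_query @ w_gd).item())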
November 11, 2024 at 4:20 PM
We also find that larger transformers are less universal in the in-context learning algorithms they implement – the transferability of hijacking attacks gets worse as model size increases!
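One rough way to quantify such transfer (a sketch, not our exact metric; attack_fn stands in for any crafting routine, e.g. the hijack sketch a few posts below):

import torch

def transfer_gap(src_model, dst_model, attack_fn, prompts, adv_targets):
    """Craft hijacks against src_model, replay them on dst_model, and report the
    mean squared gap to the adversarial target (lower = attacks transfer better)."""
    gaps = []
    for prompt, y_adv in zip(prompts, adv_targets):
        adv_prompt = attack_fn(src_model, prompt, y_adv)
        with torch.no_grad():
            gaps.append(((dst_model(adv_prompt) - y_adv) ** 2).mean().item())
    return sum(gaps) / len(gaps)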
November 11, 2024 at 4:20 PM
Can the adversarial robustness of transformers be improved? Yes: we found that gradient-based adversarial training works (even when applied only at the fine-tuning stage), and the tradeoff between clean performance and adversarial robustness is not significant.
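A minimal sketch of that adversarial fine-tuning loop (illustrative names, not our training code), assuming a pretrained model mapping a prompt of in-context tokens to a query prediction and a sample_tasks() helper returning (prompt, y_query) batches:

import torch

def adversarial_finetune(model, sample_tasks, steps=1000, attack_steps=10,
                         eps=0.5, attack_lr=0.1, lr=1e-4):
    """Fine-tune against hijacks: craft a bounded perturbation of one in-context
    token with a few PGD steps, then take an optimizer step on the perturbed prompt."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        prompt, y_q = sample_tasks()
        # Inner loop: eps-bounded perturbation of the first in-context token that
        # pushes the query prediction away from the true label.
        delta = torch.zeros_like(prompt[:, 0], requires_grad=True)
        for _ in range(attack_steps):
            adv = prompt.clone()
            adv[:, 0] = prompt[:, 0] + delta
            attack_obj = -((model(adv) - y_q) ** 2).mean()   # minimize = maximize error
            grad, = torch.autograd.grad(attack_obj, delta)
            with torch.no_grad():
                delta -= attack_lr * grad.sign()
                delta.clamp_(-eps, eps)
        # Outer step: standard training loss on the perturbed prompt
        # (clean prompts can be mixed in, but the clean/robust tradeoff was small).
        adv = prompt.clone()
        adv[:, 0] = prompt[:, 0] + delta.detach()
        loss = ((model(adv) - y_q) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model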
November 11, 2024 at 4:20 PM
We show that linear transformers – which provably implement gradient descent on linear regression tasks – are provably non-robust and can be hijacked by attacking a SINGLE token! Standard GPT-style transformers are similarly non-robust.
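Why does a single token suffice? A sketch of the argument for a one-layer linear transformer trained to match one step of gradient descent on the in-context squared loss (following the construction in the works cited above; notation is ours):

L(W) = \frac{1}{2N}\sum_{i=1}^{N}\big(W x_i - y_i\big)^2, \qquad
W_1 = W_0 - \eta \nabla L(W_0) = W_0 - \frac{\eta}{N}\sum_{i=1}^{N}\big(W_0 x_i - y_i\big)\, x_i^{\top}, \qquad
\hat y_q = W_1 x_q.

A perturbed context pair (x_j, y_j) enters the prediction only through the additive term -\frac{\eta}{N}\big(W_0 x_j - y_j\big)\, x_j^{\top} x_q, so scaling that one token can move \hat y_q to essentially any value.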
November 11, 2024 at 4:20 PM
We specifically study “hijacking attacks” on transformers trained to solve linear regression in-context, in which the adversary’s goal is to force the transformer to output an arbitrary prediction by perturbing the in-context data.
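A minimal sketch of such an attack in PyTorch (illustrative, not our actual code): it assumes a model that maps a prompt tensor of interleaved (x_i, y_i) tokens plus a query token to a scalar prediction for the query, and a chosen token index to perturb.

import torch

def hijack_attack(model, prompt, target, token_idx, steps=200, step_size=1e-2):
    """Gradient-based hijack: optimize a perturbation of one in-context token so
    that the model's prediction for the query is driven to the adversary's target."""
    delta = torch.zeros_like(prompt[:, token_idx], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=step_size)
    for _ in range(steps):
        adv = prompt.clone()
        adv[:, token_idx] = prompt[:, token_idx] + delta  # only this token is touched
        loss = ((model(adv) - target) ** 2).mean()        # distance to the hijack target
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        adv = prompt.clone()
        adv[:, token_idx] = prompt[:, token_idx] + delta
    return adv

With an unconstrained perturbation this only tests whether the model can be steered at all; adding a norm bound on delta gives the budgeted version of the attack.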
November 11, 2024 at 4:20 PM
We find
1. Transformers do NOT implement robust ICL algorithms
2. Adversarial training (even at finetuning stage) works!
3. Attacks transfer for small models but not for ‘larger’ transformers.
Arxiv: arxiv.org/abs/2411.05189
November 11, 2024 at 4:20 PM