Professor Philip Torr, Oxford University
philiptorr.bsky.social
Professor in Machine Learning at Oxford
Involved in many startups, including FiveAI, Onfido, Oxsight, AIStetic, Eigent, etc.
I occasionally look here but am mostly on LinkedIn; find me there: www.linkedin.com/in/philip-torr-1085702
this looks great but i have the strange urge to share it before i've read to the end
September 17, 2025 at 9:52 AM
Reposted by Professor Philip Torr, Oxford University
Co-authored with: Isaac Friend, Keir Reid, Igor Krawczuk, Vincent Wang, @jakobmokander.bsky.social, @philiptorr.bsky.social, Julia C Morse and Robert Trager
Kander (@kander.bsky.social)
Malware malbec and gummie bears!
kander.bsky.social
June 27, 2025 at 8:07 AM
Reposted by Professor Philip Torr, Oxford University
Read more: cemde.github.io/Domain-Certi...

Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.

(7/7)
Shh, don't say that! Domain Certification in LLMs
Domain Certification - A novel framework providing provable adversarial defenses for LLM safety.
cemde.github.io
April 4, 2025 at 8:12 PM
Reposted by Professor Philip Torr, Oxford University
📄 Dive In:

Paper: https://arxiv.org/abs/2502.19964

Work led by
@lovisheindrich.bsky.social
- [Lovis is on the job market, you should hire him, he is great!]

and thanks to
@philiptorr.bsky.social and @vthost.bsky.social
for advising.
Do Sparse Autoencoders Generalize? A Case Study of Answerability
Sparse autoencoders (SAEs) have emerged as a promising approach in language model interpretability, offering unsupervised extraction of sparse features. For interpretability methods to succeed, they m...
arxiv.org
March 1, 2025 at 6:14 PM
Reposted by Professor Philip Torr, Oxford University
🏛️ This work was made possible with OATML and TVG at the University of Oxford (@ox.ac.uk). Special thanks to @yaringal.bsky.social, @adelbibi.bsky.social, @philiptorr.bsky.social, and @alasdair-p.bsky.social for their contributions.

📖 Read the paper: www.arxiv.org/abs/2503.10809
Attacking Multimodal OS Agents with Malicious Image Patches
Recent advances in operating system (OS) agents enable vision-language models to interact directly with the graphical user interface of an OS. These multimodal OS agents autonomously perform computer-...
www.arxiv.org
March 18, 2025 at 6:25 PM