Damiano Sgarbossa
@damianosg.bsky.social
PhD in Computational Biology & ML for Proteins @EPFL

https://sites.google.com/view/damiano-sgarbossa
Happy to announce that our paper, "ProtMamba: a homology-aware but alignment-free protein state space model", has been published in Bioinformatics! 🎉

doi.org/10.1093/bioi...
July 7, 2025 at 4:48 PM
I'm really happy to share with you that after 4 years at EPFL I'm finally a PhD! 🎉🎓

Last Friday I defended my thesis titled: "Revealing and Exploiting Coevolution through Protein Language Models".

It was an amazing journey where I met some incredible people. Thank you all ❤️
June 30, 2025 at 11:42 AM
RAG-ESM is trained with a discrete diffusion objective, giving it generative capabilities. It achieves SOTA among sequence-based models for conditional generation and motif scaffolding, outperforming DPLM (650M), EvoDiff-MSA, and ProtMamba on key benchmarks.
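For intuition, here is a minimal sketch of an absorbing-state discrete diffusion training step of the kind described above. This is an illustration under standard assumptions, not the paper's code; `model`, `mask_id`, and `pad_id` are placeholder names.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, tokens, mask_id, pad_id):
    """One discrete-diffusion step: corrupt a random fraction of
    residues to the mask token, then score the reconstruction."""
    B, L = tokens.shape
    # Sample a noise level t ~ U(0, 1) per sequence; mask each
    # (non-padding) position independently with probability t.
    t = torch.rand(B, 1, device=tokens.device)
    maskable = tokens != pad_id
    masked = (torch.rand(B, L, device=tokens.device) < t) & maskable
    noisy = tokens.masked_fill(masked, mask_id)

    logits = model(noisy)  # (B, L, vocab), placeholder call signature
    nll = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    # Absorbing-state diffusion ELBOs reweight the masked-token loss
    # by 1/t, so low-noise steps count more per masked position.
    loss = ((nll * masked) / t).sum() / masked.sum().clamp(min=1)
    return loss
```

Generation then runs the process in reverse: start from a fully masked sequence and iteratively unmask positions, which is what enables conditional generation and scaffolding.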

6/7
April 11, 2025 at 2:47 PM
An unexpected result: Several cross-attention heads naturally learn to align the input and context sequences, even though the model is trained on unaligned data. This alignment capability emerges purely from the training objective (no explicit alignment supervision).
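To make this concrete, here is a hedged sketch of how an implied alignment can be read off a single head's cross-attention map; the tensor shape and function name are assumptions for illustration.

```python
import torch

def implied_alignment(attn: torch.Tensor) -> list[tuple[int, int]]:
    """Pair each input residue with the context residue it attends to
    most, given one head's cross-attention map of shape
    (input_len, context_len). For an 'aligning' head, these pairs trace
    a near-monotonic path, like the path of a sequence alignment."""
    best = attn.argmax(dim=-1)  # strongest context position per residue
    return [(i, int(j)) for i, j in enumerate(best)]
```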

5/7
April 11, 2025 at 2:47 PM
Using just one homolog as context, RAG-ESM models (12M and 165M params) outperform ESM2 (650M) on masked token prediction. We obtain a 40–50% reduction in perplexity despite using far fewer parameters.
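For reference, the perplexity compared here is just the exponential of the mean cross-entropy over masked positions. A minimal sketch, with hypothetical names:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def masked_perplexity(model, noisy_tokens, true_tokens, mask):
    """exp(mean NLL) that the model assigns to the true residues at
    the masked positions; lower is better. `mask` is a boolean tensor
    marking which positions were masked."""
    logits = model(noisy_tokens)  # (B, L, vocab), placeholder signature
    nll = F.cross_entropy(
        logits.transpose(1, 2), true_tokens, reduction="none"
    )  # (B, L)
    return torch.exp(nll[mask].mean())
```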

4/7
April 11, 2025 at 2:47 PM
Conditioning on homologs reduces the effective dimensionality of the search space during inference. Instead of encoding information about entire protein families internally, the model can focus its capacity on more nuanced biological features.

3/7
April 11, 2025 at 2:47 PM
What does RAG-ESM do?
It augments ESM2 with a few lightweight cross-attention layers that let us condition the model on retrieved homologous sequences. This allows the model to leverage evolutionary information during inference without retraining.
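Schematically, one such lightweight layer might look like the PyTorch sketch below. This illustrates the general idea rather than the released implementation; the layer placement, dimensions, and residual wiring are assumptions.

```python
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    """Lightweight block interleaved between pretrained ESM2 layers:
    the input sequence's hidden states (queries) attend over the
    encoded retrieved homolog (keys/values)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.xattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden, context):
        # hidden:  (B, L_in,  d) activations of the input sequence
        # context: (B, L_ctx, d) embeddings of the retrieved homolog
        out, _ = self.xattn(self.norm(hidden), context, context)
        return hidden + out  # residual: start from the pretrained model's behavior
```

Because only a few such layers are added on top of the pretrained backbone, the extra parameter count and training cost stay small.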

2/7
April 11, 2025 at 2:47 PM
📢 Our new preprint is out on bioRxiv! We introduce RAG-ESM, a retrieval-augmented framework that improves pretrained protein language models like ESM2 by making them homology-aware with minimal additional training costs.
🔗 doi.org/10.1101/2025...
💻 github.com/Bitbol-Lab/r...

1/7
April 11, 2025 at 2:47 PM
We’re happy to announce that our track "AI & the Molecular World" at @appliedmldays.org will take place this year too! Join us in Lausanne on February 13, 2025!

The call for talks is now open! Submit your abstract by January 5, 2025, at:
forms.gle/hu6BEWMN1BcR...
November 22, 2024 at 8:53 AM