Damiano Sgarbossa
@damianosg.bsky.social
PhD in Computational Biology & ML for Proteins @EPFL

https://sites.google.com/view/damiano-sgarbossa
📈 Despite its smaller size, ProtMamba outperforms state-of-the-art models on conditional sequence generation and is competitive with other protein language models on fitness prediction, highlighting the importance of long-context conditioning.

Read it here: doi.org/10.1093/bioi...
Github repo: github.com/Bitbol-Lab/P...
July 7, 2025 at 4:48 PM
🧬 ProtMamba applications include:
- Generating novel protein sequences conditioned on a given set of homologs,
- Inpainting specific regions within sequences,
- Modeling disordered regions of different protein sequences,
- Predicting the fitness of protein variants.
July 7, 2025 at 4:48 PM
⚙️ ProtMamba is based on Mamba, a state space model that efficiently handles very long sequences. The model uses a fill-in-the-middle training objective that combines autoregressive and masked language modeling to predict amino acids conditioned on the given homologs.
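For intuition, here is a minimal sketch of what fill-in-the-middle looks like at the data level: homologs are concatenated as context, a span is cut out of the target, and the span is appended after a sentinel so the ordinary next-token loss on the tail acts like masked prediction. The `<cls>`/`<mask_1>`/`<eos>` token names and the single-span choice are illustrative assumptions, not the exact ProtMamba preprocessing.

```python
import random

def build_fim_example(target: str, homologs: list[str], max_span: int = 10) -> str:
    """Build one fill-in-the-middle training string from a target and its homologs."""
    context = "".join(f"<cls>{h}" for h in homologs)            # unaligned homolog context
    start = random.randrange(0, max(1, len(target) - max_span))
    length = random.randint(1, max_span)
    middle = target[start:start + length]                        # span the model must predict
    body = target[:start] + "<mask_1>" + target[start + length:]
    # the cut-out span goes at the end after its sentinel, so autoregressive
    # prediction of the tail is effectively masked-language-model prediction
    return f"{context}<cls>{body}<eos><mask_1>{middle}"

print(build_fim_example("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", ["MKTAYIAKQR", "MKSAYIGKQR"]))
```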
July 7, 2025 at 4:48 PM
🔍 ProtMamba is homology-aware yet alignment-free: it captures evolutionary information without relying on multiple sequence alignments. This lets it avoid the imperfections of MSAs while still using the information from other homologs to condition generation!
July 7, 2025 at 4:48 PM
Also, a huge thanks to my supervisor Anne-Florence and my defense committee: Bruno Correia @pschwllr.bsky.social @sokrypton.org and Thomas Lemmin
June 30, 2025 at 11:42 AM
This is work I did in collaboration with Anne-Florence Bitbol @epfl-ai-center.bsky.social. #CompBio #DeepLearning #ProteinEngineering #AI #MachineLearning #ICLR2025
April 11, 2025 at 2:54 PM
RAG-ESM is simple to implement, compatible with pretrained ESM2 checkpoints, and efficient to train (~50–120 GPU hours).

Come check out my poster (spotlight) at the MLGenX workshop at ICLR in Singapore!

Code (still WIP): github.com/Bitbol-Lab/r...
Preprint: doi.org/10.1101/2025...

7/7
April 11, 2025 at 2:47 PM
RAG-ESM is trained with a discrete diffusion objective, giving it generative capabilities. It achieves SOTA among sequence-based models for conditional generation and motif scaffolding, outperforming DPLM (650M), EvoDiff-MSA, and ProtMamba on key benchmarks.
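For a rough sense of the objective, here is a minimal sketch of one training step of a masking-based ("absorbing state") discrete diffusion loss; the noise schedule, loss weighting, and the `model(corrupted, context)` signature are simplifying assumptions, not the actual RAG-ESM recipe.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, tokens, context, mask_id, pad_mask):
    """tokens: (B, L) amino-acid ids; pad_mask: (B, L) bool, True on real tokens;
    model(corrupted, context) returns (B, L, vocab) logits for the denoised sequence."""
    B, L = tokens.shape
    t = torch.rand(B, 1, device=tokens.device)                         # per-sequence noise level in (0, 1)
    corrupt = (torch.rand(B, L, device=tokens.device) < t) & pad_mask  # mask each real position w.p. t
    corrupted = torch.where(corrupt, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted, context)                                 # denoiser conditioned on the homolog
    return F.cross_entropy(logits[corrupt], tokens[corrupt])           # loss only on corrupted positions
```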

6/7
April 11, 2025 at 2:47 PM
An unexpected result: Several cross-attention heads naturally learn to align the input and context sequences, even though the model is trained on unaligned data. This alignment capability emerges purely from the training objective (no explicit alignment supervision).
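One way to see this is to plot a head's cross-attention weights as a (target × homolog) matrix and look for high-weight diagonals. The helper below is a sketch under the assumption that the weights come out as an (n_heads, L_target, L_homolog) tensor; the published rag-esm code may expose them differently.

```python
import torch
import matplotlib.pyplot as plt

def plot_cross_attention(attn: torch.Tensor, head: int = 0) -> None:
    """attn: (n_heads, L_target, L_homolog) weights from one cross-attention layer."""
    plt.imshow(attn[head].detach().cpu(), aspect="auto", cmap="viridis")
    plt.xlabel("homolog (context) position")
    plt.ylabel("input (target) position")
    plt.title(f"cross-attention head {head}")
    plt.colorbar(label="attention weight")
    plt.show()
```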

5/7
April 11, 2025 at 2:47 PM
Using just one homolog as context, RAG-ESM models (12M and 165M params) outperform ESM2 (650M) on masked token prediction. We obtain a 40–50% reduction in perplexity despite using far fewer parameters.

4/7
April 11, 2025 at 2:47 PM
Conditioning on homologs reduces the effective dimensionality of the search space during inference. Instead of encoding information about entire protein families internally, the model can focus its weights on more nuanced biological features.

3/7
April 11, 2025 at 2:47 PM
What does RAG-ESM do?
It augments ESM2 with a few lightweight cross-attention layers that let us condition the model on retrieved homologous sequences. This allows the model to leverage evolutionary information during inference without retraining.
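A minimal PyTorch sketch of the general idea follows; the layer placement, normalization, and hidden size are assumptions for illustration, not the exact RAG-ESM architecture.

```python
import torch
import torch.nn as nn

class HomologCrossAttention(nn.Module):
    """Lightweight block letting the input sequence attend to a retrieved homolog."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (B, L_in,  d) hidden states of the input sequence
        # context: (B, L_ctx, d) hidden states of the retrieved homolog
        out, _ = self.attn(self.norm(x), context, context)
        return x + out  # residual: with an uninformative context the block degrades gracefully

# usage sketch: interleave such blocks between pretrained ESM2 transformer layers
block = HomologCrossAttention(d_model=480)   # hidden size is illustrative
x = torch.randn(2, 120, 480)                 # input-sequence hidden states
ctx = torch.randn(2, 150, 480)               # homolog hidden states
print(block(x, ctx).shape)                   # torch.Size([2, 120, 480])
```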

2/7
April 11, 2025 at 2:47 PM