Lightnews — Scholar-powered news

Michael Jendrusch

@mjendrusch.bsky.social

810 followers 310 following 33 posts

(he/him) Former PhD student / Postdoc @ Korbel group, EMBL.
Protein ML person, mathematics & science enthusiast.

developer of salad
preprint: https://www.biorxiv.org/content/10.1101/2025.01.31.635780v1
code: https://github.com/mjendrusch/salad


Also, shoutout to @blender.org and @bradyajohnston.bsky.social for making it so easy to turn protein structures into nice images!
5/5

Protein structures colored teal and purple floating above a reflective surface on a dark background. The protein structures spell out the word "SALAD" in all-capital letters.

September 24, 2025 at 12:27 PM

Compared to the preprint, we have added comparisons to Proteína (openreview.net/forum?id=TVQLu34…) in terms of unconditional design. Here, salad compares favorably in terms of both runtime per design and designability of generated structures.
3/🧵

Graph of runtime comparisons of diffusion models (per design: top; per iteration: bottom). This panel compares the runtimes of previous protein diffusion models (RFdiffusion, Genie, Proteína, Chroma) to two versions of the salad model. Salad outperforms all previous models in terms of runtime across protein lengths between 50 and 1,000 amino acid residues.

Boxplot of self-consistent RMSDs (scRMSDs) for salad models with different noise schedules (variance preserving, variance expanding, shaped) and the previous state of the art (RSO, Proteína), for proteins of length between 50 and 1,000 residues.
Variance-expanding and shaped-noise salad models consistently achieve lower (better) scRMSDs than previous approaches across all sizes. While Proteína produces results comparable to salad all the way up to 800 residues, it surprisingly fails for 1,000-residue proteins.

September 24, 2025 at 12:27 PM
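The caption above names several noise schedules. In the textbook diffusion formulation, a variance-preserving (VP) process scales the signal down as noise is added so that the total variance stays roughly constant, whereas a variance-expanding process (elsewhere often called variance-exploding, VE) adds noise on top of the unchanged signal, so the variance grows. A minimal NumPy sketch of those two standard forward processes, assuming unit-variance toy data; the function names are hypothetical, and salad's actual parameterization, including its "shaped" schedule, is not described in this post and is not reproduced here:

```python
import numpy as np


def noise_vp(x0: np.ndarray, alpha_bar: float, rng: np.random.Generator) -> np.ndarray:
    """Textbook VP forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def noise_ve(x0: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Textbook VE forward step: x_t = x0 + sigma * eps (signal kept, noise grows)."""
    eps = rng.standard_normal(x0.shape)
    return x0 + sigma * eps


rng = np.random.default_rng(0)
x0 = rng.standard_normal((100_000, 3))  # toy unit-variance "coordinates" (assumption)
print(noise_vp(x0, 0.3, rng).var())  # stays close to 1
print(noise_ve(x0, 3.0, rng).var())  # grows to roughly 1 + 3**2 = 10
```

Real structure models typically also rescale or center coordinates before noising; that step is omitted here.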

Some salad news: The code (github.com/mjendrusch/salad) is now somewhat cleaned up and there is a Colab notebook for running a full salad -> ProteinMPNN -> AlphaFold2 pipeline to design proteins like those in the image:
colab.research.google.com/github/mjend...

More improvements coming soon!

AlphaFold predicted structures of 5 protein heterodimers designed with the salad software. Chain A is colored purple and chain B is colored cyan.

May 21, 2025 at 3:40 PM

This tendency also holds for failed designs (1,000 aa, failed, below). We did not include these in the data package, as the archive with everything included was becoming uncomfortably large.

failed salad domain 80 A structures at 1,000 amino acids

February 7, 2025 at 1:13 PM

You're right, denoising models seem to generate more idealised / boring structures, especially for larger proteins.
For smaller proteins, salad generations can get pretty crazy, though (400 aa; upper row: the first 4 salad VP-scaled examples; lower row: the first 4 RSO examples I could find).

top row: 400 amino acid successful VP-salad structures (not cherry-picked, the last four structures in the data package sorted by timestamp)
bottom row: 400 amino acid successful RSO structures (not cherry-picked, the last four structures in the RSO data package for that size, sorted by timestamp)

February 7, 2025 at 1:13 PM

It also means that we can have salad spell out its name in proteins.

In the image you can see salad-generated structures spelling the word "SALAD", overlaid with their ESMfold predictions and scRMSD.

(6/N)

Salad-generated structures (coloured) in the shape of letters spelling out "salad" and their ESMfold predictions (gray). Predictions match the designs well with scRMSD below 2 angstroms.

February 6, 2025 at 1:07 PM

... such as motif-scaffolding, designing screw-symmetric repeat proteins and multi-state protein design à la ProteinGenerator (cf. www.nature.com/articles/s41...).

(5/N)

Example structures of designed scaffolds for three different protein motifs, using both a version of salad trained for motif-scaffolding and the regular version of salad, without motif-scaffolding training. The motifs are highlighted in gray.

Structures of screw-symmetric repeat proteins with different rotation angles generated using salad (in grey), overlaid with their structure predictions using ESMfold and AlphaFold 3.

Example structures of designed multi-state proteins (gray) and their predictions by AlphaFold 2 (overlaid, coloured). Designs are predicted to adopt different folds when split into two halves.

February 6, 2025 at 1:07 PM
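In a screw-symmetric repeat protein, each repeat is related to the next by the same rotation about an axis combined with a translation along that axis. A minimal NumPy sketch of that geometry follows; the function names, the choice of the z axis as the screw axis, and the toy coordinates are illustrative assumptions, not salad's code:

```python
import numpy as np


def rotation_z(theta: float) -> np.ndarray:
    """Rotation matrix for angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def screw_copies(unit: np.ndarray, n_repeats: int, angle_deg: float, rise: float) -> np.ndarray:
    """Stack n_repeats copies of a repeat unit ([n_atoms, 3]); copy k is rotated by
    k * angle_deg about the z axis and translated by k * rise along it."""
    copies = []
    for k in range(n_repeats):
        rot = rotation_z(np.deg2rad(k * angle_deg))
        copies.append(unit @ rot.T + np.array([0.0, 0.0, k * rise]))
    return np.concatenate(copies, axis=0)


unit = np.array([[5.0, 0.0, 0.0], [5.0, 1.0, 0.5]])  # toy two-atom repeat unit
helix = screw_copies(unit, n_repeats=4, angle_deg=90.0, rise=10.0)
print(helix.shape)  # (8, 3)
```

Varying `angle_deg` changes the rotation per repeat, which is the parameter the image above varies. In a sampling loop, an edit of this kind could rebuild all repeats from the first one at every step.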

4. we can teach salad new tricks without having to re-train the model.

We extend RFdiffusion's sampling trick for symmetry to arbitrary edits to the input and output of salad models at each generation step.

This way, we can make salad work on design tasks it was not trained for ...

(4/N)

Schematic of salad's generative process: instead of the regular diffusion model generative process, we can edit the input noise and condition to the model, as well as its output denoised structure at each step. This allows us to make the model produce structures for tasks it was not trained on, such as motif-scaffolding or multi-state protein design.

February 6, 2025 at 1:07 PM
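The idea of editing the model's input and output at every generation step can be sketched as a generic denoising loop with two hooks. This is a minimal illustration, not salad's implementation: `toy_denoiser` is a hypothetical stand-in for a trained model, `sample`, `edit_input`, `edit_output` and `fix_motif` are hypothetical names, and the re-noising update is a simplified scheme. The example edit pins a motif to fixed coordinates in every denoised output, a crude form of motif-scaffolding:

```python
from typing import Callable

import numpy as np

Array = np.ndarray
Edit = Callable[[Array, float], Array]


def toy_denoiser(x_noisy: Array, sigma: float) -> Array:
    """Hypothetical stand-in for a trained denoiser: pulls coordinates towards
    their centroid, more strongly at higher noise levels."""
    center = x_noisy.mean(axis=0, keepdims=True)
    return center + (x_noisy - center) / (1.0 + sigma)


def identity(x: Array, sigma: float) -> Array:
    return x


def sample(n_atoms: int, sigmas: list[float], edit_input: Edit = identity,
           edit_output: Edit = identity, seed: int = 0) -> Array:
    """Denoising loop that lets the caller edit the model input and output at every step."""
    rng = np.random.default_rng(seed)
    x = sigmas[0] * rng.standard_normal((n_atoms, 3))  # start from pure noise
    for i, sigma in enumerate(sigmas):
        x_in = edit_input(x, sigma)      # edit the noisy input before the model sees it
        x0 = toy_denoiser(x_in, sigma)   # model's estimate of the clean structure
        x0 = edit_output(x0, sigma)      # edit the denoised estimate
        next_sigma = sigmas[i + 1] if i + 1 < len(sigmas) else 0.0
        x = x0 + next_sigma * rng.standard_normal(x0.shape)  # re-noise to the next level
    return x


# Example edit: pin three "motif" positions to fixed coordinates after every step.
motif_idx = np.array([2, 3, 4])
motif_xyz = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])


def fix_motif(x0: Array, sigma: float) -> Array:
    x0 = x0.copy()
    x0[motif_idx] = motif_xyz
    return x0


design = sample(n_atoms=10, sigmas=[10.0, 5.0, 2.0, 1.0], edit_output=fix_motif)
print(np.allclose(design[motif_idx], motif_xyz))  # True
```

An `edit_output` that replaces every repeat with a transformed copy of the first would give a symmetry edit in the spirit of RFdiffusion's trick. Practical motif-scaffolding would likely also have to handle where and in which orientation the motif sits, which this sketch ignores.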

3. salad models generate designable structures up to 1,000 amino acids long.

Until now, proteins of this size required RSO or af2cycler, because protein structure diffusion models do not work well at these lengths.

Our best salad models (blue) bridge this designability gap to RSO.

(3/N)

Box plot of scRMSD (self-consistent RMSD) between generated structures and their corresponding ESMfold predictions following ProteinMPNN sequence design. Lower is better. For RSO (relaxed sequence space optimization), the median scRMSD stays below 2 angstroms for proteins up to 800 amino acids in length. Our best models for large proteins show consistently lower median scRMSD than RSO.

February 6, 2025 at 1:07 PM
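The scRMSD in these plots measures how closely the structure predicted for a designed sequence (here via ProteinMPNN followed by ESMfold) matches the generated backbone after optimal superposition. A minimal NumPy sketch of the RMSD step using the Kabsch algorithm, assuming both structures are given as matched [N, 3] arrays (for example, one Cα atom per residue); which atoms salad's evaluation actually compares is not stated here, the name `kabsch_rmsd` is hypothetical, and the sequence design and structure prediction steps are left out:

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two matched [N, 3] coordinate sets after optimal rigid
    superposition of p onto q (Kabsch algorithm)."""
    p_c = p - p.mean(axis=0)                    # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                             # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # -1 if the best fit would be a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation taking p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


rng = np.random.default_rng(0)
design = 10.0 * rng.standard_normal((50, 3))   # toy "generated" coordinates
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
prediction = design @ rot_z.T + np.array([1.0, -2.0, 3.0])  # rigidly moved copy

print(kabsch_rmsd(design, prediction) < 1e-6)  # True
noisy = prediction + 0.5 * rng.standard_normal(prediction.shape)
print(kabsch_rmsd(design, noisy) > 0.0)  # True
```

Under this sketch, a design would pass the 2 Å level mentioned in the caption if `kabsch_rmsd(design, prediction) < 2.0`.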

Why yet another protein structure generative model if we already have RFdiffusion?
(and Genie 2 and Chroma and Proteus and many others)

1. salad is tiny, with the denoising model clocking in at about 8.4 M parameters.
2. salad is fast, up to 42x faster than RFdiffusion on large proteins

(2/N)

A pair of plots comparing the runtime of our models (salad and salad minimal) to previous protein structure diffusion models (Chroma, Genie 2 and RFdiffusion), both per design (with the default number of diffusion steps) and per step. On the same GPU (RTX 3090), our models are consistently faster than the competition.

February 6, 2025 at 1:07 PM

New protein ML preprint from my PhD project.
We describe salad (sparse all-atom denoising), a family of efficient protein structure diffusion models, and show that it works well on a bunch of protein design tasks previously described in the literature.

Preprint: www.biorxiv.org/content/10.1...

(1/N)

Five generated protein structures coloured with a gradient from purple to teal. The structures are shaped and positioned to spell out the word "salad" – the name of the software described in the mentioned preprint – in all-capital letters.

February 6, 2025 at 1:07 PM

Merry Christmas! May your protein backbones be designable, diverse and novel.
(now with 100% more structure predictions and plausibility metrics)

Ribbon-style structures of generated proteins spelling out the message "Merry Christmas" in all-capital letters. Generated structures are coloured in grey and overlaid with ESMfold structure predictions coloured by residue index (purple C-terminus to turquoise N-terminus).

Number of amino acids, self-consistent RMSDs (scRMSDs) between generated structure and prediction, as well as pLDDTs are listed for each structure.

December 25, 2024 at 9:22 PM

Reason Nr. 197 why working in protein design is fun:
you can spell out random messages as protein structures.

Ribbon-style structures of proteins generated to spell out the words "Merry Christmas" in all-capital letters on a black background.

December 24, 2024 at 3:02 PM
