biorxiv.org/content/10.1...
Code, Models, and Data (full release very soon):
github.com/jozhang97/am...
AF corruption is not structured, is not explicitly modeled, and varies with protein size and topology. Yet our framework still handles it.
We achieve novelty improvements, showing more unique structure generation.
This comes from training on more datapoints: low-pLDDT AF structures are kept rather than filtered out, as was done previously.
Our framework further boosts performance, leading to the best model for short and long protein generation.
Handling noise properly matters more than architecture.
This redundancy causes an overrepresentation of common motifs.
We fix it by tuning FoldSeek to explicitly focus on structural topology.
Ambient Protein Diffusion substantially outperforms previous baselines in short and long protein generation.
For short proteins, we dominate the Pareto frontier between designability and diversity, using a ~13x smaller model than previous SOTA.
Instead of filtering out low-pLDDT structures (as done in prior work), we use them for a subset of the diffusion times.
Enough noise "erases" the AF mistakes, and we can still learn from those structures.
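A minimal sketch of that idea, not the released implementation: each structure gets a minimum diffusion time based on its pLDDT, and training times are only drawn at or above it. The function names, the linear mapping, and the values `clean_threshold=80.0` and `max_t_min=0.6` are illustrative assumptions, as is the convention that t runs from 0 (clean) to 1 (pure noise).

```python
import numpy as np

def min_diffusion_time(plddt, clean_threshold=80.0, max_t_min=0.6):
    """Hypothetical mapping from mean pLDDT (0-100) to the smallest
    diffusion time t in [0, 1] at which a structure is used for training.
    The threshold and slope are illustrative, not values from the paper."""
    plddt = np.asarray(plddt, dtype=float)
    # Structures at or above the threshold count as clean: usable at every t.
    # Below it, the minimum time grows linearly as pLDDT drops.
    deficit = np.clip((clean_threshold - plddt) / clean_threshold, 0.0, 1.0)
    return max_t_min * deficit

def sample_training_times(plddt, rng):
    """Draw one diffusion time per structure, uniform on [t_min, 1]."""
    t_min = min_diffusion_time(plddt)
    u = rng.uniform(size=t_min.shape)
    return t_min + u * (1.0 - t_min)

rng = np.random.default_rng(0)
plddt = np.array([95.0, 80.0, 60.0, 40.0])
t = sample_training_times(plddt, rng)
# Low-pLDDT structures are never seen at low-noise times,
# but they still contribute at the high-noise ones.
```

In this sketch a pLDDT-40 structure only appears at t >= 0.3, while a pLDDT-95 structure can appear at any t.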
SOTA protein structure models are trained on subsets of AFDB (214M AlphaFold-predicted structures).
AF accuracy drops with increasing protein length and complexity, making it hard to generate such proteins.
Diversity improves by 91% and designability by 26% over the previous 200M SOTA model for long proteins.
The trick? Treat low pLDDT AlphaFold predictions as low-quality data.