Amy Lu
amyxlu.bsky.social
CS PhD Student at UC Berkeley & AI for drug discovery at Prescient Design 🇨🇦
Reposted by Amy Lu
• introduced “zero-shot prediction” as the question of predicting a bioassay’s outcome from the likelihoods of pLMs
• commented on biases in the evolutionary signal from the Tree of Life used to train pLMs (a favorite paper I read in 2024: shorturl.at/fbC7g)
December 16, 2024 at 6:29 AM
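The “zero-shot prediction” framing above is often implemented as a masked-marginal pseudo-log-likelihood under the pLM: mask each position, score the true residue, and sum. A minimal NumPy sketch, where `logits_fn` is a hypothetical stand-in for a trained pLM:

```python
import numpy as np

def pll_score(logits_fn, tokens, mask_id):
    """Masked-marginal pseudo-log-likelihood: mask each position in turn
    and sum the model's log-probability of the true residue there.
    `logits_fn` maps an (L,) int token array to (L, vocab) logits."""
    total = 0.0
    for i in range(len(tokens)):
        masked = tokens.copy()
        masked[i] = mask_id
        logits = logits_fn(masked)
        # numerically stable log-softmax at position i
        z = logits[i] - logits[i].max()
        logp = z - np.log(np.exp(z).sum())
        total += logp[tokens[i]]
    return total
```

Higher scores mean the pLM finds the sequence more plausible, which is the signal used to rank variants in a bioassay without any task-specific training.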
Another straightforward application is generation, either by next-token sampling or MaskGIT-style denoising. We made the tokenized version of CHEAP to do generation, and decided to go with diffusion on continuous embeddings instead, but I think either would've worked
December 10, 2024 at 1:04 AM
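The MaskGIT-style route mentioned above starts from a fully masked sequence and iteratively commits the most confident predictions. A minimal NumPy sketch with a stand-in `model` and an illustrative cosine schedule (not PLAID's or CHEAP's actual sampler):

```python
import numpy as np

def maskgit_sample(model, length, vocab_size, mask_id, steps=8):
    """MaskGIT-style iterative unmasking sketch. `model` is a hypothetical
    bidirectional denoiser mapping an (L,) int token array to (L, vocab)
    logits; masked positions hold `mask_id`."""
    tokens = np.full(length, mask_id, dtype=np.int64)
    for step in range(steps):
        logits = model(tokens).copy()
        logits[:, mask_id] = -np.inf            # never predict the mask token
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        pred = probs.argmax(-1)
        conf = probs.max(-1)
        conf[tokens != mask_id] = -1.0          # only consider masked slots
        # cosine schedule: fraction of positions left masked after this step
        frac = np.cos((step + 1) / steps * np.pi / 2)
        n_unmask = int((tokens == mask_id).sum()) - int(frac * length)
        if n_unmask > 0:
            idx = np.argsort(-conf)[:n_unmask]  # most confident masked slots
            tokens[idx] = pred[idx]
    return tokens
```

At the final step the schedule hits zero, so every remaining position is committed and the output contains no mask tokens.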
immensely grateful for awesome collaborators on this work: Wilson Yan, Sarah Robinson, @kevinkaichuang.bsky.social, Vladimir Gligorijevic, @kyunghyuncho.bsky.social, Rich Bonneau, Pieter Abbeel, @ncfrey.bsky.social 🫶
December 6, 2024 at 5:44 PM
6/ We'll get to share PLAID as an oral presentation at MLSB next week 🥳 In the meantime, check out:

📄Preprint: biorxiv.org/content/10.1...
👩‍💻Code: github.com/amyxlu/plaid
🏋️Weights: huggingface.co/amyxlu/plaid...
🌐Website: amyxlu.github.io/plaid/
🍦Server: coming soon!
December 6, 2024 at 5:44 PM
5/🚀 ...and when prompted by function, PLAID learns sequence motifs at active sites & directly outputs sidechain positions, which backbone-only methods such as RFDiffusion can't do out-of-the-box.

The residues aren't directly adjacent, suggesting that the model isn't simply memorizing training data:
December 6, 2024 at 5:44 PM
4/ On unconditional generation, PLAID generates high quality and diverse structures, especially at longer sequence lengths where previous methods underperform...
December 6, 2024 at 5:44 PM
3/ I was pretty stuck until building out the CHEAP (bit.ly/cheap-proteins) autoencoders that compressed & smoothed out the latent space: interestingly, gradual noise added to the ESMFold latent space doesn't actually corrupt the sequence and structure until the final forward diffusion timesteps 🤔
December 6, 2024 at 5:44 PM
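The probe described above, adding gradual forward-diffusion noise to a latent and checking how much signal survives at each timestep, can be sketched with a standard DDPM linear-beta schedule (illustrative values, not the schedule from the paper):

```python
import numpy as np

def forward_diffuse(x0, t, T=1000, rng=None):
    """DDPM forward noising at timestep t with a linear beta schedule:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    A sketch of the kind of latent-corruption probe described above."""
    if rng is None:
        rng = np.random.default_rng()
    betas = np.linspace(1e-4, 0.02, T)
    abar = np.cumprod(1.0 - betas)              # cumulative signal fraction
    eps = rng.normal(size=x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# probe: how quickly does the noised latent decorrelate from the original?
rng = np.random.default_rng(0)
x0 = rng.normal(size=(256,))
for t in (0, 250, 500, 999):
    xt = forward_diffuse(x0, t, rng=rng)
    cos = x0 @ xt / (np.linalg.norm(x0) * np.linalg.norm(xt))
    print(f"t={t:4d}  cos(x0, x_t)={cos:.3f}")
```

Decoding `x_t` back through frozen decoders at each `t` is the analogous experiment in latent space: the observation above is that sequence and structure survive until surprisingly late timesteps.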
2/💡Co-generating sequence and structure is hard. A key insight is that to get embeddings of the ESMFold latent space during training, we only need sequence inputs.

For inference, we can sample latent embeddings & use frozen sequence/structure decoders to get all-atom structure:
December 6, 2024 at 5:44 PM
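The inference recipe in this post, sample a latent embedding and push it through frozen decoders, reduces to a few lines. All three callables (`denoiser`, `seq_decoder`, `struct_decoder`) are hypothetical stand-ins, not PLAID's actual interfaces:

```python
import numpy as np

def generate(denoiser, seq_decoder, struct_decoder, length, dim,
             n_steps=50, rng=None):
    """Two-stage inference sketch: reverse-diffuse a latent from Gaussian
    noise, then decode it with frozen sequence and structure decoders."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.normal(size=(length, dim))   # start from pure noise in latent space
    for t in reversed(range(n_steps)):
        x = denoiser(x, t)               # one reverse-diffusion step
    sequence = seq_decoder(x)            # latent -> amino-acid string
    structure = struct_decoder(x)        # latent -> all-atom coordinates
    return sequence, structure
```

Because both decoders are frozen, only the denoiser is trained, and training it needs nothing but sequence inputs to produce the target latents.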