Lightnews — Scholar-powered news

Karl Krauth

@kkt.bsky.social

1.9K followers 650 following 12 posts

Postdoc at Stanford. Previously PhD student at Berkeley AI research. Trying to understand proteins with microfluidics and machine learning.
www.karlk.net


Karl Krauth

@kkt.bsky.social

You could run nvidia-smi in the terminal while your model is running to see if your VRAM is full.
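A minimal sketch of that check, for anyone who prefers a script to watching the terminal. It assumes nvidia-smi is on the PATH; the helper names `parse_memory` and `vram_usage` are hypothetical and not from the original post.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(output):
    """Parse lines like '20480, 24564' into (used_mib, total_mib) tuples."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def vram_usage():
    """Run nvidia-smi once and return one (used, total) pair per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)


# Usage, on a machine with an NVIDIA driver installed:
#   for i, (used, total) in enumerate(vram_usage()):
#       print(f"GPU {i}: {used}/{total} MiB")
```

If used memory sits close to the total while the model runs, the VRAM is likely full.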

November 20, 2024 at 1:50 PM

Karl Krauth

@kkt.bsky.social

Ah, I wasn't saying that your PCIe bandwidth would be abnormally low, but rather that you might not be able to fit everything into the RTX 4090's VRAM, so you'd have to make a lot of transfers between CPU RAM and VRAM, which is slow and would leave the GPU waiting for data most of the time. :)

November 20, 2024 at 1:22 PM

Karl Krauth

@kkt.bsky.social

This feels like it's just due to the RTX 4090 being PCIe bandwidth limited for some reason. The peak fp16 compute for an M4 Max should be 34 TFLOPS while the RTX 4090 is 82 TFLOPS, and the VRAM bandwidth is twice as fast on the RTX.

What happens if you run the model with a lower-resolution image or fp8?

November 20, 2024 at 12:47 PM

Karl Krauth

@kkt.bsky.social

Not restricting it to the fully de novo case, even an example where a model makes a few mutations in a wild-type sequence is fine.
Totally agree that all the work showing some activity in de novo sequences is super impressive.

November 20, 2024 at 10:16 AM

Karl Krauth

@kkt.bsky.social

Intentionally didn't want it to be too high a bar. A single substitution is totally fine as long as the model isn't constrained to mutate a few clearly impactful residues in the active site for example.

November 20, 2024 at 9:58 AM

Karl Krauth

@kkt.bsky.social

I haven't been able to find a paper that:
1. uses ML to propose an enzyme sequence
2. measures kcat of the designed enzyme and of a highly similar sequence in the training set
3. shows that the designed enzyme is faster than all other enzymes in the training set that catalyze the same reaction and have a known rate

November 20, 2024 at 9:53 AM

Karl Krauth

@kkt.bsky.social

Can't be ruled out 100%, but some designed sequences are going to be more plausibly out-of-distribution than others.

November 20, 2024 at 9:33 AM

Karl Krauth

@kkt.bsky.social

Depends on how big your dataset is. I'm fine with a training set that includes all naturally occurring sequences where most don't have associated kcats, for example. I just want to avoid cases where you can just pick a wild-type protein from the same family to get an improved enzyme.

November 19, 2024 at 11:16 PM

Karl Krauth

@kkt.bsky.social

Would love to be added to this. :)

November 19, 2024 at 10:18 AM

Karl Krauth

@kkt.bsky.social

I'd love to be added. I create microfluidic devices and large-scale datasets which I use to train protein language models.

November 19, 2024 at 10:16 AM
