Gautam (@gautammalik.bsky.social) · November 22, 2024
Research Assistant at University of Cambridge | Exploring deep learning in biology with big dreams of using AI to make drug discovery a little less complicated!🧬🖥️
Let’s walk through training Dockformer with a protein-ligand complex as an example.

Step 1: Getting to Know the Players

First, we introduce our protein and ligand to Dockformer. These are the two main characters in our story. To help the model understand them better, we describe them in three different ways:
1) 1D Sequence Information:
This is the atom type of each atom in the protein and ligand (one-hot encoded, obviously!).
2) 2D Graph Information (ligand only, since the protein is treated as rigid):
Here, shortest-path distances and edge features are used.
3) 3D Geometric Information:
Details below👇
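To make the 1D and 2D parts concrete, here’s a minimal featurization sketch in Python. It’s my own illustration, not Dockformer’s actual pipeline: the atom-type vocabulary, the use of RDKit, and the toy ligand are all assumptions.

```python
# One-hot atom types (1D) and shortest-path distances (2D) for a ligand.
import numpy as np
from rdkit import Chem

ATOM_TYPES = ["C", "N", "O", "S", "P", "F", "Cl", "Br", "I"]  # assumed vocabulary

def one_hot_atoms(mol):
    feats = np.zeros((mol.GetNumAtoms(), len(ATOM_TYPES) + 1))
    for i, atom in enumerate(mol.GetAtoms()):
        sym = atom.GetSymbol()
        col = ATOM_TYPES.index(sym) if sym in ATOM_TYPES else len(ATOM_TYPES)
        feats[i, col] = 1.0  # last column is an "unknown element" bucket
    return feats

ligand = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as a toy ligand
atom_feats = one_hot_atoms(ligand)        # 1D sequence info, (num_atoms, vocab)
sp_dist = Chem.GetDistanceMatrix(ligand)  # 2D graph info: topological shortest paths
```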
We use the 3D atom coordinates and enhance them with two tricks:
1. Learnable Position Embedding (GPE): Tags each atom’s location using sine/cosine functions.
2. 3D Pair Features: Transforms inter-atomic distances via Gaussian kernels to help the model focus on various distance scales.

Next step!
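Before moving on: both tricks are easy to sketch. The frequency ladder, kernel count, and fixed kernel width below are my assumptions, not the paper’s settings.

```python
import torch

def sincos_embed(coords, num_freqs=16):
    # coords: (N, 3) -> (N, 3 * 2 * num_freqs) sine/cosine tags per coordinate
    freqs = 2.0 ** torch.arange(num_freqs)              # geometric frequency ladder
    ang = coords.unsqueeze(-1) * freqs                  # (N, 3, num_freqs)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)

def gaussian_pair_features(coords, num_kernels=32, max_dist=20.0):
    # coords: (N, 3) -> (N, N, num_kernels) soft distance histograms
    dist = torch.cdist(coords, coords)                  # pairwise distances (Å)
    centers = torch.linspace(0.0, max_dist, num_kernels)
    width = max_dist / num_kernels                      # fixed here; could be learnable
    return torch.exp(-((dist.unsqueeze(-1) - centers) ** 2) / (2 * width ** 2))
```

Each kernel fires strongest for pairs near its center, so close contacts and long-range pairs light up different feature channels.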
1. Atom Embedding Initialization:
Each atom’s features (identity + location) are transformed slightly to make them more model-friendly.

2. Pair Embedding Initialization:
Every pair of atoms gets initialized using the 2D and 3D information.
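In code, the two initializations might look like this; the layer sizes and the exact mix of 2D/3D inputs are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class EmbeddingInit(nn.Module):
    def __init__(self, atom_dim, pair2d_dim, pair3d_dim, d_model=128, d_pair=64):
        super().__init__()
        self.atom_proj = nn.Sequential(nn.Linear(atom_dim, d_model), nn.ReLU(),
                                       nn.Linear(d_model, d_model))
        self.pair_proj = nn.Linear(pair2d_dim + pair3d_dim, d_pair)

    def forward(self, atom_feats, pair2d, pair3d):
        x = self.atom_proj(atom_feats)                        # (N, d_model) per atom
        z = self.pair_proj(torch.cat([pair2d, pair3d], -1))   # (N, N, d_pair) per pair
        return x, z
```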
3. Multihead Self-Attention with Pair Bias:

Now comes the attention mechanism, with a twist: pairwise info is added as a bias. Each atom considers not just the other atoms but also their specific relationships (e.g., distance, bond type).
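Here’s a minimal pair-biased self-attention in the spirit of AlphaFold-style modules; the dimensions and the single-linear bias projection are my guesses. The twist is one extra line: a per-head bias derived from the pair embedding is added to the attention scores.

```python
import torch
import torch.nn as nn

class PairBiasAttention(nn.Module):
    def __init__(self, d_model=128, d_pair=64, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.bias = nn.Linear(d_pair, n_heads)  # pair embedding -> one bias per head
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, z):
        # x: (N, d_model) atom embeddings, z: (N, N, d_pair) pair embeddings
        N = x.shape[0]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(N, self.n_heads, self.d_head).transpose(0, 1)  # (H, N, d_head)
        k = k.view(N, self.n_heads, self.d_head).transpose(0, 1)
        v = v.view(N, self.n_heads, self.d_head).transpose(0, 1)
        att = q @ k.transpose(-1, -2) / self.d_head ** 0.5        # (H, N, N) scores
        att = att + self.bias(z).permute(2, 0, 1)                 # the twist: pair bias
        att = att.softmax(dim=-1)
        return self.out((att @ v).transpose(0, 1).reshape(N, -1))
```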
1. Combine Representations:
The embeddings of the ligand and protein from the encoders are concatenated.

2. Binding Blocks:
These transformer layers refine the combined representations, addressing:
a. Intra-ligand interactions
b. Ligand-protein interactions
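The combination step itself can be as simple as stacking the two embedding matrices along the atom axis; the ligand/protein sizes below are made up.

```python
import torch

lig_x = torch.randn(30, 128)    # 30 ligand atoms, d_model = 128
prot_x = torch.randn(250, 128)  # 250 protein pocket atoms
combined = torch.cat([lig_x, prot_x], dim=0)  # (280, 128), input to binding blocks

# A segment mask records which rows are ligand vs. protein, so the binding
# blocks can treat intra-ligand and ligand-protein pairs differently.
is_ligand = torch.cat([torch.ones(30), torch.zeros(250)]).bool()
```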
3. Distance Prediction:
Using all this information, we predict distance matrices:
a. How far should ligand atoms be from each other (intra)?
b. How far should ligand atoms be from protein atoms (inter)?

By now, Dockformer has a detailed understanding of how the ligand fits into the protein pocket.
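One plausible distance head (my sketch, not necessarily the paper’s): a small MLP over each pair embedding, with a Softplus to keep distances non-negative and an explicit symmetrization so d(i, j) = d(j, i). The ligand-ligand block of the output is the intra matrix; the ligand-protein block is the inter matrix.

```python
import torch
import torch.nn as nn

class DistanceHead(nn.Module):
    def __init__(self, d_pair=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_pair, d_pair), nn.ReLU(),
                                 nn.Linear(d_pair, 1), nn.Softplus())

    def forward(self, z):
        # z: (N, N, d_pair) pair embeddings -> (N, N) predicted distances
        d = self.mlp(z).squeeze(-1)
        return 0.5 * (d + d.transpose(0, 1))  # enforce symmetry
```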
1. Intra- and Intermolecular Modules:
Two types of attention layers come into play:
a. Intra-ligand attention: Helps the ligand atoms organize themselves correctly.
b. Ligand-protein cross-attention: Helps the ligand atoms adjust based on the protein’s pocket geometry.
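Sketched with PyTorch’s built-in attention (a simplification of whatever Dockformer uses internally; the layer sizes are assumptions), one such layer might alternate the two:

```python
import torch
import torch.nn as nn

d_model, n_heads = 128, 8
intra_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def structure_layer(lig, prot):
    # lig: (1, N_lig, d_model), prot: (1, N_prot, d_model)
    lig, _ = intra_attn(lig, lig, lig)    # ligand atoms attend to each other
    lig, _ = cross_attn(lig, prot, prot)  # ligand atoms attend to the pocket
    return lig                            # the rigid protein is left untouched

lig_out = structure_layer(torch.randn(1, 30, d_model), torch.randn(1, 250, d_model))
```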
2. Iterative Updates:
The ligand atoms keep moving into formation, adjusting their positions iteratively.

3. Final Coordinates:
After several rounds, the model spits out the final 3D coordinates of the ligand atoms.

And there you have it, Dockformer in action!
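To close, here’s the refinement loop as I understand it: each round predicts a per-atom displacement. This is a hypothetical sketch; a real implementation would also re-encode the updated coordinates between rounds.

```python
import torch
import torch.nn as nn

def refine(coords, lig_repr, update_head, num_rounds=8):
    # coords: (N_lig, 3) initial ligand coordinates
    # lig_repr: (N_lig, d_model) atom representations from the binding blocks
    # update_head: maps each representation to a 3D shift, e.g. nn.Linear(128, 3)
    for _ in range(num_rounds):
        coords = coords + update_head(lig_repr)  # nudge atoms into formation
    return coords  # final docked pose

pose = refine(torch.randn(30, 3), torch.randn(30, 128), nn.Linear(128, 3))
```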