Teresa Head-Gordon Lab
thglab.bsky.social
Student-run THG Lab account @UC Berkeley. We develop physics-based and machine learning-based models for various systems.
Huge shoutout to the amazing team behind this:
👏 Lead author Yingze (Eric) Wang and @kunyangsun.bsky.social
👏 PI Prof. Teresa Head-Gordon
👏 Teammates Jie Li, Xingyi Guan, Oufan Zhang, Dorian Bagni
👏 Collaborators Dr. Heather A. Carlson and Prof. Yang Zhang
GitHub - THGLab/HiQBind: Workflow to clean up and fix structural problems in protein-ligand binding datasets
Workflow to clean up and fix structural problems in protein-ligand binding datasets - THGLab/HiQBind
github.com
April 7, 2025 at 11:49 PM
What’s next?

We’re exploring:
🔁 Rotamer refinement
🤖 Binding label extraction with LLMs (maybe 👀)
🧠 Better data splits (possibly inspired by PLINDER) to support ML research!
April 7, 2025 at 11:49 PM
Since we're focused on structural data with binding labels, we applied this workflow to major open-access datasets (BioLiP, BindingDB, and BindingMOAD) to generate HiQBind: a cleaned, corrected dataset comparable in size to PDBBind v2020 but with significantly improved structural quality! 💥
April 7, 2025 at 11:49 PM
In this work, we built HiQBind-workflow, a semi-automated workflow that processes protein–ligand structures from the RCSB PDB by adding missing atoms, correcting ligand geometries, fixing bond orders and protonation states, and much more!
April 7, 2025 at 11:49 PM
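For readers who want a feel for what a "workflow of fixes" looks like, here is a toy sketch in Python. It is not the HiQBind code: the `Structure` class, the step functions, and the made-up ID `1ABC` are all hypothetical stand-ins, and each "fix" only flips a flag. The point is the shape of the idea: a structure passes through an ordered list of repair steps, and each step records what it changed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Structure:
    """Toy stand-in for a protein-ligand complex (hypothetical, not HiQBind's data model)."""
    pdb_id: str
    missing_atoms: int = 0
    bond_orders_ok: bool = True
    protonated: bool = False
    log: List[str] = field(default_factory=list)


def add_missing_atoms(s: Structure) -> Structure:
    # Placeholder for rebuilding atoms absent from the deposited structure.
    if s.missing_atoms > 0:
        s.log.append(f"added {s.missing_atoms} missing atoms")
        s.missing_atoms = 0
    return s


def fix_bond_orders(s: Structure) -> Structure:
    # Placeholder for correcting ligand bond orders.
    if not s.bond_orders_ok:
        s.bond_orders_ok = True
        s.log.append("fixed bond orders")
    return s


def assign_protonation(s: Structure) -> Structure:
    # Placeholder for assigning protonation states.
    if not s.protonated:
        s.protonated = True
        s.log.append("assigned protonation states")
    return s


# Assumed ordering for illustration only.
STEPS: List[Callable[[Structure], Structure]] = [
    add_missing_atoms,
    fix_bond_orders,
    assign_protonation,
]


def run_workflow(s: Structure, steps: List[Callable[[Structure], Structure]] = STEPS) -> Structure:
    """Apply each repair step in order and return the updated structure."""
    for step in steps:
        s = step(s)
    return s


if __name__ == "__main__":
    result = run_workflow(Structure("1ABC", missing_atoms=3, bond_orders_ok=False))
    for entry in result.log:
        print(entry)
```

Keeping each fix as its own function makes it easy to reorder steps, skip one, or inspect the log to see which corrections a given entry actually needed. For the real pipeline, see the THGLab/HiQBind repository linked above.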