Enhancing Permeability Prediction of Heterobifunctional Degraders Using Machine Learning and Metadynamics-Informed 3D Molecular Descriptors
Heterobifunctional degraders, a class of targeted protein degraders (TPDs), often occupy beyond-rule-of-five (bRo5) chemical space, where traditional passive permeability models, which are calibrated on drug-like molecules or peptides and based on topological descriptors, have limited applicability. This study explores the predictive value of three-dimensional (3D) descriptors for modeling the passive membrane permeability of heterobifunctional degraders in bRo5 chemical space. The descriptors comprise intramolecular hydrogen bonds (IMHBs), radius of gyration (Rgyr), and 3D polar surface area (3D-PSA), all derived from conformational ensembles generated by well-tempered metadynamics (WT-MetaD) in explicit chloroform. The ensembles were further refined and Boltzmann-weighted using ANI-2x neural network potentials to better represent molecular flexibility and solvent-relevant low-energy conformers. Three machine learning regression models, namely random forest regression (RF), partial least-squares (PLS), and linear support vector machines (LSVM), were trained and evaluated using two-dimensional (2D), 3D, and combined 2D + 3D molecular descriptor sets. The inclusion of 3D descriptors consistently improved predictive performance across all models. In the best-performing case (PLS), cross-validated r² improved from 0.29 to 0.48 with the addition of 3D features, showing that 3D features add predictive power in the challenging bRo5 chemical space, where 2D descriptors alone underperform. Feature importance analysis identified the 3D descriptor Rgyr as the dominant contributor to passive permeability, with additional contributions from 3D-PSA and IMHBs.
Specifically, on the same held-out split, the linear relationship between permeability and Rgyr appears stronger than the averaged model results (means over 100 randomized 50/50 splits), underscoring the role of Rgyr as a dominant predictor in this data set while highlighting the consistent gains achieved by incorporating 3D information beyond 2D descriptors. Together, these 3D features reflect molecular compactness, spatial polarity, and internal hydrogen bonding, which are key determinants of passive permeability. These results highlight the utility of physically meaningful, ensemble-derived 3D descriptors for improving permeability prediction and guiding the rational design of permeable compounds in bRo5 chemical space. The Amber-based molecular dynamics workflow developed in this study is broadly applicable to heterobifunctional degraders, facilitating the evaluation of permeability properties across diverse protein targets and E3 ligases.
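As an illustration of how an ensemble-derived descriptor such as Rgyr can be obtained, the following is a minimal sketch of a Boltzmann-weighted radius of gyration over a set of conformers. It is not the authors' workflow: the function names, the temperature of 298.15 K, the use of relative energies in kcal/mol, and the toy three-atom conformers are all assumptions made for illustration. In practice the coordinates would come from the WT-MetaD ensemble and the energies from the ANI-2x refinement.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
KB_KCAL_PER_MOL_K = 0.0019872041


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one conformer.

    coords: list of (x, y, z) tuples; masses: list of atomic masses.
    The result has the same length unit as the coordinates.
    """
    total_mass = sum(masses)
    # Center of mass, one component per Cartesian axis.
    com = [
        sum(m * xyz[k] for m, xyz in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(
        m * sum((xyz[k] - com[k]) ** 2 for k in range(3))
        for m, xyz in zip(masses, coords)
    ) / total_mass
    return math.sqrt(msd)


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Normalized Boltzmann weights from conformer energies in kcal/mol."""
    kt = KB_KCAL_PER_MOL_K * temperature
    # Shift by the minimum energy for numerical stability.
    e_min = min(energies_kcal)
    factors = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    partition = sum(factors)
    return [f / partition for f in factors]


def ensemble_average(values, weights):
    """Weighted average of a per-conformer descriptor."""
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    # Hypothetical toy ensemble: one compact and one extended conformer.
    masses = [12.0, 12.0, 12.0]
    compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    extended = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    energies = [0.0, 1.0]  # made-up relative energies in kcal/mol

    rg_values = [radius_of_gyration(c, masses) for c in (compact, extended)]
    weights = boltzmann_weights(energies)
    print(ensemble_average(rg_values, weights))
```

The same weighting could be applied to other per-conformer quantities, such as 3D-PSA or IMHB counts, by passing those values to `ensemble_average` in place of the Rgyr list.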