Ben Brown
bpbrown.bsky.social
Assistant Professor at Vanderbilt University. MD/PhD who doesn't practice medicine. I am interested in biomolecular motion and drug design. Computational biology, multiscale modeling, cheminformatics, opioid receptors, EGFR kinase
Good question. Yeah, I had reviews from two folks and also feedback from the editor. I do not know the rules or customs around anonymity of reviewers at PNAS. Most journals from which I receive feedback do not disclose reviewer identities. There is some discussion around this practice broadly.
October 18, 2025 at 1:26 AM
Most of the posts I see on peer review during the publication process are overwhelmingly negative. I just wanted to highlight that in this case I received helpful feedback. I have preprinted before and will likely do so again in the future.
October 17, 2025 at 2:11 PM
Big thanks for all the support from the new Vanderbilt Center for AI in Protein Dynamics, our CSB @vanderbiltcsb.bsky.social , and my department.
October 17, 2025 at 1:44 PM
I started my lab in April 2024, and I think we are starting to build some momentum. Hopefully in the next few months I will be able to share some other stuff we are working on.
October 17, 2025 at 1:44 PM
Finally, I want to note that the peer review process improved this manuscript. While I understand the benefits of pre-prints and the limitations of peer review, this was an instance where it was genuinely constructive and elevated the final paper.
October 17, 2025 at 1:44 PM
Also, apologies for being slow with it, but I'll add more scripts, examples, a better UI, etc. to the GitHub soon. I had to freeze a lot of the project a while back for benchmarking and to fix a stopping point for the manuscript.
October 17, 2025 at 1:44 PM
The new CORDIAL model will also be trained on substantially more synthetic null data covering broader chemical and structural perturbations. We will upload the weights for these new models, too.
October 17, 2025 at 1:44 PM
Co-folding models are likely overtrained on pairs of sequences and chemical substructures, but for generating plausible structures for affinity prediction with CORDIAL, they probably represent a better version of what I tried to do with the MCS maps and refinement.
October 17, 2025 at 1:44 PM
Speaking of improvements, we're extending the training set with the new SAIR dataset from @sandboxaq.bsky.social . Our original augmentation mimicked known poses via MCS mapping.
October 17, 2025 at 1:44 PM
Anyway, CORDIAL generalized pretty well, though perhaps unsurprisingly it did not yield dramatic performance improvements over Vina. There are clear ways to increase CORDIAL's expressivity: learning atom-pair embeddings/weights, incorporating additional geometric information, etc.
October 17, 2025 at 1:44 PM
This first pass at the CATH-LSO benchmark was useful, but in subsequent iterations I'll be tweaking it to make it more challenging. I hope this work encourages more dialogue on best practices for retrospective validation. I'm open to better strategies if folks have suggestions.
October 17, 2025 at 1:44 PM
So, how do we evaluate generalizability? I tried to set it up to mimic screening against a member of a novel, unseen protein superfamily: I hold out a protein superfamily and its associated chemistry, train on the remainder, and test on the held-out set. Thanks, CATH team.
October 17, 2025 at 1:44 PM
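The hold-out scheme described above can be sketched roughly as a grouped split keyed on superfamily. This is an illustrative reconstruction, not the paper's actual code; the data layout (`complex_id`, `superfamily_id` pairs) is a placeholder.

```python
from collections import defaultdict

def leave_superfamily_out_splits(complexes):
    """Yield (held_out_family, train_ids, test_ids) splits where each
    test set is one CATH superfamily and all of its complexes.

    `complexes` is a list of (complex_id, superfamily_id) pairs; the
    schema is illustrative, not CORDIAL's actual data format.
    """
    by_family = defaultdict(list)
    for cid, fam in complexes:
        by_family[fam].append(cid)
    for held_out, test_ids in by_family.items():
        # Every complex outside the held-out superfamily goes to training,
        # so no protein family (or its associated chemistry) leaks across.
        train_ids = [cid for cid, fam in complexes if fam != held_out]
        yield held_out, train_ids, test_ids
```

Usage would just be iterating the generator once per superfamily, training on `train_ids`, and scoring on `test_ids`.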
This doesn't completely eliminate bias, but it reduces it and makes it more predictable. For example, signal magnitudes in feature columns can differ between train/test sets. Consequently, BatchNorm1d or something similar is required to prevent the model from overfitting to these patterns.
October 17, 2025 at 1:44 PM
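The normalization idea above can be shown with a minimal sketch: standardizing each feature column to zero mean and unit variance within a batch, which is what BatchNorm1d does at training time (minus its learned affine parameters). This is a dependency-free illustration, not the model's actual layer.

```python
import statistics

def standardize_columns(rows):
    """Normalize each feature column of a batch to zero mean and unit
    variance, analogous to BatchNorm1d's training-time behavior
    (without the learned scale/shift). Illustrative sketch only.

    `rows` is a list of equal-length feature vectors.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Guard against zero-variance columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

After this transform, a column whose raw magnitudes differ 100-fold between train and test batches presents the same scale to downstream layers, which is the point: the model can't key on the magnitude shift itself.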
The approach here was to use a task-specific architecture. Instead of guiding the model to focus on interactions, we restrict its learning space to them. The model is constrained to view the problem only through distance-dependent physicochemical pairings.
October 17, 2025 at 1:44 PM
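A restricted learning space of distance-dependent physicochemical pairings could be featurized, in its simplest form, as counts of protein-ligand atom pairs binned by type and separation. This is a hypothetical minimal sketch; the atom-type labels and bin edges are placeholders, not CORDIAL's actual scheme.

```python
import math
from collections import Counter

def pair_distance_features(protein_atoms, ligand_atoms,
                           bin_edges=(2.0, 3.0, 4.0, 5.0, 6.0)):
    """Count protein-ligand atom pairs by (protein type, ligand type,
    distance bin). Minimal sketch of a distance-dependent
    physicochemical-pairing featurization; typing and bins are
    illustrative placeholders.

    Each atom is (type_label, (x, y, z)) in angstroms.
    """
    feats = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            for i, edge in enumerate(bin_edges):
                if d <= edge:
                    feats[(p_type, l_type, i)] += 1
                    break  # assign each pair to its nearest bin only
    return feats
```

The constraint is the representation itself: the model downstream never sees anything except these pairwise, distance-resolved counts, so interactions are the only thing it can learn from.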
The challenge is that the model needs a massive amount of data to guide it toward learning the problem the way we want. With a broad inductive bias, a model can easily learn non-causal correlations from training set artifacts instead of the generalizable principles we intend.
October 17, 2025 at 1:44 PM
Often we have an idea of what we want the model to learn, and it is easy to assume that the network will tend to learn the problem the way we conceive of it.
October 17, 2025 at 1:44 PM
This manuscript is an exploration of learning spaces. In my lab, we think a lot about the spaces of things. A model's architecture defines the manifold on which learning occurs.
October 17, 2025 at 1:44 PM
To be clear, I am not selling a model, I do not believe I have solved this problem, and I am not suggesting you should scrap your existing tools and just use this. The paper introduces a model, CORDIAL, but it's not really about the model itself. So, what is this manuscript about?
October 17, 2025 at 1:44 PM