Mihail Bogojeski
bogojeski.bsky.social
Machine learning researcher @ D. E. Shaw Research working on ML for quantum chemistry and drug discovery. Big fan of TTRPGs and bouldering.
Really happy to announce that I started working at D. E. Shaw Research this week. It's been an exciting first week at the company and I can't wait to start contributing to the very cool and ambitious research being done here!
February 14, 2025 at 11:10 PM
Reposted by Mihail Bogojeski
Just had an idea
December 10, 2024 at 9:44 AM
Reposted by Mihail Bogojeski
What are the most interesting datasets and benchmark-related work for ML in drug discovery at NeurIPS?

We’ll be at the conference doing short interviews with researchers and handing out some Polaris merch!

Here’s who we have on the shortlist. 🧵
December 9, 2024 at 5:09 PM
Proud to present my first paper as a supervisor for the brilliant PhD student Parastoo Semnani, introducing a conceptual ML and XAI framework for catalyst yield prediction.
pubs.acs.org/doi/10.1021/...
A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery
The successful application of machine learning (ML) in catalyst design has been made difficult by the challenges associated with collecting high-quality and diverse data. Due to the complex interactions between catalyst components, the design of novel catalysts has long relied on trial-and-error, a costly and labor-intensive process that results in scarce data that is heavily biased toward undesired, low-yield catalysts. Such data presents a challenge for training ML models that generalize well to novel compositions, which is necessary for the success of ML-guided catalyst discovery. Despite the growing popularity of ML applications in this field, most efforts so far have not focused on dealing with the challenges presented by such experimental data. In this work, we introduce a robust ML and explainable artificial intelligence (XAI) framework that incorporates a series of well-established ML methods designed to improve model performance and provide reliable evaluations for catalytic yield classification in the context of scarce and class-imbalanced data. We apply this framework to classify the yields of different catalyst combinations in the oxidative coupling of methane reaction and use it to evaluate the performance of a range of ML models: tree-based models (such as decision trees, random forest, and gradient boosted trees), logistic regression, support vector machines, and neural networks. Our experiments demonstrate that the methods used in our framework lead to more robust performance estimates and reduce the effect of class imbalance on model training, resulting in significant improvements in the predictive capability of all but one of the evaluated models. Additionally, the XAI component of the framework analyzes the decision-making process of each ML model by identifying the most important features for predicting catalyst performance. 
Our analysis found that XAI methods that provide class-aware explanations, such as Layer-wise Relevance Propagation, managed to identify key components that contribute specifically to high-yield catalysts. These findings align with chemical intuition and existing literature, reinforcing their validity. We believe this framework can serve as a blueprint and a set of best practices for ML applications in catalysis, driving future research while delivering robust models and actionable insights that can assist chemists in designing and discovering novel catalysts with superior performance.
pubs.acs.org
December 6, 2024 at 9:22 PM
Reposted by Mihail Bogojeski
This is insane: cluster representations of molecules by the *distance between their gzip-compressed strings* and you get better results than modern GNNs. 🤯
pubs.rsc.org/en/content/a...
Learning on compressed molecular representations
Last year, a preprint gained notoriety, proposing that a k-nearest neighbour classifier is able to outperform large-language models using compressed text as input and normalised compression distance (...
pubs.rsc.org
November 22, 2024 at 7:25 AM
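The idea in the post above can be sketched in a few lines of Python. This is a minimal illustration of normalised compression distance (NCD) with a k-nearest-neighbour vote, and it is not the paper's implementation. The function names, the toy SMILES strings, and the labels are all assumptions made up for the example.

```python
import gzip
from collections import Counter


def compressed_size(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalised compression distance between two strings.

    NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)),
    where C(.) is the compressed size. Smaller means more similar.
    """
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training strings closest to `query` by NCD."""
    neighbours = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common breaks ties in favour of the label seen first,
    # i.e. the one belonging to the nearer neighbour.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: SMILES-like strings with invented labels.
    train = [
        ("C" * 40, "alkane"),
        ("c1ccccc1" * 5, "aromatic"),
    ]
    print(knn_predict("C" * 40, train, k=1))
```

On very short strings the fixed gzip header dominates the compressed size, so distances get noisy. That is one reason a real setup would need more care than this sketch shows.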
Reposted by Mihail Bogojeski
For no reason at all, felt like sharing an old banger: Apolitical RPGs Do Not Exist
#osr #ttrpg
Apolitical RPGs Do Not Exist — Prismatic Wasteland
A brief exploration of why games are political by their very nature.
www.prismaticwasteland.com
November 22, 2024 at 1:22 PM
Reposted by Mihail Bogojeski
Today’s #JournalClub from @chemwedding.bsky.social discusses (questions?) the value of #equivalence in #neural_network architectures, feat. works by @mmbronstein.bsky.social, Tess Smidt, Michele Ceriotti, and others. www.grynova-ccc.org/journal-club... #chemsky #ML
November 21, 2024 at 1:32 PM