Sam Blau
@samblau.bsky.social
Research scientist & computational chemist at Berkeley Lab using HT DFT workflows, machine learning, and reaction networks to model complex reactivity.
We can't wait to see what the community does with OMol! Don't hesitate to reach out with feedback on the data, models, or paper - we aren't going to submit to a journal until the leaderboard goes up, which means we have time to incorporate community feedback (within reason) 10/10
May 14, 2025 at 8:57 PM
A special shout out to co-first authors Daniel Levine and Muhammed Shuaibi who moved mountains making OMol a reality. I also want to recognize the substantial and critical contributions of @ewcspottesmith.bsky.social, Michael Taylor, Muhammad Hasyim, and Kyle Michel 9/N
May 14, 2025 at 8:57 PM
Co-leading OMol with Brandon and Larry was a joy and an honor - as was assembling a world-leading team of scientists from 2 companies, 2 national labs, and 6 universities who were excited to help build an open-source, revolutionary molecular DFT dataset to push science forward 8/N
May 14, 2025 at 8:57 PM
Right now, OMol data has energy, forces, partial charges, partial spins, and HOMO/LUMO. But we have far more info that we still need to parse and hope to do a battery of GBW postprocessing. Plus we have 10 petabytes of electron densities. Lots more to come! 7/N
May 14, 2025 at 8:57 PM
And check out the UMA demo (facebook-fairchem-uma-demo.hf.space; UMA is trained on OMol + other FAIR Chemistry datasets) - metal complexes at +1 vs +2 correctly optimize to tetrahedral/planar, and reduced ethylene carbonate correctly ring-opens while a neutral EC remains stable 6/N
May 14, 2025 at 8:56 PM
We're also releasing baseline models trained on OMol. To guide future MLIP development, we built novel evaluations on intermolecular interactions, conformers, and charge/spin. We hope to include frequency, ΔG, and TSopt tasks when we put up a public leaderboard in the summer 4/N
May 14, 2025 at 8:55 PM
OMol was constructed via an unprecedented diversity of methods: MD, ML-MD, RPMD, rattling, Architector, rxn path interpolation, AFIR, optimization, and scaled separation. We also recalculated some previous datasets and did additional sampling/structure generation atop others 3/N
May 14, 2025 at 8:54 PM
OMol covers 83 elements, a wide range of intra- and intermolecular interactions, explicit solvation, reactive structures, conformers, charges -10 to 10, 0-10 unpaired electrons, and 2-350 atoms per snapshot. It required >6B CPU hrs, 10x more than any prev MLIP training dataset 2/N
May 14, 2025 at 8:53 PM