Artur Szałata
@chatgtp.bsky.social
Machine learning for molecular biology. ELLIS PhD student at Fabian Theis lab. EPFL alumnus.
Excited to share that I've started my summer on the BRAID Perturbation team at @genentech.bsky.social in SF, working with Alex Wu!

It's my first time on the West Coast - if you're around and would like to talk about ML and/or biology, hit me up!

Looking fwd to the AI x Bio Unconference tomorrow 🚀
June 18, 2025 at 4:37 AM
Thanks to my great co-lead Andrew Benz, to supervisors Daniel Burkhardt, Malte Luecken, and @fabiantheis.bsky.social, to Robrecht Cannoodt for help with OP, and to everyone involved!
Thanks also to @chanzuckerberg.bsky.social and Cellarity for funding the data generation, to Kaggle for hosting the competition, and to SaturnCloud for compute. 🧵8/8
November 15, 2024 at 10:37 PM
We implemented the winning Kaggle competition methods in our Open Problems Perturbation Prediction (OP3) platform. It has a robust evaluation with baseline methods and dataset bootstrapping. Simple NNs (with a few caveats) perform best. Also, drugs with larger effects are more difficult to predict. 🧵6/8
November 15, 2024 at 10:37 PM
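A minimal sketch of the dataset-bootstrapping idea behind such an evaluation (illustrative Python only, not the OP3 code; the function and variable names are mine): resample the evaluated units with replacement and recompute the score to get a confidence interval.

```python
# Illustrative bootstrap over per-unit evaluation scores (e.g., one score per
# evaluated (cell type, compound) pair): resample with replacement, recompute
# the mean score, and report a 95% confidence interval.
import numpy as np

def bootstrap_score(per_unit_scores: np.ndarray, n_boot: int = 1000, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = len(per_unit_scores)
    boot_means = np.array([per_unit_scores[rng.integers(0, n, n)].mean()
                           for _ in range(n_boot)])
    return np.percentile(boot_means, [2.5, 97.5])  # 95% CI of the mean score

# toy usage: 200 fake per-unit scores around 0.6
print(bootstrap_score(np.random.default_rng(1).normal(0.6, 0.1, size=200)))
```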
Single-cell perturbation readouts have batch effects and a low signal-to-noise ratio. DEG analysis with GLMs and replicates helps, but we still need to choose how to represent perturbation effects - so we developed a “cross-donor retrieval” metric to evaluate perturbation effect representations. 🧵4/8
November 15, 2024 at 10:37 PM
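To illustrate the idea behind a cross-donor retrieval metric (a hedged sketch, not the paper's implementation; all names are made up): if a representation captures the perturbation signal rather than donor or batch effects, a perturbation's effect vector computed in one donor should retrieve the same perturbation among another donor's effect vectors.

```python
# Cross-donor retrieval sketch: rows of effects_a and effects_b are perturbation
# effect vectors (e.g., log-fold changes over genes) for the same perturbations,
# computed separately in two donors. Score = mean reciprocal rank of the true
# match under cosine similarity; higher means the representation generalizes
# across donors.
import numpy as np

def cross_donor_retrieval(effects_a: np.ndarray, effects_b: np.ndarray) -> float:
    """effects_a, effects_b: (n_perturbations, n_genes), rows aligned by perturbation."""
    a = effects_a / np.linalg.norm(effects_a, axis=1, keepdims=True)  # L2-normalize
    b = effects_b / np.linalg.norm(effects_b, axis=1, keepdims=True)
    sim = a @ b.T                                          # (n, n) cosine similarities
    ranks = (sim >= sim.diagonal()[:, None]).sum(axis=1)   # rank of the true match per row
    return float(np.mean(1.0 / ranks))                     # mean reciprocal rank in (0, 1]

# toy usage: shared signal plus donor-specific noise
rng = np.random.default_rng(0)
signal = rng.normal(size=(146, 2000))
print(cross_donor_retrieval(signal + 0.1 * rng.normal(size=signal.shape),
                            signal + 0.1 * rng.normal(size=signal.shape)))
```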
We generated a single-cell dataset of 146 drug perturbations in PBMCs of 3 human donors. We used it to benchmark perturbation effect predictions for held-out (cell type, compound) pairs. Perturbation effects are derived from DEG analysis - treatment vs control contrasts in a generalized linear model. 🧵3/8
November 15, 2024 at 10:37 PM
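A rough sketch of the "treatment vs control contrast in a GLM" idea in Python (illustrative only; the actual covariates, count model, and software in the paper may differ, and every name below is a placeholder):

```python
# Per-gene GLM on pseudobulk counts with a treatment indicator and donor
# covariates; the treatment coefficient (a log-fold change under the log link)
# is taken as the perturbation effect for that gene.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def perturbation_effect(counts: pd.DataFrame, meta: pd.DataFrame) -> pd.Series:
    """counts: samples x genes pseudobulk counts; meta has 'treated' (0/1) and 'donor'."""
    design = pd.get_dummies(meta[["donor"]], drop_first=True).astype(float)
    design["treated"] = meta["treated"].astype(float)
    design = sm.add_constant(design)
    offset = np.log(counts.sum(axis=1))                 # library-size offset
    effects = {}
    for gene in counts.columns:
        fit = sm.GLM(counts[gene], design,
                     family=sm.families.NegativeBinomial(alpha=1.0),
                     offset=offset).fit()
        effects[gene] = fit.params["treated"]           # treatment vs control log-fold change
    return pd.Series(effects, name="log_fold_change")

# toy usage: 3 donors x (2 control + 2 treated) pseudobulk samples, 3 genes
meta = pd.DataFrame({"donor":   ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
                     "treated": [0, 0, 1, 1] * 3})
counts = pd.DataFrame(np.random.default_rng(0).poisson(50, size=(12, 3)),
                      columns=["GENE_A", "GENE_B", "GENE_C"])
print(perturbation_effect(counts, meta))
```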
The chemical and biological space of possible perturbations is very large. Thus, methods try to learn from a fraction of possible experiments and infer the rest. However, existing perturbation datasets are limited in size and suffer from data quality issues. 🧵2/8
November 15, 2024 at 10:37 PM
Unlike the autoencoders popular in the field, transformers take as input a set or a variable-length sequence of embeddings. They rely on the attention mechanism and can be trained with MLM or NTP, but neither of these gives us per-cell embeddings. (3/7)
August 9, 2024 at 9:52 PM
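One common workaround for the per-cell embedding gap (not something claimed in this thread) is to pool the transformer's per-token outputs, e.g. masked mean pooling over gene tokens or a dedicated [CLS] token. A minimal PyTorch sketch with made-up names and shapes:

```python
# Mean-pool per-gene token outputs into a single per-cell embedding.
import torch
import torch.nn as nn

class CellEncoder(nn.Module):
    def __init__(self, n_genes: int, d_model: int = 128):
        super().__init__()
        self.gene_embed = nn.Embedding(n_genes + 1, d_model, padding_idx=0)  # 0 = padding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, gene_ids: torch.Tensor) -> torch.Tensor:
        """gene_ids: (batch, seq_len) integer gene tokens per cell, 0 = padding."""
        pad = gene_ids.eq(0)                                       # True where padded
        h = self.encoder(self.gene_embed(gene_ids), src_key_padding_mask=pad)
        h = h.masked_fill(pad.unsqueeze(-1), 0.0)                  # zero out padded positions
        return h.sum(dim=1) / (~pad).sum(dim=1, keepdim=True)      # mean over real tokens

# toy usage: a batch of 8 cells, each a sequence of 64 gene tokens
cells = torch.randint(1, 1000, (8, 64))
print(CellEncoder(n_genes=1000)(cells).shape)                      # torch.Size([8, 128])
```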