Artur Szałata
@chatgtp.bsky.social
Machine learning for molecular biology. ELLIS PhD student at Fabian Theis lab. EPFL alumnus.
Excited to share that I've started my summer on the BRAID Perturbation team at @genentech.bsky.social in SF, working with Alex Wu!

It's my first time on the West Coast - if you're around and would like to talk about ML and/or biology, hit me up!

Looking fwd to the AI x Bio Unconference tomorrow 🚀
June 18, 2025 at 4:37 AM
Thanks to my great co-lead Andrew Benz, to supervisors Daniel Burkhardt, Malte Luecken, and @fabiantheis.bsky.social, to Robrecht Cannoodt for help with OP, and to everyone involved!
Thanks also to @chanzuckerberg.bsky.social and Cellarity for funding the data generation, to Kaggle for hosting the competition, and to SaturnCloud for compute. 🧵8/8
November 15, 2024 at 10:37 PM
We implemented the winning Kaggle competition methods in our Open Problems Perturbation Prediction (OP3) platform. It has a robust evaluation with baseline methods and dataset bootstrapping. Simple NNs (with a few caveats) perform best. Also, drugs with larger effects are more difficult to predict. 🧵6/8
November 15, 2024 at 10:37 PM
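A minimal sketch of the dataset-bootstrapping idea behind such an evaluation (illustrative Python only, not the OP3 code; the function and variable names are mine): resample the evaluated units with replacement and recompute the score to get a confidence interval.

```python
# Illustrative bootstrap over per-unit evaluation scores (e.g., one score per
# evaluated (cell type, compound) pair): resample with replacement, recompute
# the mean score, and report a 95% confidence interval.
import numpy as np

def bootstrap_score(per_unit_scores: np.ndarray, n_boot: int = 1000, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = len(per_unit_scores)
    boot_means = np.array([per_unit_scores[rng.integers(0, n, n)].mean()
                           for _ in range(n_boot)])
    return np.percentile(boot_means, [2.5, 97.5])  # 95% CI of the mean score

# toy usage: 200 fake per-unit scores around 0.6
print(bootstrap_score(np.random.default_rng(1).normal(0.6, 0.1, size=200)))
```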
Single-cell perturbation readouts have batch effects and a low signal-to-noise ratio. DEG analysis with GLMs and replicates helps, but we still need to choose how to represent perturbation effects - so we developed a “cross-donor retrieval” metric to evaluate perturbation effect representations. 🧵4/8
November 15, 2024 at 10:37 PM
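To illustrate the idea behind a cross-donor retrieval metric (a hedged sketch, not the paper's implementation; all names are made up): if a representation captures the perturbation signal rather than donor or batch effects, a perturbation's effect vector computed in one donor should retrieve the same perturbation among another donor's effect vectors.

```python
# Cross-donor retrieval sketch: rows of effects_a and effects_b are perturbation
# effect vectors (e.g., log-fold changes over genes) for the same perturbations,
# computed separately in two donors. Score = mean reciprocal rank of the true
# match under cosine similarity; higher means the representation generalizes
# across donors.
import numpy as np

def cross_donor_retrieval(effects_a: np.ndarray, effects_b: np.ndarray) -> float:
    """effects_a, effects_b: (n_perturbations, n_genes), rows aligned by perturbation."""
    a = effects_a / np.linalg.norm(effects_a, axis=1, keepdims=True)  # L2-normalize
    b = effects_b / np.linalg.norm(effects_b, axis=1, keepdims=True)
    sim = a @ b.T                                          # (n, n) cosine similarities
    ranks = (sim >= sim.diagonal()[:, None]).sum(axis=1)   # rank of the true match per row
    return float(np.mean(1.0 / ranks))                     # mean reciprocal rank in (0, 1]

# toy usage: shared signal plus donor-specific noise
rng = np.random.default_rng(0)
signal = rng.normal(size=(146, 2000))
print(cross_donor_retrieval(signal + 0.1 * rng.normal(size=signal.shape),
                            signal + 0.1 * rng.normal(size=signal.shape)))
```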
We generated a single-cell dataset of 146 drug perturbations in PBMCs of 3 human donors. We used it to benchmark perturbation effect predictions for held-out (cell type, compound) pairs. Perturbation effects are derived from DEG analysis - treatment vs control contrasts in a generalized linear model. 🧵3/8
November 15, 2024 at 10:37 PM
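A rough sketch of the "treatment vs control contrast in a GLM" idea in Python (illustrative only; the actual covariates, count model, and software in the paper may differ, and every name below is a placeholder):

```python
# Per-gene GLM on pseudobulk counts with a treatment indicator and donor
# covariates; the treatment coefficient (a log-fold change under the log link)
# is taken as the perturbation effect for that gene.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def perturbation_effect(counts: pd.DataFrame, meta: pd.DataFrame) -> pd.Series:
    """counts: samples x genes pseudobulk counts; meta has 'treated' (0/1) and 'donor'."""
    design = pd.get_dummies(meta[["donor"]], drop_first=True).astype(float)
    design["treated"] = meta["treated"].astype(float)
    design = sm.add_constant(design)
    offset = np.log(counts.sum(axis=1))                 # library-size offset
    effects = {}
    for gene in counts.columns:
        fit = sm.GLM(counts[gene], design,
                     family=sm.families.NegativeBinomial(alpha=1.0),
                     offset=offset).fit()
        effects[gene] = fit.params["treated"]           # treatment vs control log-fold change
    return pd.Series(effects, name="log_fold_change")

# toy usage: 3 donors x (2 control + 2 treated) pseudobulk samples, 3 genes
meta = pd.DataFrame({"donor":   ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
                     "treated": [0, 0, 1, 1] * 3})
counts = pd.DataFrame(np.random.default_rng(0).poisson(50, size=(12, 3)),
                      columns=["GENE_A", "GENE_B", "GENE_C"])
print(perturbation_effect(counts, meta))
```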
The chemical and biological space of possible perturbations is very large. Thus, methods try to learn from a fraction of possible experiments and infer the rest. However, existing perturbation datasets are limited in size and suffer from data quality issues. 🧵2/8
November 15, 2024 at 10:37 PM
Unlike the autoencoders popular in the field, transformers take as input a set or a variable-length sequence of embeddings. They rely on the attention mechanism and can be trained with MLM or NTP, but neither of these gives us per-cell embeddings. (3/7)
August 9, 2024 at 9:52 PM
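One common workaround for the per-cell embedding gap (not something claimed in this thread) is to pool the transformer's per-token outputs, e.g. masked mean pooling over gene tokens or a dedicated [CLS] token. A minimal PyTorch sketch with made-up names and shapes:

```python
# Mean-pool per-gene token outputs into a single per-cell embedding.
import torch
import torch.nn as nn

class CellEncoder(nn.Module):
    def __init__(self, n_genes: int, d_model: int = 128):
        super().__init__()
        self.gene_embed = nn.Embedding(n_genes + 1, d_model, padding_idx=0)  # 0 = padding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, gene_ids: torch.Tensor) -> torch.Tensor:
        """gene_ids: (batch, seq_len) integer gene tokens per cell, 0 = padding."""
        pad = gene_ids.eq(0)                                       # True where padded
        h = self.encoder(self.gene_embed(gene_ids), src_key_padding_mask=pad)
        h = h.masked_fill(pad.unsqueeze(-1), 0.0)                  # zero out padded positions
        return h.sum(dim=1) / (~pad).sum(dim=1, keepdim=True)      # mean over real tokens

# toy usage: a batch of 8 cells, each a sequence of 64 gene tokens
cells = torch.randint(1, 1000, (8, 64))
print(CellEncoder(n_genes=1000)(cells).shape)                      # torch.Size([8, 128])
```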