Rachel Thomas
math-rachel.bsky.social
AI researcher going back to school for immunology
fast.ai co-founder, math PhD, data scientist
Writing: https://rachel.fast.ai/
Claims that AI will cure cancer are often used as superficial marketing, ignoring how AI could further disempower patients (whose expertise is already disregarded by the medical system)

Hiding health-related data from the public does not improve the lives of patients. 8/

rachel.fast.ai/posts/2024-0...
Rachel Thomas, PhD - “AI will cure cancer” misunderstands both AI and medicine
an AI researcher going back to school for immunology
rachel.fast.ai
January 24, 2025 at 10:25 PM
Promises about what AI can achieve with electronic health records must be tempered with the awareness that the data within is too often biased, incorrect, or missing.

(studies on some of the diagnosis delays, pain mismanagement, and biases that are recorded as fact in medical data) 7/
January 24, 2025 at 10:25 PM
One thing holding back the application of AI to medicine is lack of the *right* data. It is not just that data is scattered & hard to access; many interesting variables aren’t being measured or collected at all. 6/
January 24, 2025 at 10:25 PM
Too many AI projects begin at the wrong starting point. They start with an existing dataset, and ask “what can we do with this data?”

The harder question is, “what are the biggest questions in your area, and what data would be useful for answering them?” 5/
January 24, 2025 at 10:25 PM
“What Alphafold2 pulled off — applying a clever model to a large body of pre-existing data to revolutionize a field — is something that will be extremely hard to replicate. Why? Because we’re almost out of that pre-existing data.” -- @owlposting1.bsky.social www.owlposting.com/p/wet-lab-in... 4/
Wet-lab innovations will lead the AI revolution in biology
1.9k words, 9 minutes reading time
www.owlposting.com
January 24, 2025 at 10:25 PM
Some mistakenly believe AI can easily create magic solutions, without understanding the need for high-quality data.

The success of AlphaFold was made possible by 50 years of prior work gathering protein structures into a rich database (Protein Data Bank launched in 1971) 3/
January 24, 2025 at 10:25 PM
Choices about which data to collect & which to neglect have long been shaped by power disparities and social & financial influences.

Missing Data Sets are “blank spots that exist in spaces that are otherwise data-saturated” (Mimi Onuoha, 2016)

github.com/MimiOnuoha/m... 2/
GitHub - MimiOnuoha/missing-datasets: An overview and exploration of the concept of missing datasets.
github.com
January 24, 2025 at 10:25 PM
In addition to the challenge of gathering more data, another challenge of pathology models is needing to capture both local patterns (that show up in a small tile within a slide) and global patterns across the whole slide. 6/

(Image from Chen, et al, 2020, Hierarchical Image Pyramid Transformer)
January 16, 2025 at 10:13 PM
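A minimal sketch of the local-vs-global point in the post above, assuming a slide is just a 2D array and using toy statistics in place of learned embeddings. This is not the Hierarchical Image Pyramid Transformer or any published model; the function names `tile_slide`, `local_features`, and `global_feature` are made up for illustration.

```python
import numpy as np

def tile_slide(slide, tile):
    """Split an (H, W) array into non-overlapping (tile, tile) patches."""
    h, w = slide.shape
    rows, cols = h // tile, w // tile
    # Drop any ragged border so the slide divides evenly into tiles.
    cropped = slide[:rows * tile, :cols * tile]
    # (rows, tile, cols, tile) -> (rows, cols, tile, tile)
    return cropped.reshape(rows, tile, cols, tile).swapaxes(1, 2)

def local_features(tiles):
    """Toy per-tile descriptor (assumption): mean and std of pixel values."""
    return np.stack([tiles.mean(axis=(2, 3)), tiles.std(axis=(2, 3))], axis=-1)

def global_feature(local):
    """Toy slide-level descriptor (assumption): average of tile descriptors."""
    return local.reshape(-1, local.shape[-1]).mean(axis=0)

# Random data standing in for a (much larger) whole-slide image.
rng = np.random.default_rng(0)
slide = rng.random((512, 768))

tiles = tile_slide(slide, 256)   # local view: a grid of small tiles
local = local_features(tiles)    # one descriptor per tile
slide_vec = global_feature(local)  # one descriptor for the whole slide
```

Real pathology models replace the toy statistics with learned encoders, and averaging throws away the spatial arrangement of tiles, which is part of why capturing global patterns is hard.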
The Cancer Genome Atlas (TCGA) was an ambitious project begun in 2006 by the National Cancer Institute. Samples were collected from > 11,000 patients w/ 33 cancer types.

All 3 of the above papers (UNI, Prov-GigaPath, & kaiko ai) concluded TCGA is not large enough for effective foundation models 5/
January 16, 2025 at 10:13 PM
Another interesting paper evaluated the impact of scaling model size and training dataset size. They found limited need to scale *model size* beyond a certain point, but that *larger datasets* continued to lead to increased performance. 4/

arxiv.org/abs/2404.15217
Towards Large-Scale Training of Pathology Foundation Models
Driven by the recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and efforts have been devoted to build f...
arxiv.org
January 16, 2025 at 10:13 PM
Two big pathology foundation models were published last year: UNI and Prov-GigaPath. They achieved state-of-the-art results on dozens of tasks (although they were not directly compared). 3/

🟣 www.nature.com/articles/s41...
🟣 www.nature.com/articles/s41...
A whole-slide foundation model for digital pathology from real-world data - Nature
Prov-GigaPath, a whole-slide pathology foundation model pretrained on a large dataset containing around 1.3 billion pathology images, attains state-of-the-art performance in cancer classification and ...
www.nature.com
January 16, 2025 at 10:13 PM
The powerful idea behind *foundation models* is to train a model on many datasets (e.g. tissue images from many organs) and on multiple tasks (e.g. recognizing cancer, segmenting cells, predicting treatment outcomes)

Patterns learned from one dataset or one task are likely to generalize to others. 2/
January 16, 2025 at 10:13 PM
An unmet need in lung cancer research: how to integrate -omics to understand extracellular matrix (ECM) remodeling

(This is the first talk I've seen incorporating the ECM with omics-- it's an interesting perspective!)

Amelia Parker 6/
December 3, 2024 at 4:07 AM
The extracellular matrix is a collection of proteins changing over time and space. It has different profiles for different cancer subtypes & profiles.

-- Amelia Parker #multiomics2024 5/
December 3, 2024 at 3:58 AM
I can't share videos with 🦋, but there were some neat videos of 3D spatial information from various cancers & the additional info 3D imaging can provide.

Zoe West 4/
December 3, 2024 at 3:50 AM
Typically 3D imaging is done for proteins. A new pipeline allows 3D imaging of RNA spatial distributions:

www.biorxiv.org/content/10.1... from Zoe West #multiomics2024 3/
Whole-Brain Three-Dimensional Imaging of RNAs at Single-Cell Resolution
Whole-brain three-dimensional (3D) imaging is desirable to obtain a comprehensive and unbiased view of architecture and neural circuitry. However, current spatial analytic methods for brain RNAs are l...
www.biorxiv.org
December 3, 2024 at 3:43 AM
Altered cellular metabolism is one of the hallmark features of cancer. This includes altered:
- glycolysis
- oxidative stress
- fatty acid metabolism
- amino acid metabolism

Spatial data can be used to understand these altered pathways in treatment resistant vs responsive cancer patients

-- Naomi Berrell 2/
December 3, 2024 at 3:31 AM