Sumit Mukherjee
smukherjee89.bsky.social
Senior Manager @ Oracle Health. Interested in clinical LLMs, ML + genetics!! Ex-insitro, Microsoft AI4Good, UW. Biriyani expert. Kolkata-> Seattle. Views are my own.

Scholar: https://shorturl.at/QTBGR
Linkedin: https://shorturl.at/WuGl
Pinned
Okay friends, I am back with a new thread and this time about a new pre-print that I co-authored. Today's topic is 'ML assisted GWAS'. What is that, you ask? It's basically estimating missing values in phenotypes of interest using associated measurements (1/)
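Here is a minimal toy sketch of that idea in Python. It assumes a plain linear model stands in for the ML model (the preprint's actual approach isn't described in this post). The phenotype is predicted from associated measurements for people where it is missing, and the completed phenotype is then tested against one SNP. All data, function names, and parameters here are hypothetical.

```python
import numpy as np


def impute_phenotype(proxies, phenotype):
    """Fill NaN entries of `phenotype` with a least-squares prediction from `proxies`.

    proxies:   (n, k) array of associated measurements, assumed fully observed.
    phenotype: (n,) array with np.nan wherever the phenotype of interest is missing.
    """
    observed = ~np.isnan(phenotype)
    # Design matrix: an intercept column followed by the proxy measurements.
    design = np.column_stack([np.ones(len(phenotype)), proxies])
    # Fit the prediction model only on individuals whose phenotype was measured.
    coef, *_ = np.linalg.lstsq(design[observed], phenotype[observed], rcond=None)
    imputed = phenotype.copy()
    imputed[~observed] = design[~observed] @ coef
    return imputed


def snp_association(genotype, phenotype):
    """Per-SNP linear regression of phenotype on allele dosage (0/1/2).

    Returns (beta, t_statistic); centering both variables absorbs the intercept.
    """
    g = genotype - genotype.mean()
    y = phenotype - phenotype.mean()
    beta = (g @ y) / (g @ g)
    resid = y - beta * g
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (g @ g))
    return beta, beta / se


# Simulated toy data: one SNP affects a latent trait that two proxies measure noisily.
rng = np.random.default_rng(0)
n = 1000
genotype = rng.binomial(2, 0.3, size=n).astype(float)
latent = 0.3 * genotype + rng.normal(size=n)
proxies = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(2)])
phenotype = latent + rng.normal(scale=0.2, size=n)
phenotype[rng.random(n) < 0.7] = np.nan  # roughly 70% of phenotype values missing

imputed = impute_phenotype(proxies, phenotype)
beta, t_stat = snp_association(genotype, imputed)
print(f"beta={beta:.3f}, t={t_stat:.2f}")
```

This toy treats imputed values as if they had been measured directly, which can distort the association test. Handling that properly is the part this sketch leaves out.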
My heart goes out to the ordinary people of Iran and Israel for what seems to me, as an outsider, a completely pointless war. I hope this insanity comes to an end quickly. My heart goes out to the civilians in these countries who really didn't want or deserve this.
June 16, 2025 at 7:11 PM
Today’s home loss to Arsenal makes it even more obvious.
Ancelotti needs to be fired. Real Madrid have just been uninspiring and at times terrible this season.
April 16, 2025 at 9:14 PM
Ancelotti needs to be fired. Real Madrid have just been uninspiring and at times terrible this season.
April 8, 2025 at 9:03 PM
As more and more tech companies claim that LLMs will make certain classes of jobs unnecessary in a few years, it amazes me that there is little to no discussion of the effects on the labor force/economy and other societal impacts.
April 6, 2025 at 2:30 PM
After 3.5 wonderful years at @insitro.bsky.social where I got to work on many interesting problems at the intersection of genetics and ML, I will be joining Oracle Health on Monday as a Senior Applied Science Manager to work on clinical LLMs. (1/)
April 4, 2025 at 10:32 PM
South Asian friends living in the US - is it me or is Indian food just way more spicy in the US than it is in India (on average)? If y'all agree with me, why do you think this is? Even dishes that were sweet-ish in India are somehow spicy here.
March 27, 2025 at 6:52 PM
Happy Nowruz to all my Iranian friends and whoever else celebrates it. Hope y'all have a wonderful day.
[GIF: a floral design with the words "Happy Nowruz"]
March 20, 2025 at 5:17 PM
Reposted by Sumit Mukherjee
We have a new method, HypotheSAEs, for identifying *interpretable text features that predict a target variable* (aka hypothesis generation).

What features of a headline predict engagement?

What features of a clinical note predict whether a patient will develop cancer?

1/
March 18, 2025 at 6:26 PM
Reposted by Sumit Mukherjee
I'm deeply honored to be elected as an ISCB Fellow this year!🏅 Following last year’s ISCB Innovator Award, this recognition strengthens my commitment to advancing computational biology.🧬 Grateful to my students, mentors, and colleagues for their support!
www.iscb.org/iscb-news-it...
March 11, 2025 at 2:54 PM
Imma be honest, I didn't really see a Gavin Newsom and Charlie Kirk collab coming: www.youtube.com/watch?v=9XJ6...? But then again, there are many things about our political climate in 2025 that I didn't see coming. Might be a me thing.
And, This Is Charlie Kirk | This is Gavin Newsom
YouTube video by This is Gavin Newsom
www.youtube.com
March 7, 2025 at 9:58 PM
It's shameful that we have billionaires now feeling empowered to go on highly visible podcasts and say Social Security is a 'ponzi scheme' without explaining a clear alternative that doesn't involve kicking seniors to the curb. Truly shameful!
March 3, 2025 at 8:13 PM
Jesus Christ, is this true: www.cbsnews.com/news/elon-mu...? Seems like the Twitter acquisition all over again. And we all saw how well that went.
DOGE's Elon Musk says federal employees must document their work or resign
Elon Musk's instructions came after President Trump praised the head of DOGE but would like to see him get more aggressive in reducing the federal government.
www.cbsnews.com
February 23, 2025 at 2:22 PM
Happy Valentine's Day to my science fam!! Hope y'all keep doing all the great science!! :)
February 14, 2025 at 5:48 PM
Reposted by Sumit Mukherjee
There's still time to submit an abstract to the legendary Biology of Genomes meeting! Come and present your work alongside the stellar lineup of invited speakers and keynotes. #BOG25
February 11, 2025 at 5:59 PM
Awesome new paper led by @zmccaw.bsky.social and my amazing colleagues at @insitro.bsky.social !! Great explainer thread below 👇🧪💻🧬
February 11, 2025 at 6:04 PM
Reposted by Sumit Mukherjee
This is the most relevant article to NIH and research cuts I’ve seen.

Imagine if this were today: how many people would be saying “Why are we studying Gila Monsters and their impact on diabetes? That’s wasted money!”

globalnews.ca/news/9793403...
How a Canadian scientist and a venomous lizard helped pave the way for Ozempic - National | Globalnews.ca
In 1984, Dr. Daniel Drucker, an endocrinologist from the University of Toronto, discovered a hormone that helped pave the way for popular diabetes drugs such as Ozempic.
globalnews.ca
February 9, 2025 at 9:58 PM
I really enjoyed reading this paper. Very interesting and innovative approach to characterize biomarkers and subtypes of MASLD.
February 7, 2025 at 12:31 AM
Very cool paper 👇🏼
February 4, 2025 at 9:39 PM
Very interesting 👇🏼
New (and hotly anticipated - at least by me) preprint from my group describing a better way to partition training data for genomic-trained models to solve the long-neglected problem of homology-based data leakage. Thread from first author @muntakimrafi.bsky.social 👇
0/ Essential reading for anyone training or using sequence-function models trained on genomic sequences! 🚨 In our new preprint, we explore the ways homology within genomes can cause leakage when training sequence-based models and ways to prevent it
February 2, 2025 at 8:49 PM
Reposted by Sumit Mukherjee
0/ Essential reading for anyone training or using sequence-function models trained on genomic sequences! 🚨 In our new preprint, we explore the ways homology within genomes can cause leakage when training sequence-based models and ways to prevent it
January 27, 2025 at 11:04 PM
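The thread doesn't spell out how the preprint partitions the data, so the following is only a generic illustration of homology-aware splitting using a swapped-in technique, not the preprint's method. Sequences are linked when their k-mer Jaccard similarity passes a threshold (k=5 and 0.5 are arbitrary, hypothetical choices), linked groups are found with union-find, and each whole group goes to either train or test, so near-duplicate sequences never straddle the split. Real genomic pipelines would more likely use alignment-based similarity.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def homology_groups(seqs, k=5, threshold=0.5):
    """Connected components of the 'similar enough' graph, via union-find."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_split(seqs, test_fraction=0.2, k=5, threshold=0.5):
    """Assign whole homology groups to test (up to test_fraction) or train."""
    groups = sorted(homology_groups(seqs, k, threshold), key=len, reverse=True)
    target = test_fraction * len(seqs)
    train, test = [], []
    for g in groups:
        if len(test) + len(g) <= target:
            test.extend(g)
        else:
            train.extend(g)
    return sorted(train), sorted(test)


# Toy example: the first two sequences are near-duplicates and must land together.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "GGCCAATTGG", "CATCATCATG"]
train, test = group_split(seqs, test_fraction=0.4)
print(train, test)  # prints [2, 3, 4] [0, 1]
```

A random per-sequence split could put index 0 in train and index 1 in test. Splitting by group rules that out by construction.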
Reposted by Sumit Mukherjee
The success of LLMs has inspired their extension to genomics data.

A preprint reports that such models lack understanding of genomics and provide minimal utility, even for basic tasks such as sequence classification.

www.biorxiv.org/content/10.1...
Genomic Foundationless Models: Pretraining Does Not Promise Performance
The success of Large Language Models has inspired the development of Genomic Foundation Models (GFMs) through similar pretraining techniques. However, the relationship between pretraining performance ...
www.biorxiv.org
February 1, 2025 at 11:41 PM
I was recently trying out this tool from Google called NotebookLM that converts PDFs to a podcast format. It's pretty cool. I used it to summarize this paper and found the description pretty accurate: notebooklm.google.com/notebook/7ab.... Check it out. 🧪💻🧬
February 1, 2025 at 12:43 AM
My thread on the genomic SEM paper (www.nature.com/articles/s41...). I forgot to add 🧪💻🧬 #StatisticalGenetics
February 1, 2025 at 12:36 AM
Okay, I am back with another thread on multivariate phenotypes. This will be focused on summarizing this paper: www.nature.com/articles/s41.... This paper introduces an SEM-based approach to estimate latent factors underlying multiple relevant traits and use them to improve the power of GWASs. (1/)
Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits - Nature Human Behaviour
Grotzinger et al. develop a multivariate method for analysing the joint genetic architectures of complex traits: genomic structural equation modelling. They provide several applications of the method,...
www.nature.com
January 31, 2025 at 11:43 PM
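For readers new to SEM, here is a rough sketch of the single common factor model at the heart of this kind of approach, assuming the standard factor-analysis formulation. $\mathbf{S}$ is the $k \times k$ genetic covariance matrix among $k$ traits (estimated from GWAS summary statistics, e.g. with multivariable LD score regression). $\boldsymbol{\Lambda}$ holds the factor loadings, $\psi$ is the factor variance (often fixed to 1), and $\boldsymbol{\Theta}$ is a diagonal matrix of trait-specific residual genetic variances. Parameters are fit by weighted least squares, with the weight matrix $\mathbf{W}$ derived from the sampling covariance $\mathbf{V}$ of the elements of $\mathbf{S}$. For the GWAS step, the latent factor $F$ is regressed on each SNP's allele dosage $g$.

```latex
\begin{aligned}
\mathbf{S} &\approx \boldsymbol{\Sigma}(\theta)
  = \boldsymbol{\Lambda}\,\psi\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta} \\
\hat{\theta} &= \arg\min_{\theta}\;
  \bigl(\mathbf{s} - \boldsymbol{\sigma}(\theta)\bigr)^{\top}
  \mathbf{W}\,
  \bigl(\mathbf{s} - \boldsymbol{\sigma}(\theta)\bigr),
  \qquad \mathbf{s} = \operatorname{vech}(\mathbf{S}),\;
  \boldsymbol{\sigma}(\theta) = \operatorname{vech}\bigl(\boldsymbol{\Sigma}(\theta)\bigr) \\
F &= \beta_{\mathrm{SNP}}\, g + \zeta
\end{aligned}
```

Here $\beta_{\mathrm{SNP}}$ is the SNP's effect on the shared factor and $\zeta$ is the residual. Testing $\beta_{\mathrm{SNP}}$ pools signal across all the correlated traits, which is where the power gain comes from.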