Rohit Singh
rohitsingh8080.bsky.social
Computational biologist. Faculty @DukeU. Co-founder http://martini.ai. Prev @MIT_CSAIL. Did quant investing for a while, before returning to research.

https://singhlab.net
Reposted by Rohit Singh
Yes, we have lots of exciting collaborative projects at the interface of computation and biology. Deep expertise in many domains between our labs, so a wonderful and committed training environment!
May 9, 2025 at 2:59 PM
Our fantastic trainees and collaborators made this possible. Kanchan Jha, Aditya Parekh and Pooja Parameswaran led the dry-lab work, while Daichi Shonai and Aki Uezu led the wet-lab work.
May 9, 2025 at 2:40 PM
This was a wonderful collab with @scottsoderling.bsky.social, whose lab is situated next to ours.

If you want to do cool collaborative work like this, join us! We're building a great ecosystem of AIxBio at Duke.
May 9, 2025 at 2:40 PM
I loved working on this project! Neither the kinase specificity prediction nor the proximity proteomics is enough on its own; you need both.

I think this project shows how a close collaboration between biologists and computer scientists can introduce entirely new capabilities.
May 9, 2025 at 2:40 PM
With KolossuS, we studied sleepiness in mice, and especially the signaling impact of Sik3, a kinase whose mutation leads to sleepy mice.

We think our KolossuS + proteomics approach has a ton of potential in deconvolving kinases involved in specific processes. 12/
May 9, 2025 at 2:40 PM
And of course, the interpretability of the kinase embedding space led to some fun explorations.

For example, we asked whether the phylogeny of kinase families actually corresponds to substrate preferences. Broadly yes, but with a few key exceptions. 11/
May 9, 2025 at 2:40 PM
KolossuS’ architecture applies broadly across all kinases (and generalizes to other species) and it is well-calibrated.

This, combined with proximity proteomics, which lets us assay in a tissue and sub-cellular locale of interest, gives us the end-to-end solution we need. 10/
May 9, 2025 at 2:40 PM
A poorly calibrated model might always score one kinase highly even if, on a per-kinase basis, it is accurate on substrate specificities. See this note from the preprint: 9/
May 9, 2025 at 2:40 PM
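A toy illustration of the calibration point, using made-up numbers (nothing here comes from the preprint, and all names are hypothetical). Each kinase ranks its own true substrate correctly, but one kinase scores on a higher scale and so wins every cross-kinase screen. The per-kinase mean-centering below is only a crude stand-in for calibration, not a claim about how KolossuS is calibrated.

```python
# Toy scores, invented for illustration only (not from the preprint).
# Each kinase ranks its own true substrate above its non-substrate, so
# per-kinase accuracy is perfect, but kinase_A's scores sit on a higher scale.
raw_scores = {
    "kinase_A": {"pep1": 0.95, "pep2": 0.90},  # true substrate: pep1
    "kinase_B": {"pep1": 0.20, "pep2": 0.60},  # true substrate: pep2
}


def top_kinase(scores, peptide):
    """Screen one peptide against every kinase and return the best scorer."""
    return max(scores, key=lambda kinase: scores[kinase][peptide])


def center_per_kinase(scores):
    """Crude stand-in for calibration: subtract each kinase's mean score."""
    centered = {}
    for kinase, by_peptide in scores.items():
        mean = sum(by_peptide.values()) / len(by_peptide)
        centered[kinase] = {p: s - mean for p, s in by_peptide.items()}
    return centered


peptides = ("pep1", "pep2")
uncalibrated_calls = {p: top_kinase(raw_scores, p) for p in peptides}
centered_calls = {p: top_kinase(center_per_kinase(raw_scores), p) for p in peptides}

print(uncalibrated_calls)  # kinase_A wins both screens
print(centered_calls)      # each peptide goes to its true kinase
```

Before centering, kinase_A is called for both peptides even though pep2 belongs to kinase_B. After putting the two kinases on a comparable scale, the screen recovers the right kinase for each peptide.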
Breadth and interpretability are self-explanatory, but why emphasize calibration? And what does it even mean?

The key insight is that given a phosphorylated peptide, we’ll computationally screen it against every human kinase. Calibration is critical for that. 8/
May 9, 2025 at 2:40 PM
Using the ESM-2 15B model (PLM scaling worked, for once!) we predict kinase-substrate specificity by learning a co-embedding of the two.

As models go, KolossuS is relatively simple. In its design, we emphasized three aspects: breadth, calibration and interpretability. 7/
May 9, 2025 at 2:40 PM
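A minimal sketch of what scoring in a shared embedding space can look like. Everything here is an assumption for illustration: the 2-D vectors are invented (real ESM-2 embeddings are far larger), and cosine similarity is just one common way to compare co-embedded items, not necessarily what KolossuS uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical kinase and peptide vectors, already projected into one space.
kinase_vectors = {
    "kinase_A": [1.0, 0.0],
    "kinase_B": [0.0, 1.0],
}
peptide_vector = [0.6, 0.8]

# Score the peptide against every kinase, then rank kinases by score.
scores = {
    name: cosine_similarity(vec, peptide_vector)
    for name, vec in kinase_vectors.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The appeal of a co-embedding is that the same comparison works for any kinase-peptide pair, which is what makes screening a peptide against the whole kinome straightforward.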
The relevant assay here is phosphoproteomics: you can identify the phosphorylated peptides in a sample. Proximity proteomics will let you further target a specific tissue and sub-cellular neighborhood. But that doesn’t tell you which kinases are active.

Enter KolossuS.
May 9, 2025 at 2:40 PM
Identifying the precise kinase involved in your pathway and tissue of interest therefore requires a mix of computation and experimentation. 5/
May 9, 2025 at 2:40 PM
As writers, kinases are only moderately specific: multiple kinases can often phosphorylate the same substrate. This moderate specificity makes signal integration easier, but it of course also creates disease risk. 4/
May 9, 2025 at 2:40 PM
Kinases are the prototypical example of something nature does often: take a simple biophysical phenomenon (here phosphorylation, but see also ubiquitination, acetylation etc.) and supercharge it as a signaling vehicle by evolving a spectrum of signal writers (e.g., kinases) and readers. 3/
May 9, 2025 at 2:40 PM
Surprisingly little is known about kinases. Despite their therapeutic and biological importance, 80% of the human kinome is “dark”, i.e., we don’t have a good sense of which substrates a kinase targets, or in which cell types and sub-cellular compartments. 2/
May 9, 2025 at 2:40 PM
Reposted by Rohit Singh
The BEST part: This would not have been possible without the close (and SO FUN!) collaboration of my lab with @rohitsingh8080.bsky.social and the lab of Masashi Yanagisawa.
April 28, 2025 at 10:00 PM
For instance, there's a temptation to interpret the embeddings as implying a gene regulatory network. I don't think they do, at least not as per the typical GRN interpretation of causality within an individual cell.
April 6, 2025 at 4:06 AM
Agree completely. Specifically, re single-cell foundation models: I think they are best thought of as representation learners that summarize massive scRNA-seq datasets usefully. But those representations have to be carefully interpreted.
April 6, 2025 at 3:57 AM
Reposted by Rohit Singh
100% agree with this.
April 6, 2025 at 2:10 AM