Sameed Siddiqui
sameedms.bsky.social
Californian lost in the Northeast ☀️.

PhD @ MIT Computational and Systems Biology | MBA Fellow at MIT Sloan. @SabetiLab member
Finally, thanks to the team! A huge shoutout to my friend and mentee @krithik-bs.bsky.social. I'm also infinitely grateful to #AlbertGu for his advice, and to #MichaelMitzenmacher and @pardissabeti.bsky.social for their mentorship and leadership. So much laughter while making this paper, can't wait for more.
March 21, 2025 at 9:16 PM
Check out our paper "Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences," detailing how mathematical insights overcome computational limitations in biology.
arxiv.org/abs/2503.16351
Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences
Deep learning architectures such as convolutional neural networks and Transformers have revolutionized biological sequence modeling, with recent advances driven by scaling up foundation and task-speci...
arxiv.org
March 21, 2025 at 9:16 PM
This work shows that principled mathematical insights, like approximation of epistatic interactions, can provide an accessible and performant alternative to large foundation models—suggesting broader applicability beyond biological sequences.
March 21, 2025 at 9:16 PM
We are excited about Lyra's potential to accelerate discoveries in molecular biology, therapeutic development, and protein engineering.
March 21, 2025 at 9:16 PM
Lyra makes cutting-edge biological modeling accessible to labs without extensive compute resources. Instead of relying on massive GPU clusters, Lyra empowers researchers to train state-of-the-art models directly on their own laptops.
March 21, 2025 at 9:16 PM
Lyra’s subquadratic O(N log N) complexity dramatically reduces memory use (125x–2600x less than Evo and ESM-1b) and speeds up inference by up to 239x relative to ESM-1b, while handling sequences up to 1M in length.
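For intuition on where O(N log N) scaling can come from, here is a minimal sketch of a long causal convolution computed with an FFT instead of the direct O(N²) sum. This is a generic textbook illustration, not Lyra's actual implementation; the function names (fft, fft_causal_conv, direct_causal_conv) and the toy inputs are assumptions made for this example.

```python
import cmath

def fft(x, invert=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft_causal_conv(u, kernel):
    """Causal convolution in O(N log N) via zero-padded FFTs."""
    n = len(u)
    size = 1
    while size < 2 * n:  # pad so circular wrap-around cannot touch the first n outputs
        size *= 2
    fu = fft([complex(v) for v in u] + [0j] * (size - n))
    fk = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    y = fft([p * q for p, q in zip(fu, fk)], invert=True)
    return [v.real / size for v in y[:n]]  # unnormalized inverse, so divide by size

def direct_causal_conv(u, kernel):
    """Reference O(N^2) causal convolution."""
    return [
        sum(kernel[k] * u[t - k] for k in range(min(t + 1, len(kernel))))
        for t in range(len(u))
    ]

# Toy input and kernel (hypothetical values).
u = [1.0, -2.0, 0.5, 3.0, 0.0]
kernel = [0.5, 0.25, 0.125, 0.0625, 0.03125]
fast = fft_causal_conv(u, kernel)
slow = direct_causal_conv(u, kernel)
```

Both routines return the same outputs up to floating-point error; only the cost differs as N grows.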
March 21, 2025 at 9:16 PM
RNA-dependent RNA polymerases (RDRPs) are essential markers for RNA virus detection. Lyra achieves a near-perfect 0.998 true positive rate, matching LucaProt-ESM with over 60,000x fewer parameters. That can accelerate pathogen discovery without large-scale GPU infrastructure.
March 21, 2025 at 9:16 PM
Lyra achieves SOTA results in 6 out of 7 intrinsically disordered protein region tasks, with an average AUC of 0.89, outperforming a ProtT5-based model (avg AUC 0.86). Lyra accomplishes this using only 55K parameters, compared to ProtT5’s 3 billion parameters—a >50,000-fold reduction in model size.
March 21, 2025 at 9:16 PM
Lyra’s consistently strong performance across different tasks using orders of magnitude fewer parameters allows researchers to spend less time optimizing models and more time generating biological insights.
March 21, 2025 at 9:16 PM
Lyra sets records in 5/9 RNA BEACON benchmarking tasks tested, including nearly solving the splice-site prediction dataset (98.89% accuracy vs previous best 50.55%) and almost doubling performance on structural score imputation (0.73 vs 0.42).
March 21, 2025 at 9:16 PM
We tested Lyra on 101 diverse biological tasks spanning:

1. Proteomics

2. Genomics

3. CRISPR guide efficacy

Lyra set new performance records in 79 out of 101 tasks, with substantially smaller models than competing architectures.
March 21, 2025 at 9:16 PM
We designed Lyra with two simple components: Projected Gated Convolutions (PGC), which enhance local feature extraction, and diagonalized State Space Models (S4D), which capture global epistatic interactions. In doing so, Lyra efficiently captures both global and local epistatic relationships.
March 21, 2025 at 9:16 PM
We drew a mathematical connection between State Space Models (SSMs) and polynomial approximation, showing how their hidden states can naturally approximate the polynomial terms that govern epistatic relationships. This makes SSMs ideal for modeling biological functions as multilinear polynomials.
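To make the SSM side concrete, here is a minimal sketch of a diagonal linear state space model, assuming an already-discretized recurrence h_t = a * h_{t-1} + b * u_t with output y_t = sum(c * h_t). It only shows the standard fact that such a recurrence is equivalent to a convolution whose kernel entries are sums of powers of the diagonal entries; it does not reproduce the paper's polynomial-approximation argument. All names and parameter values are hypothetical.

```python
def ssm_recurrent(u, a, b, c):
    """Run the diagonal SSM step by step: h_t = a*h_{t-1} + b*u_t, y_t = sum(c*h_t)."""
    h = [0.0] * len(a)
    ys = []
    for u_t in u:
        h = [a_n * h_n + b_n * u_t for a_n, b_n, h_n in zip(a, b, h)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(c, h)))
    return ys

def ssm_kernel(a, b, c, length):
    """Equivalent convolution kernel: K[k] = sum_n c_n * a_n**k * b_n."""
    return [
        sum(c_n * (a_n ** k) * b_n for a_n, b_n, c_n in zip(a, b, c))
        for k in range(length)
    ]

def causal_conv(u, kernel):
    """Direct causal convolution: y_t = sum_k kernel[k] * u[t-k]."""
    return [sum(kernel[k] * u[t - k] for k in range(t + 1)) for t in range(len(u))]

# Two hidden states with hypothetical diagonal entries.
a = [0.9, 0.5]
b = [1.0, 1.0]
c = [0.3, -0.2]
u = [1.0, 0.0, 2.0, -1.0]

y_rec = ssm_recurrent(u, a, b, c)
y_conv = causal_conv(u, ssm_kernel(a, b, c, len(u)))
```

The two outputs agree, which is what lets this kind of model be trained as one long convolution rather than a step-by-step loop.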
March 21, 2025 at 9:16 PM
This perspective provides a principled mathematical framework for modeling sequence-function relationships.
March 21, 2025 at 9:16 PM
To unify biological sequence modeling across DNA, RNA, and proteins into a single computational framework, we revisited epistasis—the phenomenon where mutations influence each other—which can be characterized by multilinear polynomials.
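As a toy illustration of epistasis as a multilinear polynomial, the sketch below scores a two-site sequence with a constant term, two single-mutation terms, and one pairwise interaction term. The coefficients are made up for this example and are not from the paper.

```python
def multilinear_fitness(x, coeffs):
    """Evaluate f(x) = sum over site sets S of coeffs[S] * prod(x[i] for i in S).

    x is a tuple of 0/1 mutation indicators; coeffs maps a frozenset of
    site indices to its coefficient.
    """
    total = 0.0
    for sites, coef in coeffs.items():
        term = coef
        for i in sites:
            term *= x[i]
        total += term
    return total

# Hypothetical coefficients: baseline, two single effects, one interaction.
coeffs = {
    frozenset(): 1.0,
    frozenset({0}): -0.2,
    frozenset({1}): -0.3,
    frozenset({0, 1}): 0.6,
}

wild_type = multilinear_fitness((0, 0), coeffs)
single_a = multilinear_fitness((1, 0), coeffs)
single_b = multilinear_fitness((0, 1), coeffs)
double = multilinear_fitness((1, 1), coeffs)

# If the mutations acted independently, the double mutant would equal this:
additive_expectation = single_a + single_b - wild_type
# The gap between observed and additive is the pairwise epistatic term.
epistasis = double - additive_expectation
```

Here each mutation alone lowers the score, but together they raise it: the gap recovers exactly the interaction coefficient, which is the sense in which mutations "influence each other".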
March 21, 2025 at 9:16 PM
Breaking down how biological sequences encode molecular functions remains a central challenge in computational biology. For example, given a GFP sequence, can we predict its fluorescence brightness?
March 21, 2025 at 9:16 PM