Lightnews — Scholar-powered news

Kuan-Hao Chao

@kuanhaochao.bsky.social

Senior Deep Learning Scientist at Illumina | CS PhD candidate at @jhu.edu @jhucompsci.bsky.social

Teaching machines to learn biology 🧬💻

https://khchao.com/


Kuan-Hao Chao

@kuanhaochao.bsky.social

Heart full! I defended my PhD in @jhucompsci.bsky.social @jhu.edu (Aug 25). Thanks to my advisors @stevensalzberg.bsky.social and Mihaela Pertea; my committee, @benlangmead.bsky.social, David Kelley & Anqi Liu; and all labmates, collaborators, and mentors. You made me who I am today. Deeply grateful!

August 26, 2025 at 1:28 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

OpenSpliceAI offers researchers a comprehensive suite of tools for studying transcript splicing—from creating training datasets and training models to predicting splice sites and assessing the impact of genetic variants.
🔗 Explore our documentation here: ccb.jhu.edu/openspliceai/ 10/

March 24, 2025 at 1:57 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

We showed general patterns of donor and acceptor sites across hundreds of splice site motifs learned by OpenSpliceAI. Moreover, it confidently predicts cryptic splicing events—such as acceptor gain in MYBPC3 and novel exon gain in OPA1. 9/

March 24, 2025 at 1:56 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

In silico mutagenesis (ISM) confirms that OpenSpliceAI focuses on the same key regions and patterns for splice site prediction at U2SURP and DST as SpliceAI, and that it effectively captures a splicing enhancer. We also demonstrate its ability to capture the full gene span of CFTR. 8/

March 24, 2025 at 1:56 PM
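
For readers new to ISM, here is a minimal sketch of the idea: mutate every position to each alternative base and record how much a model's score changes. The scorer `toy_donor_score` is a hypothetical stand-in, not OpenSpliceAI's model or API; it only rewards a GT dinucleotide at a fixed position so the example runs on its own.

```python
BASES = "ACGT"

def in_silico_mutagenesis(seq, score_fn):
    """For each position, map every alternative base to score(mutant) - score(reference)."""
    ref_score = score_fn(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = score_fn(mutant) - ref_score
        effects.append(row)
    return effects

def position_importance(effects):
    """Summarize each position by its largest absolute score change."""
    return [max(abs(delta) for delta in row.values()) for row in effects]

def toy_donor_score(seq):
    """Hypothetical scorer (assumption): 1.0 if 'GT' sits at indices 4-5, else 0.0."""
    return 1.0 if seq[4:6] == "GT" else 0.0

effects = in_silico_mutagenesis("ACAGGTAAGT", toy_donor_score)
importance = position_importance(effects)
```

With a real model in place of the toy scorer, positions with large importance values are the ones the model relies on for that splice site call.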

Kuan-Hao Chao

@kuanhaochao.bsky.social

We enhanced model reliability by applying temperature scaling for calibration. Using expected calibration error and reliability diagrams, we observed a smoother probability distribution that aligns more closely with the empirical distribution, so predicted probabilities better reflect observed accuracy! 7/

March 24, 2025 at 1:55 PM
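
As a rough illustration of the two ideas in this post, the sketch below fits a single temperature T by minimizing negative log-likelihood and then computes expected calibration error (ECE). It is an assumption-laden toy, not OpenSpliceAI's code: the function names are made up for this example, and a simple grid search stands in for the gradient-based optimizer that is normally used.

```python
import math

def log_softmax_at(logits, label, temperature):
    """Log-probability of `label` after dividing logits by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return scaled[label] - log_norm

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood at a given temperature."""
    total = sum(log_softmax_at(row, y, temperature) for row, y in zip(logit_rows, labels))
    return -total / len(labels)

def fit_temperature(logit_rows, labels):
    """Grid search over T in (0, 20]; a stand-in for a proper optimizer."""
    candidates = [0.05 * k for k in range(1, 401)]
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

def expected_calibration_error(logit_rows, labels, temperature=1.0, n_bins=10):
    """ECE: size-weighted gap between mean confidence and accuracy per confidence bin."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, summed confidence, summed correct
    for row, y in zip(logit_rows, labels):
        probs = [math.exp(log_softmax_at(row, k, temperature)) for k in range(len(row))]
        conf = max(probs)
        pred = probs.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += float(pred == y)
    return sum(abs(c_sum - a_sum) for n, c_sum, a_sum in bins if n) / len(labels)

# Toy, overconfident three-class logits (made-up data for illustration only).
rows = [[8.0, 0.0, 0.0]] * 10
labels = [0] * 7 + [1] * 3
T = fit_temperature(rows, labels)
ece_before = expected_calibration_error(rows, labels, 1.0)
ece_after = expected_calibration_error(rows, labels, T)
```

On this toy data the fitted temperature is above 1, which softens the overconfident probabilities and lowers ECE. Temperature scaling rescales all logits by the same factor, so it changes confidence but not which class is predicted.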

Kuan-Hao Chao

@kuanhaochao.bsky.social

OpenSpliceAI also supports transfer learning. Our experiments show that models pre-trained on human data and applied to a species of interest achieve near-optimal performance in just one epoch, drastically reducing compute time while enhancing predictions for species with smaller genomes. 6/

March 24, 2025 at 1:55 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

Comparing species-specific training versus running SpliceAI directly, our results confirm that one-size-fits-all generalization isn’t enough. OpenSpliceAI’s ability to retrain on specific species sets it apart from SpliceAI. 5/

March 24, 2025 at 1:54 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

Our benchmarks show that OpenSpliceAI outperforms SpliceAI in elapsed time, memory usage, and GPU peak memory across various gene lengths. Thanks to dynamic PyTorch graphs, batch prediction, and optimized engineering, full-chromosome predictions are now a reality! 4/

March 24, 2025 at 1:53 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

Built with six modular components, OpenSpliceAI lets you:

• Create species-specific datasets
• Train custom models
• Calibrate predictions
• Apply transfer learning from human models
• Predict on genes / entire chromosomes
• Assess variant impacts on cryptic splicing
3/

March 24, 2025 at 1:52 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

We have shown that LiftOn reliably transfers annotations within species, as demonstrated in human, mouse, honey bee, rice, and Arabidopsis thaliana. It also effectively maps annotations between species pairs as distant as mouse and rat, or Drosophila melanogaster and D. erecta. 4/8

March 3, 2025 at 2:32 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

LiftOn resolves overlapping gene loci and multi-mapping issues within large gene families. It identifies extra copies of protein-coding genes in the target genome and reports various types of mutations for proteins that do not perfectly match the reference. 3/8

March 3, 2025 at 2:29 PM

Kuan-Hao Chao

@kuanhaochao.bsky.social

LiftOn uses a protein-maximization algorithm combining DNA and protein sequence alignment to generate gene annotations that maximize similarity to reference proteins. It also checks alternative open reading frames to find the longest match to the reference protein. 2/8

March 3, 2025 at 2:28 PM
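
The open-reading-frame check described in this post can be pictured with a small sketch. This is not LiftOn's implementation: it scans only the three forward frames, keeps only ATG-to-stop ORFs, and uses `difflib` longest-block matching as a crude stand-in for real protein alignment. All function names here are hypothetical.

```python
from difflib import SequenceMatcher

# Standard genetic code, built from the usual TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip([a + b + c for a in _BASES for b in _BASES for c in _BASES], _AMINO))

def forward_orfs(dna):
    """Yield (start_index, protein) for each ATG-to-stop ORF in the three forward frames."""
    dna = dna.upper()
    for frame in range(3):
        protein, start = None, None
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks ambiguous codons
            if protein is None:
                if aa == "M":
                    protein, start = "M", i
            elif aa == "*":
                yield start, protein
                protein = None
            else:
                protein += aa

def longest_match(protein, reference):
    """Length of the longest contiguous block shared with the reference protein."""
    matcher = SequenceMatcher(None, protein, reference, autojunk=False)
    return matcher.find_longest_match(0, len(protein), 0, len(reference)).size

def best_orf(dna, reference):
    """Pick the ORF whose translation shares the longest block with the reference."""
    candidates = list(forward_orfs(dna))
    if not candidates:
        return None
    return max(candidates, key=lambda c: longest_match(c[1], reference))

# Made-up sequence: a short decoy ORF in frame 0, the matching ORF in frame 1.
dna = "ATGCCCTAA" + "C" + "ATGAAAACCGCCTATATTGCCAAATAA"
result = best_orf(dna, "MKTAYIAK")
```

Here the decoy ORF translates to a two-residue peptide, while the frame-1 ORF reproduces the whole toy reference, so it is the one selected.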

Kuan-Hao Chao

@kuanhaochao.bsky.social

DNA-based lift-over methods are limited to closely related species, while protein-based methods struggle with accuracy and pseudogenes. Combining both is the solution, and LiftOn is the first to do so, outperforming what either method alone can achieve. 1/8
ccb.jhu.edu/lifton/

March 3, 2025 at 2:27 PM
