Kuan-Hao Chao
kuanhaochao.bsky.social
Senior Deep Learning Scientist at Illumina | CS PhD candidate at @jhu.edu @jhucompsci.bsky.social

Teaching machines to learn biology 🧬💻

https://khchao.com/
Heart full. I defended my PhD in @jhucompsci.bsky.social @jhu.edu (Aug 25). Thanks to my advisors @stevensalzberg.bsky.social and Mihaela Pertea; my committee, @benlangmead.bsky.social, David Kelley & Anqi Liu; and all labmates, collaborators, and mentors. You made me who I am today. Deeply grateful!
August 26, 2025 at 1:28 PM
OpenSpliceAI offers researchers a comprehensive suite of tools for studying transcript splicing—from creating training datasets and training models to predicting splice sites and assessing the impact of genetic variants.
🔗 Explore our documentation here: ccb.jhu.edu/openspliceai/ 10/
March 24, 2025 at 1:57 PM
We showed general patterns of donor and acceptor sites across hundreds of splice site motifs learned by OpenSpliceAI. Moreover, it confidently predicts cryptic splicing events—such as acceptor gain in MYBPC3 and novel exon gain in OPA1. 9/
March 24, 2025 at 1:56 PM
In silico mutagenesis (ISM) confirms that OpenSpliceAI focuses on the same key regions and patterns for splice site prediction at U2SURP and DST as SpliceAI, and that it effectively captures a splicing enhancer. We also demonstrate its ability to capture the full gene span of CFTR. 8/
March 24, 2025 at 1:56 PM
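For readers unfamiliar with ISM, the idea can be sketched in a few lines: substitute every position with each alternative base and record how much the model's score changes. This is a minimal illustration, not OpenSpliceAI's code; `toy_donor_score` is a hypothetical stand-in for a real model, and the sequence is made up.

```python
def in_silico_mutagenesis(seq, score_fn, alphabet="ACGT"):
    """Return, per position, the score change caused by each substitution."""
    ref_score = score_fn(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in alphabet:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutated = seq[:i] + alt + seq[i + 1:]
            row[alt] = score_fn(mutated) - ref_score
        effects.append(row)
    return effects


def toy_donor_score(seq):
    # Hypothetical scorer: 1.0 if a "GT" dinucleotide sits at positions 3-4.
    # A real analysis would call a trained splice site model here.
    return 1.0 if seq[3:5] == "GT" else 0.0


effects = in_silico_mutagenesis("CAGGTAAG", toy_donor_score)
# Per-position importance: the largest absolute score change at that position.
importance = [max(abs(delta) for delta in row.values()) for row in effects]
```

Positions with large importance values are the ones the scorer depends on; with a real model, those are the regions an ISM plot highlights.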
We enhanced model reliability by applying temperature scaling for calibration. Using expected calibration error and reliability diagrams, we observed a smoother probability distribution that aligns more closely with the empirical distribution, meaning the predicted probabilities are better calibrated! 7/
March 24, 2025 at 1:55 PM
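As a rough sketch of what temperature scaling and expected calibration error (ECE) involve: logits are divided by a single scalar T before the softmax, T is chosen to minimize negative log-likelihood on held-out data, and ECE compares average confidence with accuracy inside confidence bins. Everything below is illustrative only; the grid search, the toy logits, and the function names are assumptions, not OpenSpliceAI's implementation.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels):
    # Simple grid search over T in [0.1, 10.0]; an optimizer would also work.
    candidates = [0.1 * k for k in range(1, 101)]
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


def expected_calibration_error(prob_rows, labels, n_bins=10):
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, confidence sum, hits
    for probs, y in zip(prob_rows, labels):
        conf = max(probs)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(probs.index(conf) == y)
    n = len(labels)
    return sum(abs(conf_sum - hits) / n for cnt, conf_sum, hits in bins if cnt)


# Toy, overconfident 3-class logits: 6 of 8 predictions are correct.
logit_rows = [[5.0, 0.0, 0.0]] * 8
labels = [0, 0, 0, 0, 0, 0, 1, 1]
T = fit_temperature(logit_rows, labels)
ece_before = expected_calibration_error([softmax(r) for r in logit_rows], labels)
ece_after = expected_calibration_error([softmax(r, T) for r in logit_rows], labels)
```

On this toy data the fitted temperature is greater than 1, which softens the overconfident probabilities and lowers ECE. Real calibration would fit T on a held-out validation set.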
OpenSpliceAI also supports transfer learning. Our experiments show that models pre-trained on human data and transferred to a species of interest achieve near-optimal performance in just one epoch, drastically reducing compute time while enhancing predictions for species with smaller genomes. 6/
March 24, 2025 at 1:55 PM
Comparing species-specific training versus running SpliceAI directly, our results confirm that one-size-fits-all generalization isn’t enough. OpenSpliceAI’s ability to retrain on specific species sets it apart from SpliceAI. 5/
March 24, 2025 at 1:54 PM
Our benchmarks show that OpenSpliceAI outperforms SpliceAI in elapsed time, memory usage, and GPU peak memory across various gene lengths. Thanks to dynamic PyTorch graphs, batch prediction, and optimized engineering, full-chromosome predictions are now a reality! 4/
March 24, 2025 at 1:53 PM
Built with six modular components, OpenSpliceAI lets you:

• Create species-specific datasets
• Train custom models
• Calibrate predictions
• Apply transfer learning from human models
• Predict on genes / entire chromosomes
• Assess variant impacts on cryptic splicing
3/
March 24, 2025 at 1:52 PM
We have shown that LiftOn reliably transfers annotations within species, including human, mouse, honey bee, rice, and Arabidopsis thaliana. It also effectively maps annotations between species pairs as distant as mouse and rat, or Drosophila melanogaster and D. erecta. 4/8
March 3, 2025 at 2:32 PM
LiftOn resolves overlapping gene loci and multi-mapping issues within large gene families. It identifies extra copies of protein-coding genes in the target genome and reports various types of mutations for proteins that do not perfectly match the reference. 3/8
March 3, 2025 at 2:29 PM
LiftOn uses a protein-maximization algorithm combining DNA and protein sequence alignment to generate gene annotations that maximize similarity to reference proteins. It also checks alternative open reading frames to find the longest match to the reference protein. 2/8
March 3, 2025 at 2:28 PM
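To make the open-reading-frame check concrete, here is a simplified sketch: translate every ATG-initiated ORF on the forward strand and keep the one most similar to the reference protein. This is not LiftOn's actual algorithm; `difflib.SequenceMatcher` is used here as a crude stand-in for a proper protein alignment, and the sequences are invented.

```python
from difflib import SequenceMatcher
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first codon until a stop codon or the sequence end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for unknown codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def best_orf(dna, reference_protein):
    """Return (similarity, start, protein) for the ORF closest to the reference."""
    best = (-1.0, None, "")
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        protein = translate(dna[start:])
        similarity = SequenceMatcher(None, protein, reference_protein).ratio()
        if similarity > best[0]:
            best = (similarity, start, protein)
    return best


# Toy transcript with two ORFs; only the second matches the reference "MADL".
dna = "CC" + "ATGAAATAG" + "ATGGCTGATTTGTAA"
similarity, start, protein = best_orf(dna, "MADL")
```

Here the first ORF translates to "MK" and the second to "MADL", so the second is selected. A production tool would also handle the reverse strand, splicing, and real alignment scores.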
DNA-based lift-over methods are limited to closely related species, while protein-based methods struggle with accuracy and pseudogenes. Combining both is the solution, and LiftOn is the first to do so, outperforming what either method alone can achieve. 1/8
ccb.jhu.edu/lifton/
March 3, 2025 at 2:27 PM