Lightnews — Scholar-powered news

Darin Tsui

@darintsui.bsky.social

22 followers 270 following 7 posts

PhD candidate at Georgia Tech | ML for bioengineering | UC San Diego '23
darintsui.github.io

Darin Tsui

@darintsui.bsky.social

We developed SHAP zero into a Python package, opening the door to efficient, principled, and scalable interpretability of biological sequence models!
⭐ Paper: arxiv.org/abs/2410.19236
⭐ Code: github.com/amirgroup-co...

SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries

The growing adoption of machine learning models for biological sequences has intensified the need for interpretable predictions, with Shapley values emerging as a theoretically grounded standard for m...

September 22, 2025 at 3:17 PM

Darin Tsui

@darintsui.bsky.social

We then applied SHAP zero to extract epistatic interactions in protein language models. Despite a total feature space larger than a trillion, SHAP zero ran up to 7x faster in amortized time and uncovered interactions associated with structural stability.

September 22, 2025 at 3:16 PM

Darin Tsui

@darintsui.bsky.social

We demonstrate the power of SHAP zero by applying it to guide RNA and DNA repair models. SHAP zero uncovered high-order interactions at scale that correspond to known biological motifs, a task previously inaccessible due to the sheer size of the space of feature interactions.

September 22, 2025 at 3:15 PM

Darin Tsui

@darintsui.bsky.social

Our secret? We connect the sparse Fourier transform of a model with Shapley explanations. If the model is "compressible" (which many biological sequence models are!), SHAP zero amortizes the cost of computing feature interactions, running up to 1000x faster than current methods.

September 22, 2025 at 3:14 PM
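
The link between a sparse Fourier transform and Shapley values described in the post above can be illustrated with a toy binary analogue. This is only a sketch under stated assumptions, not the SHAP zero package or its API: it assumes binary inputs (SHAP zero itself targets biological sequences over larger alphabets), an all-zeros baseline for masked features, and a made-up sparse Walsh-Hadamard "sketch" called SKETCH. All function names here are hypothetical. For a model of the form f(x) = sum_k F[k] * (-1)^(sum of x_j over j in k), a coefficient whose frequency overlaps the active positions of x in K places contributes ((-1)^K - 1) / K to each of those positions, so only odd overlaps matter. Once the coefficients are known, explaining a new input is a single pass over the sparse sketch rather than a sum over all feature subsets.

```python
from itertools import combinations
from math import factorial

# Hypothetical sparse Fourier "sketch" of a model over 4 binary features:
# frequency (set of positions) -> coefficient. Values are made up.
SKETCH = {
    frozenset(): 0.5,
    frozenset({0}): 1.0,
    frozenset({1, 2}): -0.75,
    frozenset({0, 2, 3}): 2.0,
}


def model(x, sketch=SKETCH):
    """Evaluate f(x) = sum_k F[k] * (-1)^(sum of x_j for j in k)."""
    return sum(c * (-1) ** sum(x[j] for j in k) for k, c in sketch.items())


def shap_from_sketch(x, sketch=SKETCH):
    """Shapley values of x (all-zeros baseline) read directly off the sketch."""
    phi = [0.0] * len(x)
    for k, c in sketch.items():
        # Positions of this frequency where x differs from the baseline.
        active = [j for j in k if x[j] == 1]
        size = len(active)
        if size % 2 == 1:  # even overlaps contribute exactly zero
            for j in active:
                phi[j] += c * (-2.0 / size)
    return phi


def shap_brute_force(x, f=model):
    """Reference Shapley values via the exponential-time subset formula."""
    n = len(x)

    def value(subset):
        # Features outside the subset are masked to the baseline value 0.
        return f([x[j] if j in subset else 0 for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    query = [1, 0, 1, 1]
    print("from sketch:", shap_from_sketch(query))
    print("brute force:", shap_brute_force(query))
```

In this toy setting the two functions agree on every input, but the sketch-based version touches only the four stored coefficients per query, while the brute-force version evaluates the model on every subset of features.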

Darin Tsui

@darintsui.bsky.social

Our core idea: instead of recomputing explanations for every new sequence from scratch, we pay a one-time cost to create a global sketch of the model. This enables SHAP zero to explain biological sequences from this sketch with near-zero marginal cost.

September 22, 2025 at 3:13 PM

Darin Tsui

@darintsui.bsky.social

The success of biological sequence models has created an urgent need to explain their predictions. However, computing Shapley values, often the gold standard of explanations, over thousands of sequences to extract biological insight remains computationally prohibitive.

September 22, 2025 at 3:13 PM
