Shae Mclaughlin
shaemcl.bsky.social
Med student turned researcher studying gene regulation 🧬 | Currently MS (Health Data Science) @ UCSF 🌁 | Interested in: epigenome editing & deep learning 🧠
To visualize where these features activate across the genome, I uploaded the activation sites to the UCSC Genome Browser. Comparing against genomic annotations reveals these features tend to activate in consistent patterns - often near SINE elements 6/8
December 12, 2024 at 2:47 AM
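One way to get activation sites into the UCSC Genome Browser is as a BED custom track. The sketch below is an illustration rather than the workflow used in this thread: the function name `activations_to_bed`, the `(chrom, start, end, activation)` tuple layout, and the scaling of activations onto BED's 0-1000 score range are all assumptions.

```python
def activations_to_bed(sites, track_name="sae_feature"):
    """Format activation sites as BED custom-track text.

    `sites` is a list of (chrom, start, end, activation) tuples using
    0-based, end-exclusive coordinates, as BED expects. The tuple layout
    and score scaling are assumptions made for this sketch.
    """
    # The track line names the custom track; useScore=1 shades items by score.
    lines = [
        f'track name="{track_name}" '
        'description="SAE feature activation sites" useScore=1'
    ]
    max_act = max((act for _, _, _, act in sites), default=0.0)
    for chrom, start, end, act in sites:
        # BED scores are integers from 0 to 1000; scale relative to the peak.
        score = int(round(1000 * act / max_act)) if max_act > 0 else 0
        lines.append(f"{chrom}\t{start}\t{end}\t{track_name}\t{score}")
    return "\n".join(lines) + "\n"


example_sites = [("chr1", 100, 130, 2.0), ("chr1", 500, 530, 1.0)]
bed_text = activations_to_bed(example_sites)
```

The resulting text can be saved to a file and loaded as a custom track, where it sits alongside annotation tracks such as repeat elements.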
Or this (less significant) alignment with a MEF2A binding site for feature 3990 5/8
December 12, 2024 at 2:47 AM
Some of these also look like they may be transcription factor binding sites, such as feature 1685 here, which gets a highly significant result for alignment with this ZNF460 binding site 4/8
December 12, 2024 at 2:47 AM
To identify these motifs, for a given feature I looked at the 30bp sequence windows surrounding its strongest activation sites. When the top-activating sequences are examined together, clear patterns emerge, showing that these features reliably detect specific DNA sequences 2/8
December 12, 2024 at 2:47 AM
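The window extraction described in 2/8 can be sketched roughly as follows. This is a minimal illustration, not the code behind the thread: it assumes one activation value per base, and the function name `top_activation_windows` and the parameter `k` are hypothetical.

```python
def top_activation_windows(sequence, activations, k=5, window=30):
    """Return (position, activation, subsequence) for the k strongest sites.

    Assumes `activations[i]` is the feature's activation at base i of
    `sequence` (an assumption; a tokenized model may need a mapping step).
    """
    half = window // 2
    # Rank positions by activation strength, strongest first.
    ranked = sorted(
        range(len(activations)), key=lambda i: activations[i], reverse=True
    )
    results = []
    for pos in ranked[:k]:
        # Clip the window at the sequence ends instead of padding.
        start = max(0, pos - half)
        end = min(len(sequence), pos + half)
        results.append((pos, activations[pos], sequence[start:end]))
    return results


seq = "ACGT" * 20
acts = [0.0] * len(seq)
acts[40] = 0.9
acts[10] = 0.5
windows = top_activation_windows(seq, acts, k=2, window=30)
```

Stacking the returned subsequences for one feature is what makes a shared motif visible by eye; neighbouring peaks can produce overlapping windows, which this sketch does not deduplicate.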
I trained a sparse autoencoder on the middle layer residual stream of my genome language model and found human-interpretable latent features that consistently detect specific DNA motifs!

🧵1/8
December 12, 2024 at 2:47 AM
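For readers unfamiliar with sparse autoencoders, here is a minimal sketch of one forward pass and its loss on a single residual-stream vector. The thread does not describe the architecture or the loss, so the ReLU encoder, linear decoder, L1 sparsity penalty, and every name below (`sae_forward`, `l1_coeff`, the weight layout) are assumptions chosen to show the general idea.

```python
def sae_forward(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=1e-3):
    """One forward pass of a basic sparse autoencoder (illustrative only).

    x:     residual-stream vector of length d_model
    W_enc: n_features rows, each of length d_model
    W_dec: d_model rows, each of length n_features
    Returns (features, reconstruction, loss).
    """
    # Encoder: features = ReLU(W_enc @ x + b_enc)
    features = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]
    # Decoder: reconstruction = W_dec @ features + b_dec
    recon = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(W_dec, b_dec)
    ]
    # Loss: mean squared reconstruction error plus an L1 sparsity penalty.
    mse = sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
    l1 = sum(abs(f) for f in features)
    return features, recon, mse + l1_coeff * l1


x = [1.0, -2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]
features, recon, loss = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
```

The sparsity penalty pushes most features to zero for any given input, which is why individual latent features can end up tracking a single recognizable pattern such as a DNA motif.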