Shae Mclaughlin
shaemcl.bsky.social
Med student turned researcher studying gene regulation 🧬 | Currently MS (Health Data Science) @ UCSF 🌁 | Interested in: epigenome editing & deep learning 🧠
There's much more work to be done in evaluating and interpreting the features, but these early findings suggest this could be a valuable approach for understanding what these models learn about the language of life 🧬 8/8
December 12, 2024 at 2:47 AM
The features have distinct but sometimes overlapping activation patterns, suggesting they might detect different parts of the same Alu element or regulatory sequence. There are a lot of these! I've only investigated a handful of features, but these initial results are promising! 7/8
December 12, 2024 at 2:47 AM
To visualize where these features activate across the genome, I uploaded the activation sites to the UCSC Genome Browser. Comparing against genomic annotations reveals that these features tend to activate in consistent patterns, often near SINE elements 6/8
December 12, 2024 at 2:47 AM
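One way the upload step above could be sketched: the UCSC Genome Browser accepts custom tracks in BED format, so activation sites can be written out as BED lines under a track header. The function name, the `(chrom, start, end, feature_id, activation)` tuple layout, and the score scaling are all assumptions for illustration, not the pipeline used in the thread.

```python
def to_bed(sites, track_name="feature_activations"):
    """Format activation sites as a UCSC custom track in BED format.

    `sites` is an iterable of (chrom, start, end, feature_id, activation)
    tuples with 0-based, half-open coordinates (a hypothetical layout).
    """
    sites = list(sites)
    # Scale activations so the strongest site gets the top BED score.
    max_act = max((s[4] for s in sites), default=0.0) or 1.0
    lines = [f'track name="{track_name}" description="Feature activation sites" useScore=1']
    for chrom, start, end, feature_id, act in sorted(sites, key=lambda s: (s[0], s[1])):
        # BED scores must fall in the range 0 to 1000.
        score = max(0, round(1000 * act / max_act))
        lines.append(f"{chrom}\t{start}\t{end}\tfeature_{feature_id}\t{score}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up coordinates and activations, purely for illustration.
    example = [("chr1", 100, 130, 1685, 2.0), ("chr1", 50, 80, 3990, 1.0)]
    print(to_bed(example), end="")
```

The resulting text can be pasted or uploaded as a custom track; with `useScore=1`, the browser shades each site by its score.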
Or this (less significant) alignment with a MEF2A binding site for feature 3990 5/8
December 12, 2024 at 2:47 AM
Some of these also look like they may be transcription factor binding sites, such as feature 1685 here, which gets a highly significant result for alignment with this ZNF460 binding site 4/8
December 12, 2024 at 2:47 AM
Many of these detect Alu elements: stretches of DNA about 300 bp long that became abundant through copy-paste events during evolution. Making up over 10% of our genome, these are the most common mobile genetic elements in humans 3/8
December 12, 2024 at 2:47 AM
To identify these motifs, for a given feature I looked at the 30 bp sequence windows surrounding its strongest activation sites. When examining the top activating sequences together, clear patterns emerged, showing that these features reliably detect specific DNA sequences 2/8
December 12, 2024 at 2:47 AM
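The windowing step described above can be sketched roughly as follows, assuming one activation value per base of the input sequence. The function names, the per-base activation list, and the simple majority-vote consensus are illustrative assumptions; the thread does not say how the windows were extracted or summarized.

```python
from collections import Counter


def top_windows(sequence, activations, k=10, window=30):
    """Return `window`-bp slices centred on the k strongest activation positions.

    Assumes `activations` holds one value per base of `sequence`.
    """
    half = window // 2
    # Keep only centres where a full window fits inside the sequence.
    valid = range(half, len(sequence) - half + 1)
    ranked = sorted(valid, key=lambda i: activations[i], reverse=True)[:k]
    return [sequence[i - half:i + half] for i in ranked]


def consensus(windows):
    """Most common base in each column of equal-length windows."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))
```

Stacking the top windows and reading off the per-column consensus is a crude stand-in for a sequence logo, but it is often enough to see whether a feature responds to a recurring motif.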