Lightnews — Scholar-powered news

Chris Miller

@chrismiller.science

2.1K followers 210 following 300 posts

I study cancer at Washington University in St Louis. Cancer Genomics, Bioinformatics, Data Viz, Tumor Evolution, AML, Immunotherapy, Irreverent humor 🧬 🖥️ mostly @chrisamiller on other platforms


Chris Miller

@chrismiller.science

a woman says i have a lot of questions number one how dare you ..

November 7, 2025 at 4:06 AM

Chris Miller

@chrismiller.science

Apologies - I misread that and I've deleted the post. FWIW, if we figure 4-5 characters per word, it becomes more like 20-30 years. In any case, I'm all for new ways to express just how big the genome is!

November 2, 2025 at 1:15 AM

Chris Miller

@chrismiller.science

Well, this joke aged poorly 😆
bsky.app/profile/chri...

Obviously not ideal in many situations, and thanks for putting an explainer out there, Claus!

Chris Miller @chrismiller.science · Sep 18

When you want to do reproducible analysis in R, some packages require you to set an RNG seed. I'm not sure I trust anyone who doesn't immediately run `set.seed(42)`

October 22, 2025 at 2:06 PM

Reposted by Chris Miller

Quinta Jurecic

@qjurecic.bsky.social

he's going to come after you either way, so you might as well act with integrity

October 8, 2025 at 11:47 PM

Chris Miller

@chrismiller.science

I can't ever see the MTHFR gene symbol without doing a double take

October 7, 2025 at 7:10 PM

Chris Miller

@chrismiller.science

The lessons here:
1) Many gene names are stupid.
2) Edge cases may be rare, but they often matter. (TP53 is a key cancer gene that wouldn't be accessible without some special accommodations here).
3) As always, check your assumptions!
(fin)

October 6, 2025 at 6:54 PM

Chris Miller

@chrismiller.science

For our little internal app, this probably won't matter much, and I will either set the number of records to 200 (because we generate almost no traffic) or might code up something that dynamically decides how many records to return, based on which genes are in the input data. (8/n)
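
One way the "dynamic" option could look, as a rough sketch only: pick the request size from the largest match count among the genes actually present in the input. The function name `size_for_input`, the `floor` and `cap` parameters, and the precomputed `match_counts` table are all hypothetical, and the counts shown are just the few quoted elsewhere in this thread.

```python
def size_for_input(input_symbols, match_counts, floor=10, cap=1000):
    """Pick a result size large enough for every gene in the input.

    match_counts maps a gene symbol to the number of records that would
    need to be returned before that exact symbol is guaranteed to appear.
    floor and cap are assumed bounds, not values taken from the thread.
    """
    needed = max((match_counts.get(sym, floor) for sym in input_symbols),
                 default=floor)
    # Clamp between the assumed default size and the assumed upper limit.
    return max(floor, min(needed, cap))


# Hypothetical lookup table built from counts quoted in the thread.
match_counts = {"AR": 199, "MET": 31, "TNF": 42}

print(size_for_input(["MET", "TNF"], match_counts))
print(size_for_input(["AR", "MET"], match_counts))
```

Symbols missing from the table fall back to `floor`, which keeps the common case cheap and only asks for large result sets when a short, prefix-heavy symbol such as AR is present.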

October 6, 2025 at 6:54 PM

Chris Miller

@chrismiller.science

For those who are interested, the plot showing cumulative percentage of human HUGO gene names (from ensembl protein-coding genes) covered by a set number of records looks like this. So 8 results covers 99% of genes, 34 results covers 99.9% of genes, and it takes 199 to cover everything. (7/n)

Plot showing how many records need to be returned to ensure that each completely typed gene will be in the list.
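
The coverage numbers above can be approximated from a list of per-gene match counts. This is a small sketch, not the code behind the plot: `coverage` and `min_size_for` are hypothetical helper names, and the toy counts are invented for illustration rather than taken from Ensembl.

```python
import math


def coverage(counts, size):
    """Fraction of genes whose exact symbol is guaranteed within `size` records."""
    return sum(1 for c in counts if c <= size) / len(counts)


def min_size_for(counts, target):
    """Smallest size whose coverage reaches `target` (a fraction in (0, 1])."""
    ordered = sorted(counts)
    # Index of the gene that must still be covered to reach the target.
    k = max(1, math.ceil(target * len(ordered)))
    return ordered[k - 1]


# Toy data: most symbols match only themselves, a few match many records.
toy_counts = [1, 1, 1, 2, 3, 10]

print(coverage(toy_counts, 2))
print(min_size_for(toy_counts, 0.5))
print(min_size_for(toy_counts, 1.0))
```

With real counts, `min_size_for(counts, 1.0)` is simply the largest count, which is why a single short symbol can push the full-coverage size so far above the 99% figure.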

October 6, 2025 at 6:54 PM

Chris Miller

@chrismiller.science

So in order to guarantee that we'll get "AR" in the list, the value should be 200 records, which seems excessive. My instinctual guess of 30 wasn't bad, and covers 99.89% of gene names, but that's not all of them! (6/n)

a group of pokemon standing next to each other with gotta catch 'em all written on the bottom

October 6, 2025 at 6:54 PM

Chris Miller

@chrismiller.science

It introduces a new question, though - this failed on TP53 with 10 results, so how many results need to be returned to handle all genes correctly? A few seconds of bash/grep later, I get the following list of 21 genes that will still fail. (5/n)

199 AR
120 PC
100 KL
78 ZNF7
67 ZNF2
67 CS
58 CP
58 ADA
57 SI
55 ZNF3
52 TH
51 C2
43 MAG
42 ZNF8
42 TNF
41 GPR1
37 DEFB1
36 USP1
36 GAL
34 PLEK
31 MET
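
A Python sketch of the kind of count shown in this list, under a stated assumption: the thread does not say exactly how the bash/grep count was done, so this version counts how many symbols start with each symbol (itself included) and flags those whose count exceeds the requested size. The names `prefix_counts` and `failing_symbols` are hypothetical, and the toy symbol list is illustrative.

```python
from bisect import bisect_left


def prefix_counts(symbols):
    """Map each symbol to the number of symbols that start with it (itself included)."""
    ordered = sorted({s.upper() for s in symbols})
    counts = {}
    for sym in ordered:
        lo = bisect_left(ordered, sym)
        # Every string that starts with `sym` sorts before sym + a very high code point.
        hi = bisect_left(ordered, sym + "\uffff")
        counts[sym] = hi - lo
    return counts


def failing_symbols(symbols, size):
    """Symbols that may be pushed out of a result list capped at `size` records."""
    counts = prefix_counts(symbols)
    return sorted(((n, s) for s, n in counts.items() if n > size), reverse=True)


toy = ["AR", "ARID1A", "ARL1", "TP53", "TP53TG3F", "TP53RK-DT", "MET"]
for n, sym in failing_symbols(toy, size=2):
    print(n, sym)
```

Sorting once and using binary search keeps this fast enough for a full protein-coding gene list, where a naive all-pairs comparison would be noticeably slower.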

October 6, 2025 at 6:54 PM

Chris Miller

@chrismiller.science

After some digging, it turns out that mygene.info has a default max of 10 records returned for each query, and the first 10 hits include genes like "TP53TG5", "TP53TG3F", "TP53RK-DT", but not "TP53" itself. Adding "&size=30" to the query allows it to return 30 hits, which solved this problem. (4/n)
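
A minimal sketch of the fix described here, with assumptions flagged: the original query string is truncated in the thread, so the wildcard term `TP53*` and the `species` value are illustrative guesses, and `build_query_url` and `has_exact_symbol` are hypothetical helpers. Only the `size` parameter and the "hits" behaviour come from the post. The code builds the URL and checks a hand-written response; it makes no network request.

```python
from urllib.parse import urlencode

BASE_URL = "https://mygene.info/v3/query"


def build_query_url(term, size=30, species="human"):
    """Build a query URL that asks for `size` records instead of the default 10."""
    params = {"q": term, "species": species, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"


def has_exact_symbol(response, symbol):
    """True if any hit in the parsed JSON response carries exactly this symbol."""
    return any(hit.get("symbol") == symbol for hit in response.get("hits", []))


# Hand-written stand-in for a response cut off before the exact match appears.
truncated = {"hits": [{"symbol": "TP53TG3F"}, {"symbol": "TP53RK-DT"}]}

print(build_query_url("TP53*"))
print(has_exact_symbol(truncated, "TP53"))
```

A check like `has_exact_symbol` is a cheap guard: if the exact symbol is missing from the hits, the caller can retry with a larger `size` rather than silently dropping the gene.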

October 6, 2025 at 6:54 PM

Chris Miller

@chrismiller.science

But when I manually tried the query string - something like mygene.info/v3/query?spe... - TP53 didn't appear in the returned json - I know that's not right! (3/n)

October 6, 2025 at 6:54 PM
