@jacobpstein.bsky.social
Evidence-based data science, vibes-based basketball fan. Here for #tidytuesday, mostly. Code here: https://github.com/jacobpstein
If you're on the data science job hunt and feeling discouraged just know that there are terrible clustering algos out there, in production, and you can do much better. Like, look at these 'similar shoes' from DSW. If you're reading this, you can get better results. I believe in you!
July 10, 2025 at 6:10 PM
Sometimes you accidentally write a recursive loop and that's when the fun really starts.
June 18, 2025 at 6:16 PM
I barely had any time for #TidyTuesday this week, and I want to revisit these Gutenberg datasets with some LLM tools at some point. I looked at life spans but limited it to the period since the modern novel was born. This could be a good interactive graphic if I were doing a Quarto presentation.
June 4, 2025 at 5:36 PM
I know it's lame to highlight a corporate-y Getty photo, but this is one of those cool basketball pics that shows how good these guys are at doing otherworldly stuff, like somehow shooting the ball while seemingly falling and being blocked.
May 30, 2025 at 12:03 AM
This week's #TidyTuesday was a tough one! Lots of correlated values, no domain knowledge, and small-n groups. I spent a long time flailing around trying to figure out what might be interesting. Predicting hit points based on the other data seemed like a good way to compare model types.
May 28, 2025 at 2:30 PM
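One common way to compare model types on a target like hit points is k-fold cross-validation: fit each model on part of the data, score it on the held-out part, and compare the error. Below is a minimal stdlib sketch, assuming a single hypothetical numeric predictor and made-up toy data (not the actual #TidyTuesday data); it compares a mean-only baseline against simple linear regression by cross-validated RMSE.

```python
from math import sqrt
from statistics import fmean


def fit_mean(xs, ys):
    # Baseline model: ignore x and always predict the training mean
    m = fmean(ys)
    return lambda x: m


def fit_ols(xs, ys):
    # Simple linear regression y = a + b*x fitted by least squares
    xbar, ybar = fmean(xs), fmean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs
    )
    a = ybar - b * xbar
    return lambda x: a + b * x


def kfold_rmse(xs, ys, fit, k=5):
    # Interleaved folds (no shuffling): fold f holds rows f, f+k, f+2k, ...
    n = len(xs)
    sq_err = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        sq_err.extend((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return sqrt(fmean(sq_err))


# Hypothetical toy data: a linear trend plus deterministic "noise" in [-5, 5]
xs = list(range(20))
ys = [2 * x + 5 + ((i * 7) % 11) - 5 for i, x in enumerate(xs)]

results = {
    "mean": kfold_rmse(xs, ys, fit_mean),
    "ols": kfold_rmse(xs, ys, fit_ols),
}
print(results)
```

The same `kfold_rmse` loop works for any model that can be wrapped as a `fit(xs, ys) -> predict` function, which is what makes it handy for lining up several model types side by side.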
@owenphillips.bsky.social I don't quite know what to make of this, but the correlation between two-point attempts and shot quality went positive on average for the first time this season. Could be spurious, could be mid-range theory at play.
May 22, 2025 at 2:48 AM
I have been re-reading Ferrante's Neapolitan Novels, so this week's #TidyTuesday felt very much on theme. I started to go down a rabbit hole of spatial modeling, but decided that, to get this done while I had a little time, it was better just to make a nice descriptive plot.
May 16, 2025 at 5:54 PM
Oof, didn't have much time for #TidyTuesday today, but thought I'd look at the density of sessions at the #UseR conference by time slot. I always like these graphs even if I'm not totally sure they make sense. The colors are inspired by LaCroix and the excellent LaCroixR package.
April 30, 2025 at 12:55 AM
For this week's #TidyTuesday, I looked at how Poisson and OLS regression differ. I don't think I really learned about this in school, but you can run into all kinds of issues if you want to model count data, like auto fatalities in this week's data.
April 23, 2025 at 3:22 PM
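One concrete issue: OLS fits a straight line, so it can predict negative counts, while a Poisson regression with a log link models log(mu) = b0 + b1*x and always predicts a positive mean. Here's a minimal stdlib sketch of that contrast, fitting the Poisson model by Newton-Raphson on its log-likelihood. The data are hypothetical toy counts, not the auto fatality data, and a real analysis would use a GLM routine from a stats package instead.

```python
import math
from statistics import fmean


def fit_ols(xs, ys):
    # Least-squares line y = b0 + b1*x
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fit_poisson(xs, ys, tol=1e-10, max_iter=100):
    # Poisson regression log(mu) = b0 + b1*x via Newton-Raphson.
    # Start at the intercept-only fit (requires mean(ys) > 0).
    b0, b1 = math.log(fmean(ys)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Score (gradient of the log-likelihood)
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mu))
        # Fisher information [[a, b], [b, c]], inverted in closed form
        a = sum(mu)
        b = sum(x * m for x, m in zip(xs, mu))
        c = sum(x * x * m for x, m in zip(xs, mu))
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical counts that are near zero at first and then grow quickly
xs = list(range(10))
ys = [0, 0, 1, 0, 1, 2, 3, 5, 8, 13]

ols_b0, ols_b1 = fit_ols(xs, ys)
pois_b0, pois_b1 = fit_poisson(xs, ys)
print("OLS prediction at x=0:", ols_b0)                    # negative count
print("Poisson prediction at x=0:", math.exp(pois_b0))     # always > 0
```

With these toy numbers the OLS intercept comes out below zero, i.e. a negative predicted count, which is exactly the kind of nonsense the Poisson model's log link rules out.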
New here and new to #TidyTuesday! This week's challenge looked at the Palmer Penguins dataset, which was a good chance to rant (in my head mostly) about multilevel models!
April 17, 2025 at 2:36 PM
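The core idea I keep ranting about is partial pooling: a multilevel model pulls each group's estimate toward an overall mean, and groups with fewer observations get pulled harder. Below is a minimal stdlib sketch of that shrinkage, assuming the within-group variance `sigma2` and between-group variance `tau2` are known. The group names, observations, and variances are all hypothetical, not values from the Palmer Penguins data; a real multilevel model would estimate the variances from the data.

```python
from statistics import fmean


def partial_pool(groups, sigma2, tau2):
    # groups: dict of group name -> list of observations
    # sigma2: within-group variance, tau2: between-group variance
    # (both ASSUMED known here rather than estimated)
    raw = {g: fmean(obs) for g, obs in groups.items()}
    # Precision of each group mean as an estimate of the population mean
    prec = {g: 1.0 / (tau2 + sigma2 / len(obs)) for g, obs in groups.items()}
    mu = sum(prec[g] * raw[g] for g in groups) / sum(prec.values())
    pooled = {}
    for g, obs in groups.items():
        # Weight on the group's own mean; smaller groups get less weight
        w = tau2 / (tau2 + sigma2 / len(obs))
        pooled[g] = w * raw[g] + (1 - w) * mu
    return raw, mu, pooled


# Hypothetical groups with deliberately unequal sizes
groups = {
    "A": [3.1, 3.4, 3.0, 3.3, 3.2, 3.5],
    "B": [4.0, 4.4],
    "C": [3.6, 3.7, 3.9, 3.8],
}
raw, mu, pooled = partial_pool(groups, sigma2=0.04, tau2=0.1)
print(raw, mu, pooled)
```

No pooling would just report `raw`, and complete pooling would report `mu` for every group; the multilevel estimate in `pooled` lands in between, closer to `mu` for the two-observation group "B".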