datascienceweekly.bsky.social
@datascienceweekly.bsky.social
Reposted
Do you teach #rstats? Do your students complain about how lame and old-fashioned dplyr is? Don't worry, I have the solution for you: github.com/hadley/genzp....

genzplyr is dplyr, but bussin fr fr no cap.
GitHub - hadley/genzplyr: dplyr but make it bussin fr fr no cap
November 6, 2025 at 11:25 PM
Data Science Weekly - Issue 624, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 624
Curated news, articles and jobs related to Data Science, AI, & Machine Learning
November 6, 2025 at 11:57 PM
Reposted
There are a lot of great posts out there that aren't very highly ranked.

Don't rely on Bluesky to find great content for you; you can find it on your own! Here's how:

#Rstats via @northeasternu.bsky.social's Storybench

www.storybench.org/how-to-analy...
How to Analyze bluesky Posts and Trends with R - Storybench
If all you're doing on bluesky is scrolling, liking and posting, then you're riding a bike with training wheels. Here are simple tools using its open-source skeleton.
November 5, 2025 at 7:58 PM
Reposted
I'm excited to share side::kick(), an experimental open-source coding agent for RStudio built entirely in R. It can interact with your files, communicate with your active #rstats session, and run code.

Check it out: github.com/simonpcouch/...
November 5, 2025 at 3:57 PM
Reposted
In our latest blog post we compare the syntax of two Python libraries, Pandas and Polars for standard data-manipulation tasks.

#python #polars #pandas
Polars and Pandas - Working with the Data-Frame
November 6, 2025 at 2:12 PM
Reposted
New blog post: open-source software packages have surprising problems with the way they calculate weighted medians and other quantiles.

www.practicalsignificance.com/posts/weight...

#rstats #julialang
Weighted Quantile Weirdness and Bugs – Practical Significance
Computing quantiles is surprisingly complicated. It gets much weirder when you use weights, and popular software behaves in surprising ways that might trouble you.
November 5, 2025 at 4:30 PM
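The post's point is that "the" weighted median is not uniquely defined. As a rough illustration (a minimal Python sketch, not code from the post or from any package it examines), here are two common conventions, assuming strictly positive weights, that already disagree on equal-weight data. The function names are hypothetical.

```python
def weighted_median_lower(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    if not values or any(w <= 0 for w in weights):
        raise ValueError("need non-empty input with strictly positive weights")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v


def weighted_median_midpoint(values, weights):
    """Like the 'lower' rule, but if the cumulative weight hits exactly half,
    average that value with the next one (mirrors the unweighted median)."""
    if not values or any(w <= 0 for w in weights):
        raise ValueError("need non-empty input with strictly positive weights")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cum = 0.0
    for i, (v, w) in enumerate(pairs):
        cum += w
        if cum > half:
            return v
        if cum == half:
            # With positive weights this cannot happen at the last element.
            return (v + pairs[i + 1][0]) / 2


values = [1, 2, 3, 4]
weights = [1, 1, 1, 1]
print(weighted_median_lower(values, weights))     # 2
print(weighted_median_midpoint(values, weights))  # 2.5
```

Both rules agree whenever no cumulative weight lands exactly on the halfway point; the disagreement shows up at ties like this one, which is one reason different packages can return different answers for the same data.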
Reposted
Last post on causal inference: difference-in-differences (DID)
Plus: I finally added the "copy to clipboard" button 😁

thestippe.github.io/statistics/d...
Difference in difference
Causal inference from 1850
November 6, 2025 at 9:08 PM
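In its simplest 2x2 form, difference-in-differences takes the before/after change in the treated group and subtracts the before/after change in a control group, so that a shared time trend cancels out. A minimal Python sketch with made-up numbers (the function name and data are hypothetical and not taken from the linked post, which works with its own models):

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences using group means:
    (treated post - treated pre) - (control post - control pre)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes, for illustration only.
effect = did_estimate(
    treated_pre=[10, 12, 11],
    treated_post=[15, 17, 16],
    control_pre=[9, 10, 11],
    control_post=[11, 12, 13],
)
print(effect)  # treated change 5 minus control change 2
```

The estimate is only causal under the parallel-trends assumption: absent treatment, the treated group would have changed by the same amount as the control group.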
Reposted
Still got my head in the clouds about this. The paper is really out now 😍
October 30, 2025 at 10:04 PM
Data Science Weekly - Issue 623, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 623
October 30, 2025 at 10:23 PM
Reposted
new blog post:

Of course, someone has to write imperative code to build reproducible data science pipelines. It doesn’t have to be you.

brodrigues.co/posts/2025-1...
October 29, 2025 at 3:52 PM
Reposted
A Geometric Analysis of PCA

What property of the data distribution determines the excess risk of principal component analysis? In this paper, they provide a precise answer to this question.

arxiv.org/abs/2510.20978
October 29, 2025 at 12:29 PM
Reposted
I'll be keynoting at R/Pharma a week from today! The conference is free and virtual. I'll be focused on the mundane use cases of LLMs for wrangling data with #rstats, and the content should feel applicable for folks outside of pharma. Come through. :)

Register: events.zoom.us/ev/Ai-geyS63...
October 29, 2025 at 7:32 PM
Reposted
Messy folders haunting your R projects? 👻

This Wednesday Oct 29 at 4:00 PM PDT, I'll lead a workshop on Efficient File Management in R with {fs} hosted by @r-ladies-stl.bsky.social. Let's clean and organize a spooky messy folder together!

Register at www.meetup.com/rladies-st-l...

#RStats #DataBS
October 27, 2025 at 9:51 PM
Reposted
We wrote an article explaining why you shouldn't put several variables into a regression model and report which are statistically significant, even as exploratory research. bmjmedicine.bmj.com/content/4/1/.... How did we do?
October 27, 2025 at 5:39 PM
Reposted
jacobtomlinson.dev/posts/2025/t...

Highly relatable for anyone that has ever written a line of code used by other people

Lovely little post from @jacobtomlinson.dev
The Majority Of Your Users
The majority of your users don’t read your changelog. The majority of your users only upgrade to new versions when forced to.
October 26, 2025 at 2:21 PM
Reposted
The simplex algorithm is super efficient. 80 years of experience says it runs in linear time. Nobody can explain _why_ it is so fast.

We invented a new algorithm analysis framework to find out.
Beyond Smoothed Analysis: Analyzing the Simplex Method by the Book
Narrowing the gap between theory and practice is a longstanding goal of the algorithm analysis community. To further progress our understanding of how algorithms work in practice, we propose a new alg...
October 27, 2025 at 1:43 AM
Reposted
The University of Kentucky Libraries has shared my research life cycle diagram and all of my planning checklists in their resources. 🥲🧡

libguides.uky.edu/research_dat...
October 27, 2025 at 1:43 AM
Data Science Weekly - Issue 622, by @DataSciNews open.substack.com/pub/datascie...
Data Science Weekly - Issue 622
October 23, 2025 at 11:10 PM
Reposted
You’ve been saying you’ll learn #Bayesian modeling someday. Make someday 𝐭𝐨𝐝𝐚𝐲.

The next 𝐀𝐩𝐩𝐥𝐢𝐞𝐝 𝐁𝐚𝐲𝐞𝐬𝐢𝐚𝐧 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠 𝐖𝐨𝐫𝐤𝐬𝐡𝐨𝐩 from #PyMC Labs begins October 6 and registration closes this week.

Secure your spot now: dub.link/yhoQ7tF
October 2, 2025 at 1:03 PM
Reposted
Instrumental variables with PyMC.
You should also take a look at the beautiful blog post by @juanitorduz.bsky.social that the model is taken from; I just adapted it to the well-known cigarette sales data.

thestippe.github.io/statistics/i...
Instrumental variable regression
Making causal inference without randomization
October 22, 2025 at 8:18 AM
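The idea behind instrumental variables: an instrument z shifts the treatment x but affects the outcome y only through x, so it can recover the causal effect even when an unobserved confounder u biases ordinary regression. The linked post fits a Bayesian model in PyMC; the sketch below swaps in the simple frequentist IV estimator cov(z, y) / cov(z, x) (equivalent to 2SLS with one instrument and one regressor) on simulated data. The data-generating process, the true effect of 2, and the function names are all assumptions made for illustration, not the cigarette data.

```python
import random


def slope_ols(x, y):
    """Ordinary least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_iv(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    if szx == 0:
        raise ValueError("instrument is uncorrelated with x")
    return szy / szx


# Hypothetical simulation: u confounds x and y; z moves x but reaches y
# only through x. The true causal effect of x on y is 2.
random.seed(0)
n = 20_000
u = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2 * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

ols = slope_ols(x, y)
iv = slope_iv(z, x, y)
print(f"OLS: {ols:.2f}, IV: {iv:.2f}")  # OLS biased upward (near 3); IV near 2
```

The OLS slope absorbs the confounder's contribution, while the IV ratio cancels it because z is (by construction here) independent of u. With real data that independence is an untestable assumption, which is where the modeling care in the post comes in.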
Reposted
🌍 Blog series: Spatial Machine Learning with R

From caret to tidymodels, mlr3, and specialized spatial ML packages: explore how spatial context changes the way we build ML models in R.

Start with Part 1 👉 geocompx.org/post/2025/sm...

#RStats #SpatialML #MachineLearning #RSpatial
October 22, 2025 at 1:05 PM
Reposted
Bayesian Data Analysis Primers/Tutorials

I gathered these primers for learning Bayesian data analysis, mainly for myself, but I hope they are helpful to you too.

If you know of similar articles, do share them in the comments.

#bayesiananalysis #datascience #machinelearning #rstats #python
October 20, 2025 at 11:51 AM