#rstats datasets
This article shows how duckplyr can be used in place of dplyr, giving very fast results on very large datasets. #rstats #programming
tidyverse.org/blog/2025/06...
duckplyr fully joins the tidyverse!
duckplyr 1.1.0 is on CRAN! A drop-in replacement for dplyr, powered by DuckDB for speed. It is the most dplyr-like of dplyr backends.
tidyverse.org
November 11, 2025 at 3:55 PM
A plethora of datasets at your fingertips Part3: how many times do couples cheat on each other?

https://thierrymoudiki.github.io/blog/2024/02/19/python/quasirandomizednn/explainableml/nnetsauce-dl-data

#Techtonique #DataScience #Python #rstats #MachineLearning
November 8, 2025 at 8:25 PM
Developed a new #RStats package to access sub-national boundary spatial data (UN OCHA COD and geoBoundaries datasets) from the excellent fieldmaps project. Using #DuckDB and #geoarrow to efficiently query remote parquet and convert to sf #RSpatial #GIS

github.com/epicentre-ms...
GitHub - epicentre-msf/fieldmaps: R Interface to Fieldmaps Data
R Interface to Fieldmaps Data. Contribute to epicentre-msf/fieldmaps development by creating an account on GitHub.
github.com
November 4, 2025 at 2:01 PM
From the DSLC.io aRchives:

🔵 DSLC Project Club: Contributing Datasets to TidyTuesday youtu.be/rmfpClCFfRs

Visit dslc.video for hours of new #DataScience videos every week!

#dataBS #RShiny #dataScience #RStats #shiny
DSLC Project Club: Contributing Datasets to TidyTuesday (proj01)
YouTube video by Data Science Learning Community Videos
youtu.be
October 30, 2025 at 11:38 AM
A plethora of datasets at your fingertips Part3: how many times do couples cheat on each other?

https://thierrymoudiki.github.io/blog/2024/02/19/python/quasirandomizednn/explainableml/nnetsauce-dl-data

#Techtonique #DataScience #Python #rstats #MachineLearning
October 29, 2025 at 5:35 PM
As usual, a big thanks to @12xpert.bsky.social for gathering the data, posit.cloud for hosting my #Rstats simulations and @kaggle.com for hosting the resulting datasets.
October 28, 2025 at 1:44 PM
👨‍💻 Started to work with big data and did some benchmarks with medium datasets (400k–3.5M rows, up to 490 MB).
📊 tl;dr → polars (lazy) is crazy efficient; duckplyr is a solid R-native alternative.
Thinking of pushing this further with larger datasets 💡#DataScience #RStats #Python
October 28, 2025 at 12:04 PM
If anybody has any good tips/resources for a statistical programmer who has a technical interview as a data scientist, let me know. I've not done one before

(Background is epidemiology and clinical trials but this would be in health data science looking at massive datasets)

#rstats #datascience
October 23, 2025 at 8:06 PM
A plethora of datasets at your fingertips Part3: how many times do couples cheat on each other?

https://thierrymoudiki.github.io/blog/2024/02/19/python/quasirandomizednn/explainableml/nnetsauce-dl-data

#Techtonique #DataScience #Python #rstats #MachineLearning
October 21, 2025 at 2:32 PM
#statstab #438 Introduction to {DataSetsVerse}

Thoughts: It's full teaching season, so here is a package with datasets to use. Covers Crim, Econ, Med and more.

#r #rstats #education #pedagogy #teaching #opendata #OpenScience #data #datasets

lightbluetitan.github.io/datasetsverse/
A Metapackage for Thematic and Domain-Specific Datasets in R
The DataSetsVerse is a metapackage that brings together a curated collection of R packages containing domain-specific datasets. It includes time series data, educational metrics, crime records, medica...
lightbluetitan.github.io
October 15, 2025 at 8:05 PM
I loved the community aspect of it. Also felt like a great way to learn out loud especially with #TidyTuesday (which is still a great project). I searched my phone for #rstats and forgot one of my tweets made it into a drob presentation haha
October 15, 2025 at 4:27 AM
If you use full_join() to combine two datasets and then compare numeric columns between them, use replace_na() to replace missing values with zero first. Otherwise comparisons involving NA return NA, and you will miss the rows that exist in only one dataset.

#rstats
October 9, 2025 at 10:55 AM
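
A minimal sketch of the full_join() tip above, assuming dplyr and tidyr are installed. The tibbles `a` and `b` and their columns are hypothetical example data, not taken from the post. Replacing NA with zero only makes sense when a missing row really means "zero"; that is an assumption of this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: counts per region from two sources.
a <- tibble(region = c("north", "south", "east"), n_a = c(10, 5, 7))
b <- tibble(region = c("north", "south", "west"), n_b = c(10, 8, 3))

joined <- full_join(a, b, by = "region")

# Without replace_na(), rows present in only one dataset compare as NA,
# and filter() silently drops them.
naive <- joined |>
  filter(n_a != n_b)

# Replacing NA with 0 first keeps those rows in the comparison.
fixed <- joined |>
  mutate(across(c(n_a, n_b), \(x) replace_na(x, 0))) |>
  filter(n_a != n_b)

print(naive)
print(fixed)
```

Here `naive` only reports the region where both sources have a value and disagree, while `fixed` also reports the regions that appear in just one source.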
#tidytuesday | 2025-09-23 | FIDE ♟️
If I want to join both datasets, which column should I use? No full duplicates but partial duplicates in id & name.

(I used inner join on both "id", and "name"). But why isn’t id unique here? Isn’t there supposed to be a unique identifier for each player?
#rstats
October 4, 2025 at 12:12 PM
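
One way to investigate the question above before choosing join keys is to check whether the candidate key is actually unique. This is only a sketch, assuming dplyr is installed; `players` is a made-up stand-in, not the TidyTuesday FIDE data, and the column names are assumptions.

```r
library(dplyr)

# Hypothetical stand-in table: id 2 appears twice with different ratings.
players <- tibble(
  id     = c(1, 2, 2, 3),
  name   = c("A", "B", "B", "C"),
  rating = c(2500, 2400, 2410, 2300)
)

# Is id unique on its own?
id_is_unique <- anyDuplicated(players$id) == 0

# Which id/name combinations occur more than once?
dup_keys <- players |>
  count(id, name) |>
  filter(n > 1)

print(id_is_unique)
print(dup_keys)
```

Inspecting the rows behind `dup_keys` usually shows which other column distinguishes the repeats, and so which columns belong in the join.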
Course "Marine Spatial Data Analysis in R"

Dates: Online, 24-27 Nov

This course is designed to help marine scientists and conservation practitioners develop practical skills for working with marine spatial datasets in R.

www.physalia-courses.org/courses-work...

#Marine #SpatialData #Rstats
Marine Spatial Data Analysis in R
24–27 November 2025
www.physalia-courses.org
October 2, 2025 at 4:28 PM
Updates are underway in the {stats19} #rstats package, the quickest way to access large, high-quality collision datasets with geographical location and dozens of other variables at collision, vehicle, and casualty levels: new contributor (Blaise 🔥) + new data almost ready to launch 🚀
October 2, 2025 at 1:42 PM
🌊 New @Oceanteacher course with Physalia!
Marine Spatial Data Analysis in R
Learn to process, analyze & map marine datasets to support research & conservation.

📅 Online | 24–27 Nov 2025
#MarineScience #Rstats
October 2, 2025 at 8:30 AM
Updated simulations of the end of the @allsvenskanse.bsky.social 2025 season after 25 rounds.
As usual a big thx to @12xpert.bsky.social for collecting the data, posit.cloud for hosting my #rstats simulations and kaggle.com for storing the datasets.🧵
September 30, 2025 at 1:07 PM
This week on What's New in R, we're featuring:
✅ A migration guide from RStudio to Positron
✅ A tutorial by Charles Minshew on using R to dig for story ideas
✅ The SouthKoreAPIs package by Renzo Caceres Rossi for accessing South Korean datasets

Read the issue: buff.ly/ILS5kAs

#rstats
September 29, 2025 at 3:04 PM
Starting October 1! I have been a busy beaver curating spooky datasets!

#RStats #ggplot2 #DataViz #R #Tidyverse
September 26, 2025 at 4:02 AM
A plethora of datasets at your fingertips Part3: how many times do couples cheat on each other?

https://thierrymoudiki.github.io/blog/2024/02/19/python/quasirandomizednn/explainableml/nnetsauce-dl-data

#Techtonique #DataScience #Python #rstats #MachineLearning
September 25, 2025 at 3:12 PM
Looks like Apache Sedona has been democratised for smaller datasets and local analyses. Looking forward to making my #rstats code go brrrr 🔥

sedona.apache.org/latest/blog/...
Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen - Apache Sedona
Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark, Apache Flink, and Snowflake, with a set of...
sedona.apache.org
September 24, 2025 at 8:42 PM
Upcoming R Consortium webinar:

👉 How to Use pointblank to Understand, Validate, and Document Your Data

🔗 r-consortium.org/webinars/how...

Learn how pointblank helps you explore datasets, validate tables, and document variables.

#rstats #opensource
September 15, 2025 at 7:25 PM
Equivalence between pairwise correlation and VIF in multicollinearity filtering.

Experiment:

- Subset df (30k rows, 249 cols) to random dimensions.
- Filter using a random max correlation.
- Find VIF producing the most similar result to the step above.
- Repeat 10k times.

#rstats 📦 {collinear}
September 15, 2025 at 11:52 AM
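
For context on the link between pairwise correlation and VIF in the experiment above, here is a small base R sketch. It does not use the {collinear} API; the simulated variables `x1`, `x2`, `x3` are assumptions for illustration. It relies on two standard identities: VIF values are the diagonal of the inverse correlation matrix, and with exactly two predictors VIF = 1 / (1 - r^2).

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
x3 <- rnorm(n)                 # independent noise
X  <- cbind(x1, x2, x3)

# VIF of every predictor: diagonal of the inverse correlation matrix.
vif <- diag(solve(cor(X)))

# Same VIF for x1 from its definition, 1 / (1 - R^2) of x1 ~ other predictors.
vif_lm <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)

# With only two predictors, VIF is a direct function of their correlation.
r_12     <- cor(x1, x2)
vif_pair <- 1 / (1 - r_12^2)
vif_two  <- diag(solve(cor(cbind(x1, x2))))

print(vif)
print(c(vif_lm = vif_lm, vif_pair = vif_pair))
```

With more than two predictors the mapping is no longer exact, which is presumably why the post searches for the VIF threshold that best matches a given correlation threshold.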
1/2
Hivemind thoughts about indexing #rstats functions in a book?

In my forthcoming book, I've arranged to automatically index all R {packages} and datasets.

I use text like `datasets::iris`, and index both under their names, and under a heading like "Packages".
⬇️
September 12, 2025 at 1:14 AM
The Art of Data Visualization with ggplot2: A free online book by @nrennie.bsky.social that guides us through the entire process of creating plots, including why certain decisions were made, using real datasets that have been part of #TidyTuesday. Very excited to get stuck into this one. #rstats
The Art of Data Visualization with ggplot2
The TidyTuesday Cookbook
nrennie.rbind.io
September 11, 2025 at 8:41 PM