Lightnews — Scholar-powered news

Reposted by Paulius Alaburda

Rhian Davies

@statsrhian.bsky.social

I gave a talk on #QA in #GitHub: ditch the spreadsheets, embed QA in your workflow, make it a habit.

More details ⬇️

🛝 slides: statsrhian.github.io/qa-in-github
📝 blog post: rhian.rbind.io/posts/2025-1...

#rstats #rap #quality-assurance

Ditch the Spreadsheets

statsrhian.github.io

November 19, 2025 at 4:55 PM

Reposted by Paulius Alaburda

Gautam Kamath

@gautamkamath.com

Thoughtful (as always) blog post from Nicholas Carlini. "Are large language models worth it?" A nice read giving his perspective on risks of ML models.

Post: nicholas.carlini.com/writing/2025...

For people who prefer, this is the video of the talk from @colmweb.org www.youtube.com/watch?v=PngH...

November 19, 2025 at 4:56 PM

Reposted by Paulius Alaburda

Neil Renic

@ncrenic.bsky.social

Methodology

November 18, 2025 at 9:47 AM

Reposted by Paulius Alaburda

Kent Beck

@kentbeck.com

Why does feature development slow & what can we do about it? (This problem becomes acute with vibe coding.) tidyfirst.substack.com/p/why-does-d...

tl;dr use the time between features to create options. What? You have no time between features? We'll talk about that later...

Why Does Development Slow?

It's the options

tidyfirst.substack.com

November 19, 2025 at 4:01 PM

Reposted by Paulius Alaburda

Voilà: Francis Gagnon

@chezvoila.com

😂

Dicing an Onion, the Mathematically Optimal Way

There is more than one way to dice an onion…

pudding.cool

November 19, 2025 at 4:11 PM

Reposted by Paulius Alaburda

tj mahr 🎃

@tjmahr.com

it never ceases to amaze me that I can refer to duckdb functions in R expressions as though they were R functions and everything gets translated to SQL

dplyr code with duckdb functions in all caps

tbl_files <- tbl(db, "csvs") |>
  # CAPS for duckdb functions
  mutate(
    # $speaker has directory names with the patterns:
    # - `[speaker]-[listener]` (listener specific data)
    # - `[speaker]` (default version)
    # so compute appropriate $speaker, $listener values
    dir_parts = speaker |> STRING_SPLIT("-"),
    speaker = dir_parts |> LIST_FIRST(),
    listener = dir_parts |> LIST_LAST(),
    listener = ifelse(listener == speaker, "default", listener)
  ) |>
  select(speaker, listener, file, label, logprob, model_name, model_commit)

November 19, 2025 at 7:47 PM

Reposted by Paulius Alaburda

Andrew Heiss

@andrew.heiss.phd

I recently discovered Conventional Comments (conventionalcomments.org) for providing a pseudo-standard set of labels for feedback and just tried it for an article review and it was really helpful to specify issues vs. thoughts vs. suggestions, etc. Hopefully it's helpful for the authors too!

We strongly suggest using the following labels:

praise: Praises highlight something positive. Try to leave at least one of these comments per review. Do not leave false praise (which can actually be damaging). Do look for something to sincerely praise.
nitpick: Nitpicks are trivial preference-based requests. These should be non-blocking by nature.
suggestion: Suggestions propose improvements to the current subject. It’s important to be explicit and clear on what is being suggested and why it is an improvement. Consider using patches and the blocking or non-blocking decorations to further communicate your intent.
issue: Issues highlight specific problems with the subject under review. These problems can be user-facing or behind the scenes. It is strongly recommended to pair this comment with a suggestion. If you are not sure if a problem exists or not, consider leaving a question.
todo: TODO’s are small, trivial, but necessary changes. Distinguishing todo comments from issues: or suggestions: helps direct the reader’s attention to comments requiring more involvement.
question: Questions are appropriate if you have a potential concern but are not quite sure if it’s relevant or not. Asking the author for clarification or investigation can lead to a quick resolution.
thought: Thoughts represent an idea that popped up from reviewing. These comments are non-blocking by nature, but they are extremely valuable and can lead to more focused initiatives and mentoring opportunities.
chore: Chores are simple tasks that must be done before the subject can be “officially” accepted. Usually, these comments reference some common process. Try to leave a link to the process description so that the reader knows how to resolve the chore.
note: Notes are always non-blocking and simply highlight something the reader should take note of.

November 17, 2025 at 3:52 PM
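
The labels listed above lend themselves to simple tooling, for example tallying review comments by type. The following is a minimal sketch, assuming comments are written as `label (decorations): subject`, which is the shape described on conventionalcomments.org. The `Comment` type and `parse_comment` helper are hypothetical names introduced here for illustration and are not part of any official library.

```python
import re
from typing import NamedTuple, Optional, Tuple

# The nine labels quoted in the post above.
LABELS = {
    "praise", "nitpick", "suggestion", "issue", "todo",
    "question", "thought", "chore", "note",
}


class Comment(NamedTuple):
    """Hypothetical container for one parsed review comment."""
    label: str
    decorations: Tuple[str, ...]
    subject: str


# Assumed shape: "label (decoration, ...): subject"; decorations are optional.
_PATTERN = re.compile(
    r"^(?P<label>[A-Za-z]+)\s*(?:\((?P<decorations>[^)]*)\))?\s*:\s*(?P<subject>.+)$"
)


def parse_comment(line: str) -> Optional[Comment]:
    """Return a Comment, or None if the line does not start with a known label."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    label = match.group("label").lower()
    if label not in LABELS:
        return None
    raw = match.group("decorations") or ""
    decorations = tuple(part.strip() for part in raw.split(",") if part.strip())
    return Comment(label, decorations, match.group("subject").strip())
```

Calling `parse_comment("suggestion (non-blocking): Report the exact p-value.")` yields a `Comment` whose label is `suggestion` and whose only decoration is `non-blocking`; free-form text without a recognised label returns `None`, so unlabelled remarks can be flagged rather than silently miscounted.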

Reposted by Paulius Alaburda

Kyle Walker

@kylewalker.bsky.social

For Day 16 of #30DayMapChallenge: Cell, use mapgl's `turf_voronoi()` to create Voronoi "cells" from input points.

Even better - make your Voronoi polygons interactive and dynamic in Shiny!

#rstats #GIS

November 16, 2025 at 11:08 PM

Reposted by Paulius Alaburda

Darren Dahly

@statsepi.bsky.social

For people trying to teach themselves more about statistics, go read about these different approaches and try to make sense of why they don't exactly agree. What are they doing differently? Use wikipedia. Look up new terms along the way.

Robert (Bob) Kubinec @rmkubinec.bsky.social · 4d

My #rstats cheat code for today is the binom.confint function in the binom package that will spit out *12* different ways of calculating a CI for a proportion.

Also, this is why you use R for statistics...

(and of course the correct CI method is bayes 😎)

shows all the ways you can get a CI for a proportion in R

November 16, 2025 at 10:41 AM
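
To make the "why don't they exactly agree?" exercise concrete, here is a minimal sketch comparing two textbook intervals for a proportion: the normal-approximation (Wald) interval and the Wilson score interval. It is written in plain Python rather than with the binom package, and the function names `wald_ci` and `wilson_ci` are hypothetical, chosen for this illustration. The critical value 1.96 assumes a 95% confidence level.

```python
import math
from typing import Tuple


def wald_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval: inverts the score test instead of plugging in p_hat."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return (centre - half_width, centre + half_width)


if __name__ == "__main__":
    # A small sample with few successes, where the two methods visibly diverge.
    for name, method in [("wald", wald_ci), ("wilson", wilson_ci)]:
        lower, upper = method(2, 20)
        print(f"{name:>6}: [{lower:.3f}, {upper:.3f}]")
```

With 2 successes out of 20, the Wald interval dips below zero while the Wilson interval stays inside [0, 1]. The difference comes from where the standard error is evaluated: Wald uses the observed proportion, whereas Wilson solves for the proportions that the data would not reject.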

Reposted by Paulius Alaburda

Dan Quintana

@dsquintana.bsky.social

Our paper on improving statistical reporting in psychology is now online 🎉

As a part of this paper, we also created the Transparent Statistical Reporting in Psychology checklist, which researchers can use to improve their statistical reporting practices

www.nature.com/articles/s44...

Transparent and comprehensive statistical reporting is critical for ensuring the credibility, reproducibility, and interpretability of psychological research. This paper offers a structured set of guidelines for reporting statistical analyses in quantitative psychology, emphasizing clarity at both the planning and results stages. Drawing on established recommendations and emerging best practices, we outline key decisions related to hypothesis formulation, sample size justification, preregistration, outlier and missing data handling, statistical model specification, and the interpretation of inferential outcomes. We address considerations across frequentist and Bayesian frameworks and fixed as well as sequential research designs, including guidance on effect size reporting, equivalence testing, and the appropriate treatment of null results. To facilitate implementation of these recommendations, we provide the Transparent Statistical Reporting in Psychology (TSRP) Checklist that researchers can use to systematically evaluate and improve their statistical reporting practices (https://osf.io/t2zpq/). In addition, we provide a curated list of freely available tools, packages, and functions that researchers can use to implement transparent reporting practices in their own analyses to bridge the gap between theory and practice. To illustrate the practical application of these principles, we provide a side-by-side comparison of insufficient versus best-practice reporting using a hypothetical cognitive psychology study. By adopting transparent reporting standards, researchers can improve the robustness of individual studies and facilitate cumulative scientific progress through more reliable meta-analyses and research syntheses.

November 14, 2025 at 8:43 PM

Reposted by Paulius Alaburda

Erik van Zwet

@erik-van-zwet.bsky.social

I wrote something about publication bias at statmodeling.stat.columbia.edu/2025/11/14/t...

November 14, 2025 at 9:17 PM

Reposted by Paulius Alaburda

Hadley Wickham

@hadley.nz

testthat 3.3.0 out now! This is a massive release with tons of improvements including better failure messages, new expectations, improved snapshotting, new vignettes, and much much more: tidyverse.org/blog/2025/11... Post includes some thoughts on developing an #rstats package with Claude Code.

testthat 3.3.0

testthat 3.3.0 brings improved expectations with better error messages, new expectations for common testing patterns, and lifecycle changes including the removal of `local_mock()` and `with_mock()`. I...

tidyverse.org

November 13, 2025 at 5:24 PM

Reposted by Paulius Alaburda

Oded Rechavi

@odedrechavi.bsky.social

Adding citations of people who might review the paper

November 14, 2025 at 10:06 AM

Reposted by Paulius Alaburda

Sabrina Norwood

@sabrinanorwood.bsky.social

Pretty cool is an understatement.

There are 1.5 million hours of video game play recorded, via telemetry data! This is a very cool study🎮

Tamas Andrei Foldes @sinandrei.bsky.social · 6d

We released a pretty cool dataset/preprint today looking at video game play, cognition, time-use and a ton of self-reported psych measures at osf.io/preprints/ps... with @nballou.bsky.social @matti.vuorre.com @thomashakman.bsky.social @rpsychologist.com and @shuhbillskee.bsky.social RRs coming soon

Survey administration schedule across the 12-week study period. Participants completed biweekly surveys (orange) every two weeks, with US participants additionally completing daily surveys (blue) for the first 30 days. Cognitive tests (green) were administered during biweekly surveys at weeks 1, 5, and 9. Gray circles indicate days with no scheduled surveys. Retention percentages show the proportion of baseline participants (N=1978) who were still active at each measurement week (defined as having completed either a daily or biweekly survey at any time after that week).

Diurnal play across Xbox, Nintendo, Steam.

Sample of daily gaming patterns and mental wellbeing for three representative participants. Stacked bars represent total daily playtime across platforms. Orange line shows biweekly mental wellbeing scores (short WEMWBS) measured at six study waves. Participants were selected from those closest to the 25th, 50th, and 75th percentiles of total playtime, prioritizing those with the most varied multi-platform gaming behavior. Participant IDs: p9009984081 (25th percentile), p8809196928 (50th percentile), p7162729307 (75th percentile).

November 14, 2025 at 5:02 PM

Reposted by Paulius Alaburda

Chelsea Parlett

@chelseaparlett.bsky.social

New Data Scientists when they find out their job is mostly dashboarding and data engineering

Michael Scott from the office saying “I love math, would love to do it someday”

November 14, 2025 at 8:25 PM

Reposted by Paulius Alaburda

Isabella Velásquez

@ivelasq3.bsky.social

I wrote a lil post on the amazing work that @ginareynolds.bsky.social does championing ggplot2 extension developers and teaching others to build their own!

The post features the Scrollytelling Quarto extension and the group's cute #RStats hex 🐱:

rworks.dev/posts/ggplot...

An Introduction to Writing Your Own ggplot2 Geoms – R Works

The ggextenders club provides inspiration and resources for those venturing into the exciting world of creating custom ggplot2 extensions.

rworks.dev

November 3, 2025 at 3:22 PM

Reposted by Paulius Alaburda

Russ Poldrack

@russpoldrack.org

Project structure for scientific coding projects
- the latest in my Better Code, Better Science series open.substack.com/pub/russpold...

Project structure for scientific coding projects

Better Code, Better Science: Chapter 6, Part 3

open.substack.com

November 11, 2025 at 2:51 PM

Reposted by Paulius Alaburda

Kristina Muise, MSc 🦇👩‍🔬

@kristinamuise.bsky.social

Any #rstats people on here have experience with hierarchical generalized additive models (HGAM) in ecology?

I’m in need of some help in possibly using one with some data!

🦇🌎🧪🧫

November 11, 2025 at 6:20 PM

Reposted by Paulius Alaburda

Chelsea Parlett

@chelseaparlett.bsky.social

🤏🤏🤏

The Drake hotline bling meme. Drake rejects “customer segmentation” and accepts “clustomers”

November 10, 2025 at 9:14 PM

Reposted by Paulius Alaburda

Q McCallum

@qethanm.bsky.social

It's here! I've just released my latest book, "Twin Wolves: Balancing risk and reward to make the most of AI."

This is a tight, executive-level read on how to approach AI (both ML/AI and genAI) in your company.

twinwolvesai.com

#dataBS

Twin Wolves AI

Balancing risk and reward to make the most of AI

TwinWolvesAI.com

November 7, 2025 at 2:45 PM

Reposted by Paulius Alaburda

Davis Vaughan

@davisvaughan.bsky.social

We are looking for #rstats community feedback on 3 new dplyr functions!

We're aiming to expand the `filter()` family:

- `filter()` to keep rows
- `filter_out()` to drop rows
- `when_any()` and `when_all()` as modifiers

Read more and leave feedback here:
github.com/tidyverse/ti...

Example of using `filter_out()` on the `penguins` dataset, showing how it is much easier than `filter()`, especially with `NA`s

November 7, 2025 at 4:03 PM
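
The remark about `NA`s is easier to see with a small model of three-valued logic. The sketch below is written in Python, not dplyr, and uses `None` to stand in for `NA`. It assumes that `filter()` keeps only rows where the condition is true and that the proposed `filter_out()` drops only rows where the condition is true, so rows with a missing condition survive. That second assumption is a reading of the proposal, not something the post itself states. The helpers `keep_rows`, `drop_rows`, and `negate` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]
Predicate = Callable[[Row], Optional[bool]]  # None plays the role of NA


def keep_rows(rows: List[Row], predicate: Predicate) -> List[Row]:
    """Model of filter(): keep rows where the predicate is True (NA rows are dropped)."""
    return [row for row in rows if predicate(row) is True]


def drop_rows(rows: List[Row], predicate: Predicate) -> List[Row]:
    """Assumed model of filter_out(): drop rows where the predicate is True (NA rows stay)."""
    return [row for row in rows if predicate(row) is not True]


def negate(predicate: Predicate) -> Predicate:
    """Three-valued NOT: the negation of NA is still NA."""
    def negated(row: Row) -> Optional[bool]:
        value = predicate(row)
        return None if value is None else not value
    return negated


def is_heavy(row: Row) -> Optional[bool]:
    """Condition that is NA whenever body_mass is missing."""
    mass = row["body_mass"]
    return None if mass is None else mass > 4000


# Made-up rows for illustration only.
penguins = [
    {"id": 1, "body_mass": 3500},
    {"id": 2, "body_mass": 4500},
    {"id": 3, "body_mass": None},
]

# Negating the condition inside filter() silently loses the row with missing mass.
via_negation = [row["id"] for row in keep_rows(penguins, negate(is_heavy))]

# Dropping matches directly keeps it.
via_drop = [row["id"] for row in drop_rows(penguins, is_heavy)]
```

Under these assumptions, `via_negation` contains only row 1 while `via_drop` contains rows 1 and 3, which is the kind of discrepancy that makes "drop the rows matching X" awkward to express with a keep-only filter.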

Reposted by Paulius Alaburda

Jonathan

@jmcphers.bsky.social

My keynote about data science tools at posit::conf is now online! I originally meant it to be a talk about Positron, but as I was writing it, it took a left turn through the history of RStudio and into the philosophy of tool design & how to build stuff for people.

www.youtube.com/watch?v=tGre...

10 Years of Data Science Tools...and What Happens Next (Jonathan McPherson) | posit::conf(2025)

YouTube video by Posit PBC

www.youtube.com

November 7, 2025 at 6:11 PM

Reposted by Paulius Alaburda

alex hayes

@alexpghayes.com

What are your favorite books/articles/resources about graphic design for academic posters and presentations?

November 4, 2025 at 1:44 AM

Reposted by Paulius Alaburda

Kevlin Henney

@kevlin.bsky.social

On the blog: Think for Yourself

"By skimming past the friction necessary for learning, the pursuit of convenience can end up deskilling rather than enhancing skills."

kevlinhenney.medium.com/think-for-yo...