Crystal Lewis
cghlewis.bsky.social
Research Data Management Consultant | cghlewis.com

Co-organizer @r-ladies-stl.bsky.social
Co-organizer POWER Data Management Hub | https://osf.io/ap3tk/

Author of DMLSER: https://datamgmtinedresearch.com/
RDM Weekly: https://rdmweekly.substack.com/
Pinned
Re-introduction for new followers!
Hello! 👋
I am currently a freelance research data management consultant. I also co-organize R-Ladies St. Louis. I mostly post about data management and #rstats data wrangling tips. I also recently wrote this book.
datamgmtinedresearch.com
Welcome | Data Management in Large-Scale Education Research
This is the in-progress version of Data Management in Large-Scale Education Research.
Reposted by Crystal Lewis
Do you all have a need to encrypt files you generate before sending? I’ve been working with honeycakefiles.com to protect files. Just seeing if there might be interest in an R package to use it to protect your files if I can get one put together (with permission of the company)?

#rstats
Honeycake - Enterprise File Security. For Everyone.
honeycakefiles.com
February 5, 2026 at 8:31 PM
Thank you so much to Theresa Stankov (Data Manager at @cos.io) for sharing Tools for Reproducible Data Pipelines with our Data Management Hub today!

osf.io/ap3tk/files/...
February 5, 2026 at 6:17 PM
Reposted by Crystal Lewis
dplyr 1.2.0 is out now and we are SO excited!

- `filter_out()` for dropping rows

- `recode_values()`, `replace_values()`, and `replace_when()` that join `case_when()` as a complete family of recoding/replacing tools

These are huge quality of life wins for #rstats!

tidyverse.org/blog/2026/02...
dplyr 1.2.0
dplyr 1.2.0 fills in some important gaps in dplyr's API: we've added a new complement to `filter()` focused on dropping rows, and we've expanded the `case_when()` family with three new recoding and re...
tidyverse.org
February 4, 2026 at 11:39 AM
Reposted by Crystal Lewis
Can anyone suggest a good guide for using lab notebooks? (Like the traditional pen and paper bench scientist ones, not computational). Like I want to know what you're expected to write down, how you capture and transform the really important stuff, how you refer back to them etc etc
February 3, 2026 at 9:21 PM
"If a dataset is confusing to a human analyst, it will be opaque to a language model. If the logic behind a metric lives only in someone’s head, an AI agent has no way to recover it. In that sense, AI does not lower the bar for data quality—it raises it."
February 3, 2026 at 6:31 PM
Issue 31 of RDM Weekly is out! 📬

➡️ OS resources in Comm Sciences @csdisseminate.bsky.social
➡️ RDA 25th Plenary Meeting Programme @researchdataall.bsky.social
➡️ Ask for R Help by Creating a Small Reproducible Example @libdrstats.bsky.social
and more!

rdmweekly.substack.com/p/rdm-weekly...
RDM Weekly - Issue 031
A weekly roundup of Research Data Management resources.
rdmweekly.substack.com
February 3, 2026 at 2:05 PM
Reposted by Crystal Lewis
Good news: The posit::conf(2026) Call for Talks has been extended to Friday, February 20!
Join us in Houston to share your work with the R & Python community.
🎤 Speakers receive: Professional coaching, free conference pass, travel assistance
Submit your 20-min talk proposal: pos.it/conf-talk-2026
February 2, 2026 at 4:52 PM
Reposted by Crystal Lewis
Hi folks! For a methods validation paper, we're looking for a large, cross-sectional, psychometric dataset that will show considerable heterogeneity in item responses, preferably, based on different classes/clusters/groups of people. Open access optimal, but if you have one you'd share w me ...
February 2, 2026 at 6:28 PM
Reposted by Crystal Lewis
Okay #StatsSky, used simulateData in lavaan to generate a dataset for a class. Included a seed in the code but students get one of two datasets: (1) the same as me or (2) a second dataset that was different but is identical to everyone else in the class that didn't end up with dataset #1.

1/2
ALT: a man in a suit and tie is saying help help
January 15, 2026 at 5:58 PM
Reposted by Crystal Lewis
RDA-US is launching a funded program for US-based professionals working in/with research infrastructure. Great professional development and networking opportunity...and great way for newcomers to engage with the Research Data Alliance (RDA)! rda-us.org/announcing-t...
Announcing the RDA-US 2026 Cohort: Apply Now – RDA-US
rda-us.org
January 30, 2026 at 12:51 AM
I've just accepted my first Keynote Speaking engagement and I'm really excited.

More details coming soon....
ALT: a cartoon character with glasses says i 'm soooo excited !!
January 29, 2026 at 10:42 PM
Reposted by Crystal Lewis
The thing I admire about @emilyriederer.bsky.social is that she takes tacit knowledge about so many aspects of data science, statistics, and data engineering and makes it explicit.

Really great podcast with her on The Test Set!

overcast.fm/+ABQHd_PtRq4

#dataBS
Emily Riederer: Column selectors, data quality, and learning in public — The Test Set by Posit
Emily Riederer writes Python with an R accent, and we’re all comfortable with it. In this episode, Emily reflects on her journey through R, Python, and SQL — from lessons learned in averaging default ...
overcast.fm
January 27, 2026 at 2:50 AM
Issue 30 of RDM Weekly is out! 📬

➡️ Mapping Public OA K-12 State Education Data @alexjbowers.bsky.social
➡️ A New Tool for Measuring Metadata Completeness
➡️ Ten Simple Rules on How to Write a Standard Operating Procedure
➡️ Please Switch to Python
and more!

rdmweekly.substack.com/p/rdm-weekly...
RDM Weekly - Issue 030
A weekly roundup of Research Data Management resources.
rdmweekly.substack.com
January 27, 2026 at 2:42 PM
It's not unheard of to find errors in your data after publishing it. While it's not fun when this happens, this one-pager can help guide you through the process of updating data, code, and publications when errors are found.
osf.io/q4jre/files/...
January 26, 2026 at 4:45 PM
Reposted by Crystal Lewis
Join the Qualitative Scholars Community Group on January 30 at 1 PM ET as they walk through Taguette, a new, free, and open-source qualitative coding software. Register today: aefpweb.org/ev_calen...
January 23, 2026 at 4:11 PM
Data Management in Large-Scale Education Research is somehow #111 in Statistics Books today. 😍
www.amazon.com/Data-Managem...
January 22, 2026 at 8:25 PM
Reposted by Crystal Lewis
Curious about what people said in response to the RFI about "re-imagining" the Institute of Education Sciences (IES)? I put the public comments in one file (763 pages) and used AI to analyze the themes and areas of agreement and disagreement (5 pages) by audience ⬇️: docs.google.com/document/d/1...
January 22, 2026 at 1:32 PM
Reposted by Crystal Lewis
posit::conf(2026) call for talks is now open! If you're an #RStats or #Python user, have a great DS workflow to share, or have some lessons learned, we'd love to hear from you.

🔗 posit.co/blog/posit-c...
Posit::conf(2026) Call for Talks - Posit
posit::conf(2026) is coming September 14-16 to Houston, TX, and we're looking for talks!
posit.co
January 22, 2026 at 2:58 PM
January 21, 2026 at 3:52 PM
Ooooh {stringr} now includes a function for my favorite naming convention, snake_case. ☺️
ICYMI: stringr 1.6.0 🧵

This update brings performance gains and new fns for string manipulation in #RStats.

Highlights include: Faster replacements with `str_replace_all()`, 🐫 new case tools to camel, snake, and kebab, case-sensitive `str_like()`, & more.

Read more: tidyverse.org/blog/2025/11...
January 20, 2026 at 4:14 PM
Reposted by Crystal Lewis
For grouped summaries in #rstats, do you use group_by() + summarize() or the .by argument in summarize()?
January 20, 2026 at 2:30 PM
Reposted by Crystal Lewis
We mapped 3,822 public open-access K–12 education datasets across 7 states + D.C. using FAIR data principles and the National Academies’ 16 indicators. 🗺️📊

Findable, Accessible, Interoperable, Reusable

First cross-state FAIR mapping at this scale.

doi.org/10.7916/c0jk...

call-ecl.wceruw.org
January 20, 2026 at 2:38 PM
RDM Weekly Issue 29 is out! 📬

➡️ Data Science Resources @nrennie.bsky.social
➡️ Love Data Week Events @icpsr.bsky.social
➡️ Global Community Priorities for Agentic AI in Research @researchdataall.bsky.social
➡️ How to Make a Data Dictionary @cos.io
and more!

rdmweekly.substack.com/p/rdm-weekly...
RDM Weekly - Issue 029
A weekly roundup of Research Data Management resources.
rdmweekly.substack.com
January 20, 2026 at 2:08 PM
Running longitudinal projects with a team of people can quickly lead to confusing folder structures and file names.

Creating a style guide that sets out how folders are organized and files are named can help ensure that files are easier to find and interpret.

datamgmtinedresearch.com/style#style
January 19, 2026 at 2:16 PM