Pete Bachant
@petebachant.me
Bicycles, fluid dynamics, Python, open source, open science, reproducibility. https://petebachant.me | https://calkit.org
The Calkit Run GitHub Action now authenticates with OIDC tokens, so no secrets are required to push artifacts, e.g., the latest PDF of your paper, up to the cloud: github.com/calkit/run-a...

#automation #openscience #reproducibility
Release v2.0.0 · calkit/run-action
With this version we automatically fetch a DVC token from calkit.io using GitHub OIDC. Full Changelog: v1...v2.0.0
github.com
November 8, 2025 at 3:18 PM
💥 First "real world" Calkit repro pack just dropped!

In this paper we did a bunch of benchmarking for a brand new astronomical alert brokering system designed to interface with the Rubin Observatory.

Check out the repo here: github.com/boom-astro/b...

#openscience #reproducibility #opensource
GitHub - boom-astro/boom-paper: The first paper about BOOM development.
github.com
November 7, 2025 at 5:09 PM
Reposted by Pete Bachant
Really love this kind of reality-check meta-research 👉“The struggle to make transparency mainstream: initial evidence for a slow uptake of open science practices in PhD theses”

royalsocietypublishing.org/doi/full/10....
November 5, 2025 at 3:05 AM
If you publish a "repro pack" with your paper, you're awesome, but there's about a 10% chance it will actually run on someone else's computer. In this post I explain why that isn't your fault, why it matters, and what we should do about it: petebachant.me/single-button

#openscience #reproducibility
Single-button reproducibility: The what, the why, and the how
petebachant.me
October 17, 2025 at 2:05 PM
Calkit projects can now incorporate Julia Jupyter notebooks into their pipelines: calkit.io/calkit/examp...

#julialang #reproducibility #openscience
Calkit
calkit.io
October 17, 2025 at 2:46 AM
Why number your notebooks/scripts and execute them manually when you could simply put them into a pipeline that automatically manages their environments and caches their outputs?

docs.calkit.org/pipeline/
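
To make the caching idea concrete, here is a minimal sketch of a stage runner that skips work when its inputs are unchanged. This is not how Calkit implements it; the function names (`file_digest`, `run_stage`) and the `stage_cache.json` file are hypothetical, and a real pipeline would also track outputs and environments.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_stage(name, inputs, func, cache_file="stage_cache.json"):
    """Run func only if an input changed since the last recorded run.

    Returns True if the stage ran and False if it was skipped.
    Hypothetical helper for illustration only.
    """
    cache_path = Path(cache_file)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    # Fingerprint every input file for this stage.
    digests = {str(p): file_digest(p) for p in inputs}
    if cache.get(name) == digests:
        return False  # Inputs unchanged, so the previous outputs are reused.
    func()
    # Record the fingerprints only after the stage succeeds.
    cache[name] = digests
    cache_path.write_text(json.dumps(cache, indent=2))
    return True
```

Calling `run_stage("plot", ["data.csv"], make_plot)` twice in a row would run `make_plot` once; editing `data.csv` triggers a rerun. Here `make_plot` and `data.csv` are placeholders.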

#datascience #automation
The pipeline - Calkit
docs.calkit.org
October 14, 2025 at 3:01 PM
1. Generate evidence to support some claims
2. Don't automate the creation of said evidence

Congratulations, you've just contributed to the reproducibility crisis!

#reproducibility #openscience
October 8, 2025 at 8:29 PM
Reposted by Pete Bachant
So much brilliant work never makes it into a paper.
The code, the data, the long nights helping others debug.
At pyOpenSci, we believe that code, data, and community are the pulse.
Research advances quickly when we build together & openly.
Join us. 💛 bit.ly/pyos-volunteer
#openscience #opensource
Get involved with pyOpenSci
pyOpenSci’s Website
bit.ly
October 8, 2025 at 5:20 PM
Don't be ashamed of "messy" code. If it works, it's good. Share it.

#openscience #reproducibility
October 5, 2025 at 3:18 PM
Reading through some slides from 2013 titled "how to succeed in reproducible research without really trying". It's true we have all the tools needed for researchers to build their own reproducible workflows, but still many do not. Maybe the tools are still too hard to learn and use!
October 3, 2025 at 11:47 AM
Programming tip: Name classes after the data they encapsulate, not the actions they perform on that data. For example, instead of SchemaProcessor, just call it Schema:

processed_schema = Schema().process()
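
A slightly fuller sketch of the same idea, assuming a schema is just a mapping of field names to type names. What "processing" means here (trimming and lowercasing) is made up for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Named after the data it holds (field name -> type name), not an action."""

    fields: dict[str, str] = field(default_factory=dict)

    def process(self) -> "Schema":
        """Return a new Schema with trimmed, lowercased names and types.

        This normalization is a hypothetical stand-in for real processing.
        """
        normalized = {
            name.strip().lower(): type_name.strip().lower()
            for name, type_name in self.fields.items()
        }
        return Schema(fields=normalized)


raw_schema = Schema(fields={" Name ": "STRING", "Age": " Integer"})
processed_schema = raw_schema.process()
print(processed_schema.fields)  # {'name': 'string', 'age': 'integer'}
```

The action lives on the class as a method, and the result is still a `Schema`, so there is no separate `SchemaProcessor` to name or maintain.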

#programming #oop #softwareengineering
September 28, 2025 at 9:01 AM
Hot take: Notebooks are fine in production as long as they're part of a reproducible pipeline

docs.calkit.org/notebooks/

#reproducibility #datascience #openscience
Notebooks - Calkit
docs.calkit.org
September 26, 2025 at 10:01 AM
Please don't number your scripts. Refer back to (2) and use a pipeline (like Calkit's of course)!

www.nature.com/articles/d41...

#reproducibility #automation #openscience
It’s a new term: here are 99 lab hacks
Nature asked contributors, editors and working researchers to share their best advice for scientists.
www.nature.com
September 26, 2025 at 8:37 AM
Reposted by Pete Bachant
In a newly released arXiv preprint, we explore how open science practices like sharing data, code, and preprints relate to citation impact in French-authored research over a 3-year period.

Thanks to @ouvrirlascience.bsky.social for highlighting its national importance.

🔗 Read more: plos.io/3Vmykrj
September 16, 2025 at 4:53 PM
Reproducibility tip: Any figure, dataset, ML model, etc., should not be shared until it is produced with an automated, version-controlled pipeline.

#reproducibility #openscience
September 16, 2025 at 2:28 PM
While profiling some CUDA code on a SLURM cluster I realized I was not working in a very reproducible way, which could become a problem down the road if I ever needed to know how a certain result was generated, so Calkit now has SLURM integration: docs.calkit.org/pipeline/slu...
SLURM integration - Calkit
docs.calkit.org
September 15, 2025 at 2:55 PM
Julia should have an option that automatically does the same thing as:

export JULIA_LOAD_PATH=@:@stdlib
julia --project=. -e 'using Pkg; Pkg.instantiate()'

before running any command.

Maybe an enhanced reproducibility mode option, like --repro?
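
Until something like that exists, a small wrapper can approximate it. The sketch below is only an illustration: `repro_env`, `repro_commands`, and `run_repro` are hypothetical names, and `--repro` itself is not a real Julia flag. It sets the same load path and runs the same instantiate step shown above before the requested command.

```python
import os
import subprocess


def repro_env(base_env=None):
    """Copy the environment and limit Julia's load path to the active
    project ("@") and the standard library ("@stdlib")."""
    env = dict(os.environ if base_env is None else base_env)
    env["JULIA_LOAD_PATH"] = "@:@stdlib"
    return env


def repro_commands(julia_args, project="."):
    """Return the instantiate command followed by the user's command."""
    julia = ["julia", f"--project={project}"]
    instantiate = julia + ["-e", "using Pkg; Pkg.instantiate()"]
    return [instantiate, julia + list(julia_args)]


def run_repro(julia_args, project="."):
    """Run both commands in order, stopping at the first failure."""
    env = repro_env()
    for cmd in repro_commands(julia_args, project):
        subprocess.run(cmd, env=env, check=True)
```

For example, `run_repro(["script.jl"])` would instantiate the project in the current directory and then run `script.jl` (a placeholder name) with the restricted load path.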

#julialang #reproducibility
September 14, 2025 at 3:35 PM
I don't know why, but I always found it hard to remember the process for adding an SSH key to GitHub, so I made a wizard for it:

calkit config github-ssh

(might be buggy, but still an improvement over manually running commands from the docs)
September 12, 2025 at 3:23 PM
Moving pieces of code farther apart from each other (into different packages, modules, repos) doesn't guarantee you've decoupled them. In fact, if you haven't, you probably just made your life a whole lot harder.

#softwareengineering
September 10, 2025 at 2:33 PM
If you're a leader of knowledge workers you should be giving teams fewer, vaguer goals. Handing out well-defined projects and tasks to individuals is a waste of their abilities.
September 8, 2025 at 2:34 PM
How much "waste" do you have in your scientific workflow? For example, do you manually rerun plotting scripts/notebooks after updating processing logic? Do you then manually re-upload these figures to Overleaf? Want to automate this stuff away? Reach out and I will help!
September 8, 2025 at 2:20 PM
Code, data, config files, etc. all must be shared in order to describe computational methods with sufficient detail.

#reproducibility #openscience
September 4, 2025 at 7:13 PM
Anyone have any good references that examine the relationship between computational reproducibility and time to publication? I'd assume more automated, reproducible workflows will help studies get through peer review more quickly.

#openscience #reproducibility
August 27, 2025 at 3:07 PM
Calkit now has its own GitHub Action to run your project's pipeline and optionally commit and push results: docs.calkit.org/tutorials/gi...
Running Calkit in GitHub Actions - Calkit
docs.calkit.org
August 21, 2025 at 2:46 PM