Andrew Nesbitt
@andrewnez.mastodon.social.ap.brid.gy
Working on mapping the world of open source software https://ecosyste.ms and empowering developers with https://octobox.io

Building a track-focused Subaru BRZ […]

[bridged from https://mastodon.social/@andrewnez on the fediverse by https://fed.brid.gy/ ]
Reposted by Andrew Nesbitt
Package Managers Devroom at FOSDEM 2026: Schedule Announced: https://blog.ecosyste.ms/2025/12/20/fosdem-2026-package-managers-devroom-schedule.html
Package Managers Devroom at FOSDEM 2026: Schedule Announced
Wolf Vollprecht and Andrew Nesbitt are co-organizing the Package Managers devroom at FOSDEM 2026, and the schedule is now live. We have nine talks covering supply chain security, dependency resolution, build reproducibility, and the economics of running package registries.

**Saturday, 31 January 2026** Room K.3.201 (capacity 80) / 10:30-14:25

### A phishy case study

_Adam Harvey / 10:30-10:55_

Adam walks through a phishing attack that targeted owners of popular Rust crates in September 2024. The talk covers how the attack unfolded and how collaboration between the Rust Project, Rust Foundation, and Alpha-Omega helped shut it down quickly.

### Current state of attestations in programming language ecosystems

_Zach Steindler / 11:00-11:25_

Zach surveys how npm, PyPI, RubyGems, and Maven Central have adopted attestations to link packages to their source code and build instructions. He’ll explain Sigstore bundle verification, compare implementation approaches across registries, and discuss what this means for ecosystems that haven’t adopted attestations yet.

### Name resolution in package management systems

_Gábor Boskovits / 11:30-11:55_

Gábor examines how different package managers handle dependency resolution through the lens of reproducible builds. The talk compares language-specific lock files (Cargo), traditional distribution packaging (Debian), and declarative approaches (Nix, Guix).

### Package managers à la carte: A Formal Model of Dependency Resolution

_Ryan Gibb / 12:00-12:25_

Ryan introduces the Package Calculus, a formal framework for unifying how different package managers resolve dependencies. The talk addresses three problems: multi-language projects can’t express cross-language dependencies precisely, system and hardware dependencies remain implicit, and security vulnerabilities in full dependency graphs are hard to track.

### Trust Nothing, Trace Everything: Auditing Package Builds at Scale with OSS Rebuild

_Matthew Suozzo / 12:30-12:55_

Matthew argues that reproducible builds aren’t enough if you don’t understand what happens during the build itself. He presents OSS Rebuild’s open-source observability toolkit, including a transparent network proxy and an eBPF-based system analyzer for detecting suspicious build behavior. The talk responds to supply chain attacks like the XZ backdoor.

### PURL: From FOSDEM 2018 to international standard

_Philippe Ombredanne / 13:00-13:10_

Philippe traces Package-URL’s journey from its FOSDEM 2018 debut to becoming an international standard for referencing packages across ecosystems. PURL now appears in CVE formats for vulnerability tracking and is used by security tools, SCA platforms, and package registries for SBOM and VEX generation.

### Binary Dependencies: Identifying the Hidden Packages We All Depend On

_Vlad-Stefan Harbuz / 13:15-13:25_

Vlad tackles a gap in package management: while source dependencies are well documented, binary dependencies like numpy’s reliance on OpenBLAS binaries remain invisible. He proposes a global index of binary dependencies using a linker that tracks symbols across the open source ecosystem.

### The terrible economics of package registries and how to fix them

_Michael Winser / 13:30-13:55_

Michael examines why package registries struggle financially despite being used by almost all software. Most rely on grants, donations, and in-kind resources while facing increased costs and security expectations. He discusses how the Alpha-Omega project has funded security improvements and piloted sustainable revenue models with major registries.

### Package Management Learnings from Homebrew

_Mike McQuaid / 14:00-14:25_

Mike discusses Homebrew’s v5.0.0 release from November 2025, covering what other package managers could learn from Homebrew’s approach and what Homebrew has adopted from elsewhere.
See you in Brussels on January 31st.
blog.ecosyste.ms
December 20, 2025 at 4:56 PM
Someone needs to make a prediction market for GitHub Stars 🤩
December 20, 2025 at 8:07 AM
Why JavaScript Needed Docker: https://nesbitt.io/2025/12/19/why-javascript-needed-docker.html

Following on from yesterday's post about Docker being the lockfile of system package managers: it was also the real lockfile of JavaScript for a very long time.
Why JavaScript Needed Docker
At a Node.js conference years ago, I heard a speaker claim that npm had finally “solved” dependency hell. In hindsight, the ecosystem wasn’t solving dependency conflicts so much as deferring them to production. Docker’s adoption in Node.js was partly a response to this. When local builds aren’t deterministic, containers become the only way to ensure what you test is what you deploy. The Dockerfile compensated for reliability the package manager didn’t provide.

Many developers have moved to pnpm or Yarn. But to understand why npm struggled with correctness for so long, look at the incentives. Every package manager tradeoff has a growth-friendly side and a correctness-friendly side. npm consistently chose growth.

### Dependency resolution

Most package managers make you solve version conflicts. Bundler will error if two gems need incompatible versions of the same dependency. This is annoying, but it forces you to understand your dependency tree. npm took a different approach: just install both versions. Nest them in separate node_modules folders and let each dependency have whatever it wants. No conflicts, no errors, no friction.

This was brilliant for adoption. New developers never hit “dependency hell.” Everything just worked, or appeared to. The JavaScript ecosystem exploded. In the context of 2010, this was a revelation: while other communities were struggling with manual conflict resolution, Node.js developers were shipping code. This velocity is arguably what allowed JavaScript to move from a browser-only language to a dominant server-side force.

The tradeoff was bloat and fragility. A single `npm install` might pull hundreds of packages, many of them the same library at slightly different versions. node_modules became a meme. And because resolution didn’t have to be deterministic—just install everything—npm spent most of its history without the machinery to guarantee two machines got the same tree.

### Lockfiles

Shrinkwrap arrived in 2012, opt-in and fragile. Few projects used it seriously. The ecosystem grew anyway. Yarn’s emergence in 2016 highlighted a growing need for deterministic builds at scale. Facebook needed reproducible builds across thousands of engineers, and Yarn had reliable lockfiles from day one. This signaled that the ecosystem’s requirements were outgrowing npm’s original design assumptions.

npm responded in 2017 with package-lock.json. But even then, `npm install` updated the lockfile by default. The deterministic command, `npm ci`, was added in 2018 as a separate thing you had to know about. Reproducibility remained opt-in.

npm 5’s lockfile wasn’t even deterministic in practice. Platform differences, install order, optional dependencies, and outright bugs meant two machines could generate different lockfiles from the same package.json. npm 7 in 2020 finally improved this, but by then the pattern was set: Node builds were flaky, and if you wanted reliability, you containerized.

### Docker as workaround

When npm’s resolution diverged between machines, the failures showed up in production. A developer runs `npm install`, commits the lockfile, CI runs `npm install` again and gets a slightly different tree, staging gets a third variation. The bug that crashes production doesn’t reproduce locally because your node_modules isn’t the same node_modules.

Docker provided a pragmatic solution. Freeze the result of `npm install` in an image, push that image, and every environment gets the same bytes. The Dockerfile became an alternative mechanism for achieving the reproducibility that lockfiles were meant to provide.

This reduced the pressure on npm to change. The teams hitting reproducibility problems had already found their workaround. The teams who hadn’t hit problems yet didn’t need one.

### Incentives all the way down

Every decision made sense if your goal was adoption:

* Nested resolution removes friction for new users
* Silent lockfile updates mean fewer confusing errors
* Opt-in strictness means the default path stays smooth

Strict correctness was often traded for a lower barrier to entry. And when correctness failures got bad enough to cause problems, Docker was there to provide an alternative.

npm occupies a unique position as one of the few major registries managed within a corporate structure, alongside Maven Central. Most others are open source and community-governed. This has historically allowed for rapid scaling, though it inevitably influences how technical priorities are balanced.

In 2024, `npm install` still mutates the lockfile by default. Fifteen years in, determinism is still opt-in. The ecosystem learned to work around it, first with Yarn, then with Docker, now with pnpm. npm made incremental improvements, but the pressure to change the fundamentals was reduced because the ecosystem kept finding its own solutions. The transition to npm 7 in 2020 represented a major architectural pivot, allowing the team to address long-standing structural constraints.

Every anti-pattern I’ve documented in GitHub Actions’ package management—non-deterministic resolution, mutable versions, missing lockfiles—follows the same pattern. Until 2014, `npm publish --force` let you overwrite published versions, and it took three years before anyone decided that was a bad idea. The pressure to fix these problems was lower because workarounds existed.

The same low-friction design has security implications. Sonatype’s 2024 report found that npm represents 98.5% of observed malicious packages across open source registries. The sheer volume of packages makes npm a larger target, but the trust model of the early 2010s is also being tested by the security requirements of 2025. The JavaScript ecosystem’s micro-package culture means more dependencies per project, low publishing friction makes it easy to upload packages, and install-time scripts run arbitrary code by default.

Last year, npm’s creator and former CEO Isaac Schlueter, along with former npm CLI lead Darcy Clarke, started vlt to build a new JavaScript package manager. That npm’s original leadership is now building from scratch is perhaps the clearest admission that the current architecture has reached its limits. Clarke’s post on the massive hole in the npm ecosystem documents a manifest validation flaw that’s existed since npm’s inception.

Package managers are nearly impossible to change once they have adoption, because millions of projects depend on existing behavior. Some of those bugs are now load-bearing.
nesbitt.io
December 19, 2025 at 8:58 PM
I've submitted three talk proposals to Pycon: https://gist.github.com/andrew/6ac703950cf66dd0eb16cca07a812db2

It occurred to me that these would work well at other language conferences too. Are there any other CFPs open at the moment that would be interested in package security?
01-ci-supply-chain.md
GitHub Gist: instantly share code, notes, and snippets.
gist.github.com
December 19, 2025 at 5:55 PM
A thought that's been bouncing around my head for a long time... Docker is the Lockfile for System Packages: https://nesbitt.io/2025/12/18/docker-is-the-lockfile-for-system-packages.html
Docker is the Lockfile for System Packages
Back when I worked in a large office in London, I remember a team pulling their hair out as they moved to the cloud. They were trying to autoscale, spinning up new machines and installing packages on boot. Each instance resolved dependencies against whatever apt’s mirrors had at launch time, so they’d debug a problem on one server only to find other servers had slightly different package versions. A security patch landed between instance launches, or a minor release appeared, and suddenly their servers diverged.

Language package managers solved this years ago. Bundler shipped Gemfile.lock in 2010, and the basic promise is simple: commit a lockfile, and any machine running `install` gets the exact same dependency tree. Cargo and nearly every other language ecosystem has something equivalent now.

System package managers never followed. apt and yum still don’t have lockfiles. You can pin versions, write `/etc/apt/preferences.d/` files, and use `versionlock` plugins, but there’s no single file capturing “this exact set of packages at these exact versions, reproducible across machines and time.” You can get determinism through internal mirrors, Debian snapshot, and careful versioning, but that’s a significant operational investment. The tools assume you want the latest compatible packages from your distribution’s current state, so you get resolution-time nondeterminism rather than a captured artifact you can share.

Docker solved this almost by accident. It was selling developer experience and deployment consistency, not reproducibility. The image-as-artifact emerged from implementation choices like union filesystems and content-addressable storage rather than explicit design goals around determinism.

To be precise: Docker solved deployment determinism, not build determinism. Running `docker build` twice on the same Dockerfile can produce different images due to timestamps, package manager state, and metadata. What Docker guarantees is that once you have an image, every machine running it gets identical bytes. That’s a weaker property than a true lockfile, which can be regenerated from its manifest. But it was enough. Teams didn’t need to rebuild from scratch on every deploy; they needed the thing they built to behave the same everywhere they ran it.

The Dockerfile isn’t quite a lockfile. It’s more like a build script. But the resulting image acts like one, capturing the full resolved state of every system package, every library, and every binary in a form you can version, share, and deploy identically everywhere. Docker gave teams something they couldn’t get any other way: a lockfile for the operating system layer.

That autoscaling team switched to Docker and their problem disappeared. They built once and every new instance was identical regardless of when it launched. The broader shift was already underway: cloud infrastructure meant treating servers as cattle, not pets. You couldn’t hand-tune each machine’s package state when you might spin up fifty instances in an hour and tear them down by morning. VM images could do this too, but at much higher cost in size, build time, and tooling. Docker made it cheap enough to be the default.

The reason apt doesn’t have a lockfile is that it’s designed for systems, not applications. A system needs to be patched in place; an application needs to be immutable. **Docker effectively turned the system into an application**, and with web applications as its primary use case, immutability was exactly what people wanted. Distribution maintainers try to keep things compatible, but “compatible” and “identical” aren’t the same thing. When you need identical, the Docker image gives you that.

Docker’s approach has real limitations. The Dockerfile tells apt to install packages but doesn’t record which versions it got, so rebuilding tomorrow might produce a different image. You can’t edit a Docker image after the fact the way you’d edit a lockfile to bump one dependency. Updating one system package invalidates the whole layer and forces reinstallation of everything in that layer. There’s a security tension too: freezing system packages means inheriting whatever vulnerabilities existed at build time. Tools like apko from Chainguard take this seriously, producing bitwise-reproducible images by design through declarative configs rather than imperative Dockerfiles.

Nix and Guix prove that system-level lockfiles are technically possible, with Nix flakes pinning every input to a specific git revision. But Nix didn’t win, because its learning curve is measured in months rather than hours. Docker asked almost nothing of developers: write a Dockerfile that looks like a shell script, run `docker build`, push the result.

A recent analysis of lockfile design found that ecosystems where lockfiles generate by default have near-universal adoption, while adoption craters when lockfiles are optional or awkward. System package managers made lockfiles awkward so almost nobody used them, and Docker made reproducible deploys easy so everyone used that instead.

The uapi-group has proposed adding lockfile specifications to traditional Linux package managers. The fact that it’s still in discussion after two years tells you something about how the ecosystem prioritizes this problem. Docker already papered over it.

Docker is not a lockfile in any formal sense. It’s a build system that happens to produce immutable artifacts. But it papered over a gap that system package managers left open for decades, and close enough shipped.
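For reference, the apt pinning mentioned above is done with entries like this in `/etc/apt/preferences.d/` (the package name and version are illustrative placeholders):

```
Package: curl
Pin: version 8.5.0-2ubuntu10
Pin-Priority: 1001
```

Even at priority 1001, an entry like this only constrains apt's resolver for one package at a time; it doesn't capture the resolved transitive set the way a lockfile does.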
nesbitt.io
December 19, 2025 at 8:43 AM
The FOSDEM package manager devroom schedule is now live: https://fosdem.org/2026/schedule/track/package-management/
FOSDEM 2026 - Package Management
fosdem.org
December 18, 2025 at 9:13 PM
I've been doing a bit of a deep dive on typosquatting in package managers, here's what I've found so far: https://nesbitt.io/2025/12/17/typosquatting-in-package-managers.html
Typosquatting in Package Managers
Typosquatting is registering a package name that looks like a popular one, hoping developers mistype or copy-paste the wrong thing. It’s been a supply chain attack vector since at least 2016, when Nikolai Tschacher demonstrated that uploading malicious packages with slightly misspelled names could infect thousands of hosts within days. His bachelor thesis experiment infected over 17,000 machines across PyPI, npm, and RubyGems, with half running his code as administrator.

The attack surface is straightforward: package managers accept whatever name you type. If you run `pip install reqeusts` instead of `pip install requests`, and someone has registered `reqeusts`, you get their code. The typo can come from your fingers, from a tutorial you copied, or from an LLM hallucination (slopsquatting).

### Generation techniques

There’s a taxonomy of ways to generate plausible typosquats:

**Omission** drops a single character. `requests` becomes `reqests`, `requsts`, `rquests`. These catch fast typists who miss keys or developers working from memory.

**Repetition** doubles a character. `requests` becomes `rrequests` or `requestss`. Easy to type accidentally, especially on phone keyboards.

**Transposition** swaps adjacent characters. `requests` becomes `reqeusts` or `requsets`. This is probably the most common typing error.

**Replacement** substitutes adjacent keyboard characters. `requests` becomes `requezts` (z is next to s) or `requewts` (w is next to e). Varies by keyboard layout.

**Addition** inserts characters at the start or end (not mid-string). `requests` becomes `arequests` or `requestsa`. Catches stray keypresses before or after the name.

**Homoglyph** uses lookalike characters. `requests` becomes `reque5ts` (5 looks like s) or `requεsts` (Greek epsilon looks like e). In many fonts, `l` (lowercase L), `1` (one), and `I` (uppercase i) are nearly identical. The string `Iodash` (starting with uppercase i) displays identically to `lodash` (starting with lowercase L) in most terminals.

**Delimiter** changes separators between words. `my-package` becomes `my_package` or `mypackage`. Different registries normalize these differently: PyPI treats `my-package`, `my_package`, and `my.package` as equivalent, but npm doesn’t.

**Word order** rearranges compound names. `python-nmap` becomes `nmap-python`. Both sound reasonable, and developers might guess wrong.

**Plural** adds or removes trailing s. `request` versus `requests`. Both get registered, and tutorials using the wrong one send traffic to the wrong package.

**Combosquatting** adds common suffixes. `lodash` becomes `lodash-js`, `lodash-utils`, or `lodash-core`. These piggyback on brand recognition while looking like official extensions.

Less common techniques include **vowel swaps** (`requests` to `raquests`), **bitsquatting** (single-bit memory errors that change `google` to `coogle`), and **adjacent insertion** (inserting a key next to one you pressed, like `googhle`).

### Examples from the wild

I’ve been collecting confirmed typosquats into a dataset. It currently has 143 entries across PyPI, npm, crates.io, Go, and GitHub Actions, drawn from security research by OpenSSF, Datadog, IQTLabs, and others.

The existing malicious package databases are large. OpenSSF’s malicious-packages repo has thousands of entries. Datadog’s dataset has over 17,000. But most entries just list the malicious package name without identifying what it was targeting. A package called `reqeusts` is obviously squatting `requests`, but `beautifulsoup-numpy` could be targeting either library, and names like `payments-core` require context to understand.

The dataset I built maps each malicious package to its intended target and classifies which technique was used. Inclusion requires a clear target: if I can’t confidently say what package the attacker was imitating, it doesn’t go in.
That mapping is what you need to test detection tools: you can’t measure recall without knowing what the attacks were trying to hit.

The `requests` library on PyPI has been targeted more than any other package. The dataset includes `reqeusts`, `requets`, `rquests`, `requezts`, `requeats`, `arequests`, `requestss`, `rrequests`, `reque5ts`, `raquests`, and `requists`. BeautifulSoup has `beautifulsup4` (omission), `BeautifulSoop` (replacement), `BeaotifulSoup` (transposition), and `beautifulsoup-requests` (combosquatting). The variations in capitalization are intentional: PyPI normalizes case, so attackers don’t need to match it exactly.

The `crossenv` npm attack from 2017 exploited delimiter confusion with `cross-env`, a popular build tool. Same words, different punctuation. Over 700 affected hosts downloaded the malicious version before it was caught.

Some attacks are creative. The packages `--legacy-peer-deps` and `--no-audit` on npm squat on CLI flag names. If someone copies `npm install example--hierarchical` from a tutorial with a missing space, npm parses `--hierarchical` as a package name to install rather than a flag.

GitHub Actions has its own variant. Orca Security demonstrated attacks on workflow files by registering organizations like `actons`, `action`, and `circelci`. They found 158 repositories already referencing a malicious `action` org before they reported it.

Typosquatting also shows up in package metadata. A package’s homepage or repository URL might point to a typosquatted domain, accidentally or deliberately. A maintainer who fat-fingers `githb.com` in their gemspec creates a link to someone else’s server. An attacker who controls that domain gets traffic from anyone who clicks through from the registry page.

### Detection tools

I’ve built a Ruby gem that generates typosquat variants and checks if they exist on registries. It supports PyPI, npm, RubyGems, Cargo, Go, Maven, NuGet, Composer, Hex, Pub, and GitHub Actions.
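The kind of transformation such a tool applies can be sketched in a few lines of Ruby. This is an illustration of the generation techniques described above, not the gem's actual API:

```ruby
# Illustrative sketch of typosquat variant generation, covering a few of
# the techniques described above (omission, repetition, transposition,
# delimiter). Not the actual API of the typosquatting gem.
def typo_variants(name)
  chars = name.chars
  variants = []

  # Omission: drop a single character ("requests" -> "reqests")
  chars.each_index { |i| variants << (chars[0...i] + chars[i + 1..]).join }

  # Repetition: double a character ("requests" -> "rrequests")
  chars.each_index { |i| variants << (chars[0..i] + chars[i..]).join }

  # Transposition: swap adjacent characters ("requests" -> "reqeusts")
  (0...chars.size - 1).each do |i|
    swapped = chars.dup
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    variants << swapped.join
  end

  # Delimiter: swap separators ("my-package" -> "my_package")
  variants << name.tr("-", "_") << name.tr("_", "-")

  variants.uniq - [name]
end

typo_variants("requests").first(3) # a batch of candidate names to check
```

The real tool layers keyboard-adjacency tables, homoglyph maps, and registry lookups on top of transformations like these.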
Generate variants for a package name:

```
typosquatting generate requests -e pypi
```

Check which variants actually exist:

```
typosquatting check lodash -e npm --existing-only
```

This queries the ecosyste.ms package names API. For `lodash`, it finds `lodas`, `lodah`, and `1odash` already registered.

Scan an SBOM for potential typosquats in your dependencies:

```
typosquatting sbom bom.json
```

Check for dependency confusion risks on a package name:

```
typosquatting confusion my-internal-package -e npm
```

Other tools: the Rust Foundation maintains typomania, which powers crates.io’s typosquatting detection. IQTLabs built pypi-scan for PyPI (now archived). typogard checks npm packages and their transitive dependencies. SpellBound, a USENIX paper from 2020, combined lexical similarity with download counts to flag packages that look like popular ones but have suspicious usage patterns. It achieved a 0.5% false positive rate and caught a real npm typosquat during evaluation.

The harder problem is preventing typosquats at registration time. PyPI discussed implementing “social distancing” rules that would block names too similar to popular packages. The analysis found that 18 of 40 historical typosquats had a Levenshtein distance of 2 or less from their targets, meaning one or two edits (a dropped letter, a swapped pair) was enough to create the attack name. Edit distance alone misses homoglyphs and keyboard-adjacent replacements, which is why detection tools need multiple techniques. But false positives are politically difficult: blocking `request` because `requests` exists would annoy legitimate package authors.

### The friendly typosquat

Not all typosquats are malicious. Will Leinweber registered the gem bundle back in 2011. If you accidentally type `gem install bundle` instead of `gem install bundler`, you get a package that does one thing: depend on bundler. The description says “You really mean `gem install bundler`. It’s okay. I’ll fix it for you this one last time…” It has 8 million downloads. That’s 8 million typos caught and redirected to the right place. Defensive squatting like this is a public service.
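The edit-distance screening discussed under detection can be sketched in Ruby. This is a minimal single-row Levenshtein implementation for illustration, not code from any of the tools above:

```ruby
# Minimal Levenshtein distance, the metric used to screen new package
# names against popular ones. Illustrative sketch only.
def levenshtein(a, b)
  row = (0..b.size).to_a
  a.each_char.with_index(1) do |ca, i|
    prev_diag = row[0]
    row[0] = i
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # Take the cheapest of deletion, insertion, and substitution/match.
      prev_diag, row[j] = row[j], [row[j] + 1, row[j - 1] + 1, prev_diag + cost].min
    end
  end
  row[b.size]
end

levenshtein("requests", "reqeusts") # => 2 (a transposition is two edits)
levenshtein("requests", "request")  # => 1 (one dropped letter)
```

Both examples land at or under the distance-2 threshold from the PyPI analysis, which is exactly why such names are dangerous; homoglyphs like `reque5ts` also score low here, but `Iodash` versus `lodash` shows why visual similarity needs its own check.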
nesbitt.io
December 17, 2025 at 3:10 PM
Reposted by Andrew Nesbitt
GitHub actions have GitHub consequences
December 16, 2025 at 8:20 PM
How I Assess Open Source Libraries
I was recently invited to give a guest lecture at Justin Cappos’s Secure Systems Lab at NYU on how to assess open source software, which forced me to articulate what I actually look at after a decade of building tools that analyse dependencies across package ecosystems.

### What I look for

When I’m deciding whether to adopt a library, the first thing I check is how many other packages and repositories depend on it. This single number tells me more than almost any other metric. High dependent counts mean the library works, the documentation is good enough to figure out, the API is stable enough that people stick with it, and there are enough eyeballs that problems get noticed.

It’s wisdom of crowds applied to software. Thousands of developers have independently decided this library is worth depending on, and that means something. A library with that kind of adoption has been stress-tested in production environments across different use cases in ways no test suite can replicate. If a library has strong usage numbers, I’ll overlook weaknesses in other areas, because real-world adoption is the hardest thing to fake.

The second thing I check is what the library itself depends on. Every transitive dependency you bring in adds risk, attack surface, and maintenance burden, and dependencies multiply like tribbles until one day you look up and realize you’re responsible for code from hundreds of strangers. I’ve watched projects balloon from a handful of direct dependencies to thousands of transitive ones, and at that point you’ve lost any meaningful ability to audit what you’re running. When I have a choice between two libraries that do roughly the same thing, I pick the one with fewer dependencies almost every time.

Licensing has to be sorted. If a library doesn’t have an OSI-approved license, I won’t use it, and I don’t spend time negotiating or hoping.

I pay attention to who maintains the library. If it’s someone whose other work I already depend on, I’m more confident they’ll stick around and respond when something goes wrong. Projects with multiple active maintainers are better bets than solo efforts, since one person burning out or getting a new job shouldn’t mean the library dies.

Good test coverage matters, especially tests that go beyond unit tests to check against spec documents or real-world use cases. Tests that exercise actual scenarios tell me the library does what it claims, and they make it much easier to contribute fixes or debug problems when something goes wrong.

### What I ignore

Stars and forks tell me almost nothing. They measure how many people have looked at a repository, which correlates with marketing and visibility more than quality. Some of the most reliable libraries I use have modest star counts because they’re boring infrastructure that just works. Conversely, I’ve seen heavily-starred projects with broken APIs and unresponsive maintainers.

I also ignore commit frequency. Stable libraries often don’t need regular commits, especially small ones that do one thing well. A library that hasn’t been touched in a year might be abandoned, or it might just be finished. The way to tell the difference is to look at whether maintainers respond to issues and pull requests, not at the commit graph.

AI-generated contributions don’t bother me either. Some people treat them as a red flag, but if a library has real usage, minimal dependencies, responsive maintainers, and good tests, I don’t care how the code got written.

Total contributor counts don’t mean much to me. I’ve never seen a correlation between how many people have touched a codebase and whether it’s any good, and if I rejected libraries for having few contributors I’d be rejecting a lot of excellent code, including much of my own.

### What I avoid

I try hard to keep npm out of my Rails applications, preferring to vendor static JavaScript files or pull from a CDN. I still use Sprockets in all my Rails apps for exactly this reason. The npm ecosystem has become a tire fire of security incidents and maintenance headaches, and the average Node.js application now pulls in over a thousand transitive dependencies. I don’t want to spend my time triaging hundreds of Dependabot alerts every week for code I didn’t choose and don’t understand.

I’m wary of binary packages. Ruby gems that bundle C or Rust extensions are faster for CPU-intensive work, but they’re painful to install across different environments, slow down CI, and require trusting pre-built binaries without much provenance. I’ll take the performance hit when the work is happening in the background or offline.

I avoid tiny helper libraries, the ones that provide a single method or a clever little hack. They tend to be someone’s pet project, and pet projects have a habit of breaking their APIs on a regular basis (Pagy looking at you) or expanding scope beyond what I originally wanted to use them for (also Pagy looking at you). I’ve been bitten enough times that I’d rather write twenty lines of code myself.

I also avoid brand new libraries. They haven’t worked out the kinks in their API design yet, which means breaking changes are more likely in your future. There’s also less usage and community around them, so you’re the one finding the problems. I apply the same cooldown logic I use for updating dependencies: let other people find the sharp edges first.
nesbitt.io
December 15, 2025 at 9:21 PM
FOSDEM package manager talk accept/reject emails sent out; hopefully we'll have the schedule arranged within the next couple of days.

Very hard to choose from so many great proposals given we only have a half day this year.
December 15, 2025 at 4:34 PM
Four Ruby gems I've worked on recently that implement various supply chain security specifications: PURL, VERS, SBOM and SWHID

https://nesbitt.io/2025/12/14/supply-chain-security-tools-for-ruby.html
Supply Chain Security Tools for Ruby
I’ve published four Ruby gems that work together to help people build supply chain security tools: purl, vers, sbom, and swhid. They handle the specs that security tooling depends on. I built these for Ecosyste.ms, which tracks dependencies across package registries. We deal with a lot of cross-ecosystem data: vulnerability reports that reference packages by PURL, version ranges from security advisories, SBOMs from various sources. If you’re building security scanners, registry tooling, or compliance pipelines in Ruby, these might be useful.

### purl

Package URL is a standardized format for identifying software packages across ecosystems. Instead of saying “the requests package version 2.28.0 from PyPI,” you write `pkg:pypi/requests@2.28.0`. The format handles the variations between registries:

* `pkg:npm/%40babel/[email protected]` (npm scoped package)
* `pkg:maven/org.apache.logging.log4j/[email protected]` (Maven with group ID)
* `pkg:docker/library/[email protected]` (Docker image)
* `pkg:gem/rails@7.0.0` (RubyGems)
* `pkg:github/rails/[email protected]` (GitHub repo at a tag)

It’s used in SPDX, CycloneDX, and most security tooling. PURL recently became ECMA-427. The gem parses and generates these identifiers, with type-specific validation for ecosystems like conan, cran, and swift. Use it as a library:

```ruby
purl = Purl.parse("pkg:gem/rails@7.0.0")
purl.type    # => "gem"
purl.name    # => "rails"
purl.version # => "7.0.0"
```

Or from the command line. The CLI integrates with Ecosyste.ms for looking up package metadata and security advisories:

```
$ purl advisories pkg:npm/[email protected]
```

It also generates registry URLs for most package ecosystems.

### vers

VERS is the version range specification that accompanies PURL. Vulnerability databases need to express “this CVE affects versions 1.0 through 1.4.2, and also 2.0.0-beta.” Different ecosystems have incompatible range syntaxes: npm uses `>=1.0.0 <1.4.3`, Ruby uses `>= 1.0, < 1.4.3`, Python uses `>=1.0,<1.4.3`.
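The incompatible syntaxes above are exactly what a normalized range model has to absorb. As an illustrative sketch (not the vers gem's implementation), a range can be modeled as a union of `[lower, upper)` intervals, with membership tested via `Gem::Version` comparisons from Ruby's standard library:

```ruby
require "rubygems"

# Illustrative sketch only: model a version range as a union of
# [lower, upper) intervals and test membership with Gem::Version,
# which understands semver-ish ordering (e.g. prereleases).
def in_range?(version, intervals)
  v = Gem::Version.new(version)
  intervals.any? do |lower, upper|
    v >= Gem::Version.new(lower) && v < Gem::Version.new(upper)
  end
end

# vers:npm/>=1.0.0|<1.4.3|>=2.0.0|<2.1.0 expressed as two intervals:
intervals = [["1.0.0", "1.4.3"], ["2.0.0", "2.1.0"]]

in_range?("1.2.0", intervals) # => true
in_range?("1.4.3", intervals) # => false
```

The real implementation has to handle open-ended bounds, exclusions like `!=1.3`, and per-ecosystem version ordering, but the interval model is the core idea.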
If you’re building cross-ecosystem tooling, you need one syntax to normalize everything to. VERS provides that:

* `vers:gem/>=2.0.0|<2.7.2` (Ruby versions 2.0.0 up to but not including 2.7.2)
* `vers:npm/>=1.0.0|<1.4.3|>=2.0.0|<2.1.0` (two separate ranges)
* `vers:pypi/>=0|<1.2.3` (all versions before 1.2.3)
* `vers:maven/>=1.0|<=1.5|!=1.3` (1.0 through 1.5, excluding 1.3)

```ruby
range = Vers.parse("vers:npm/>=1.2.3|<2.0.0")
range.contains?("1.5.0") # => true
range.contains?("2.1.0") # => false
```

The gem parses these ranges and checks whether a given version falls within them. Internally it uses a mathematical interval model inspired by a presentation from Open Source Summit NA 2025 (slides) by Eve Martin-Jones and Elitsa Bankova. It’s also a redo of semantic_range, a library I wrote 10 years ago for Libraries.io that handled version ranges across multiple ecosystems.

### sbom

There are two main Software Bill of Materials formats: SPDX and CycloneDX. Of course there are two. SPDX comes from the Linux Foundation and started as a license compliance format. CycloneDX comes from OWASP and started as a security format. Both now try to do everything. The gem parses, generates, and validates both: SPDX 2.2 and 2.3 in JSON, YAML, XML, RDF, and tag-value; CycloneDX 1.4 through 1.7 in JSON and XML. It auto-detects formats when parsing and validates against the official schemas.

```ruby
sbom = Sbom.parse_file("example.spdx.json")
sbom.packages.each do |pkg|
  puts "#{pkg.name} @ #{pkg.version}"
end
```

The CLI handles parsing, validation, format conversion, and enrichment:

```
$ sbom validate example.cdx.json
$ sbom convert example.cdx.json --type spdx --output example.spdx.json
$ sbom enrich example.cdx.json
```

The enrich command pulls metadata from Ecosyste.ms: descriptions, homepages, licenses, repository URLs, and security advisories.

### swhid

SoftWare Hash IDentifiers are content-based hashes for software artifacts: files, directories, commits, releases, and snapshots.
They originated from Software Heritage, the archive that’s preserving all publicly available source code. They’re intrinsic identifiers, meaning the same content always produces the same SWHID regardless of where it lives. The spec is now ISO/IEC 18670:2025.

```ruby
swhid = Swhid.parse("swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2")
swhid.object_type # => "cnt"

Swhid.from_content(File.read("file.txt"))
```

The CLI generates SWHIDs from files, directories, or git objects:

```
$ swhid content < file.txt
$ swhid directory /path/to/project
$ swhid revision /path/to/repo HEAD
```

* * *

These gems provide Ruby implementations of specs that show up repeatedly in supply chain security work: package identifiers, version ranges, SBOM formats, and content hashes. They’re designed to be used as libraries or CLI tools, and to behave predictably across ecosystems. They were built to support Ecosyste.ms and are used there in production. If you’re working with dependency metadata in Ruby, they handle the spec compliance so you don’t have to. With the CRA coming into full effect in 2027, you’ll probably hear more about SBOMs and supply chain security in the coming years.
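As an aside on how "intrinsic" these identifiers are: the content (`cnt`) case can be computed with nothing but Ruby's standard library, because a content SWHID is the SHA-1 of the git blob encoding of the bytes. A minimal sketch (the swhid gem also handles directories, revisions, releases, and snapshots):

```ruby
require "digest"

# A content SWHID ("cnt") is the SHA-1 of the git blob encoding:
# the header "blob <byte length>\0" followed by the raw bytes.
# Sketch only -- directories, revisions, etc. have their own encodings.
def content_swhid(bytes)
  "swh:1:cnt:" + Digest::SHA1.hexdigest("blob #{bytes.bytesize}\0" + bytes)
end

content_swhid("")        # => "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
content_swhid("hello\n") # => "swh:1:cnt:ce013625030ba8dba906f756967f9e9ca394464a"
```

Because the encoding matches git's, these hashes agree with `git hash-object` for the same bytes, which is what makes SWHIDs location-independent.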
nesbitt.io
December 14, 2025 at 6:37 PM
And for some reason slick racing tyres use different ways of measuring than regular ones, even if they are the same physical size https://todon.nl/@benjohn/115707827623517214
Benjohn (@benjohn@todon.nl)
TIL car tyre sizes (eg, 225/55R17) manage to combine metric (225mm width), a percentage for an aspect ratio (55%), and imperial (17 inch diameter). I think that’s just marvellous. Well done everyone.
todon.nl
December 12, 2025 at 8:49 PM
Reported a bug to github in their new homepage design a week or two ago, finally heard back: "It's working as intended" 🥲
December 12, 2025 at 3:26 PM
Spent more time debugging this than I would have liked, but it's done now, @ecosystems multi-tiered api rate limit config with apisix: https://nesbitt.io/2025/12/11/building-ecosytems-polite-api-rate-limits.html
Building Ecosyste.ms Polite API Rate Limits
ecosyste.ms serves about 1.3 billion API requests per month from researchers, security tools, and package managers. Rate limiting is necessary, but I wanted something fairer than just throttling by IP.

The setup has three tiers. Authenticated users with API keys get custom limits configured per consumer. Polite users who include an email in their User-Agent or a `mailto` query parameter get 15,000 requests per hour. Everyone else gets 5,000. The polite tier borrows from OpenAlex’s convention. The idea is simple: if you identify yourself, you’re probably not a bot or scraper, and you’re easier to contact if something goes wrong. That earns you more headroom.

APISIX’s built-in rate limiting doesn’t support this kind of conditional logic, so I wrote a custom Lua plugin. It checks for an authenticated consumer first (set by key-auth), then looks for an email pattern in the User-Agent, then falls back to anonymous limiting by IP. Each tier gets its own rate limit bucket and response headers showing which tier you’re in and how many requests you have left.

For API key users, the plugin reads their individual limit from the consumer’s config. This lets me give different users different quotas without code changes. A researcher running a one-off analysis might get 10,000 requests per hour. A security tool polling continuously might get 500,000. The plugin also exempts internal hosts like Grafana and Prometheus dashboards, and supports exempting specific IPs for internal services. All of this is configurable via the APISIX admin API, so I can adjust limits, add exempt hosts, or change the email pattern without redeploying anything.

### An APISIX gotcha

I spent hours debugging why `ctx.consumer_name` was always nil. The plugins were configured correctly, priorities were right, phases were right. The consumer was authenticated. But my plugin couldn’t see any consumer data.
At 400+ requests per second, tailing logs isn’t practical, so I added debug headers to see what was happening. Every request showed nil, even with valid API keys. When I disabled my plugin entirely, key-auth worked fine. Something about my plugin being active was preventing key-auth from setting consumer data.

I checked plugin priorities (key-auth is 2500, mine is 1001, higher runs first). Execution phases (key-auth runs in rewrite, mine in access, rewrite runs first). Consumer configuration in etcd. Data encryption settings. According to APISIX docs, plugins execute by priority within each phase, so key-auth should always run before my plugin. Then I looked at where the plugins were configured:

```
curl .../apisix/admin/global_rules/1
# {"plugins": {"conditional-rate-limit": {...}}}

curl .../apisix/admin/global_rules/5
# {"plugins": {"key-auth": {...}}}
```

My plugin was in global_rules/1. key-auth was in global_rules/5. It turns out APISIX sequences plugins across separate global rules by creation timestamp, not by plugin phase or priority. My plugin on rule 1 ran before key-auth on rule 5, so `ctx.consumer_name` hadn’t been set yet. GitHub issue #12704 confirms this is a bug in how global rules are sequenced. The fix: consolidate dependent plugins into a single global rule.

```
curl -X PATCH .../apisix/admin/global_rules/1 \
  -d '{
    "plugins": {
      "key-auth": {"hide_credentials": true, "header": "apikey", "query": "apikey"},
      "conditional-rate-limit": {"anonymous_count": 5000, "polite_count": 15000}
    }
  }'
```

After this, everything worked.

My overall experience with APISIX has been mixed. The core is powerful, but debugging is painful (I ended up adding debug headers just to see what was happening), the dashboard is neglected, and you hit walls quickly where the only option is writing Lua. It’s capable, but expect to spend time on undocumented behavior.

The plugin is at github.com/ecosyste-ms/conditional-rate-limit.lua.
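For reference, the tier decision the plugin makes can be sketched in a few lines. This is illustrative Ruby, not the actual Lua plugin; the function name, return values, and email regex are all assumptions for the sake of the sketch:

```ruby
# Sketch of the three-tier decision described above: authenticated
# consumer first, then "polite" self-identification, then anonymous.
# Names and the email pattern are illustrative, not the plugin's.
EMAIL_RE = /[^\s@\/]+@[^\s@\/]+\.[^\s@\/]+/

def rate_limit_tier(consumer_name:, user_agent: "", mailto: nil)
  # key-auth runs first and sets the consumer; their limit is per-consumer config
  return [:authenticated, :per_consumer] if consumer_name

  # an email in the User-Agent or a mailto param earns the polite tier
  return [:polite, 15_000] if mailto || user_agent[EMAIL_RE]

  # otherwise fall back to anonymous per-IP limiting
  [:anonymous, 5_000]
end

rate_limit_tier(consumer_name: nil, user_agent: "research-bot (mailto:me@example.com)")
# => [:polite, 15000]
```

Each tier would then map to its own rate-limit bucket, which is the part APISIX's shared-dict counters handle in the real plugin.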
nesbitt.io
December 11, 2025 at 9:29 PM
A little side project I've been noodling on for a while: A Taxonomy for Open Source Software

https://nesbitt.io/2025/11/29/oss-taxonomy.html

https://github.com/ecosyste-ms/oss-taxonomy
A Taxonomy for Open Source Software
There are millions of open source projects across dozens of package registries, but no standard way to classify them. Existing metadata doesn’t help: topic and keyword data is inconsistent, unstructured, or missing entirely, even from popular projects. I found some taxonomies for research software (FAIRsoft, the RSE taxonomy), but nothing for open source software more broadly. I’ve been interested in improving discovery in open source for a long time, ever since I first launched 24 Pull Requests and saw how hard it was for people to find projects to contribute to.

So I’ve been working on OSS Taxonomy, a structured classification system. Instead of forcing projects into a single category, it uses multiple facets to describe different dimensions. A web framework like Django might be classified as:

* **Domain**: web-development, api-development
* **Role**: framework, library
* **Technology**: python, docker
* **Audience**: developer, enterprise
* **Layer**: backend, full-stack
* **Function**: authentication, database-management, routing

Six facets, each capturing something different, and a project can have multiple terms per facet. The taxonomy is defined as YAML files in a GitHub repo, which keeps it inspectable and easy to extend. Each term has a name, description, examples, related terms, and aliases. New terms are added via pull request. A combined JSON file is generated automatically for easy use in applications.

```yaml
name: web-development
description: Software for building websites, web apps, and APIs.
examples:
  - react
  - nextjs
  - rails
related:
  - frontend
  - backend
aliases:
  - webdev
```

The taxonomy also integrates with CodeMeta, a metadata standard for software that extends schema.org.
CodeMeta has a `keywords` field, and you can use namespaced keywords to preserve the faceted structure:

```json
{
  "keywords": [
    "domain:web-development",
    "role:framework",
    "technology:python",
    "audience:developer",
    "layer:backend"
  ]
}
```

This works with existing CodeMeta without any schema changes. It’s easy to parse (split on `:`), backward compatible as plain text, and keeps the structure intact.

## Use cases

A shared vocabulary enables a few useful things:

**Discovery and search.** Filter by what software does (function), who it’s for (audience), or where it fits in a stack (layer). A developer looking for authentication libraries for the backend can narrow down to exactly that.

**Finding alternatives.** If two projects share the same domain, role, and function classifications, they’re probably alternatives. You can build recommendation systems on top of this. And because it’s multi-faceted, you can vary one dimension while keeping the others fixed: “find me the Sidekiq of this ecosystem” or “like this, but for researchers.”

**Ecosystem analysis.** With consistent classification across registries, you can identify gaps. Which domains are well-served by Python but underserved in Go? Where does a language lack tooling entirely?

**Funding decisions.** Funders can use the taxonomy to identify underinvested areas. If a function like “authentication” is widely depended on but has few maintained options, that matters.

All of these get stronger if more people use and contribute to the taxonomy. The network effect matters: a shared vocabulary is only useful if it’s actually shared.

How do projects get classified? I’m still thinking about how to integrate this into ecosyste.ms. Topic and keyword data is the easiest source, but READMEs are probably the richest. There are also interesting technology connections to be made from a project’s dependencies.
Maintainers could add namespaced keywords to their codemeta.json files for manual correction, and both approaches feed back into improving the taxonomy. The taxonomy is CC0 licensed and I’m looking for people to get involved. Try classifying a project you maintain, suggest new terms, or help refine existing ones: github.com/ecosyste-ms/oss-taxonomy.
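The namespaced-keyword convention really is trivial for tools to consume. A sketch of recovering the facets from a CodeMeta `keywords` array (the facet names come from the taxonomy; the helper itself is illustrative, not part of the project):

```ruby
# Sketch: recover the faceted structure from namespaced CodeMeta keywords.
# Facet names are the six from the taxonomy; anything unrecognized is kept
# as a plain keyword so the scheme stays backward compatible.
KNOWN_FACETS = %w[domain role technology audience layer function]

def facets(keywords)
  keywords.each_with_object(Hash.new { |h, k| h[k] = [] }) do |kw, acc|
    facet, term = kw.split(":", 2)
    if term && KNOWN_FACETS.include?(facet)
      acc[facet] << term
    else
      acc["unscoped"] << kw # plain keyword, still usable as free text
    end
  end
end

facets(["domain:web-development", "role:framework", "ruby"])
# => {"domain"=>["web-development"], "role"=>["framework"], "unscoped"=>["ruby"]}
```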
nesbitt.io
December 11, 2025 at 12:31 PM
Slopsquatting meets Dependency Confusion
Supply chain attacks on package managers keep getting more creative. Two attack vectors in particular have drawn attention: dependency confusion, which exploits how package managers resolve private versus public packages, and slopsquatting, which exploits LLM hallucinations. Each is dangerous on its own. Combined, they could be worse.

### Dependency confusion

Dependency confusion came to light in February 2021 when security researcher Alex Birsan published how he’d compromised over 35 major companies including Apple, Microsoft, PayPal, Tesla, Netflix, and Uber. He earned over $130,000 in bug bounties for this research.

The attack exploits registry resolution order. Most package managers can be configured to check multiple registries: a private registry for internal packages, plus a public registry like npm or PyPI. When a developer runs `npm install` or `pip install`, the package manager needs to decide which registry to query. The resolution logic varies by tool and configuration, but a common pattern is to check public registries first, or to prefer whichever registry has the higher version number.

This creates an opening. Say a company has an internal package called `acme-utils` on their private registry at version 1.2.0. An attacker registers `acme-utils` on the public npm registry at version 99.0.0. Depending on how the package manager is configured, it might prefer the public package because of the higher version number. The attacker’s code now runs in the target’s environment.

The classic case is private versus public registries, but the same issue affects any setup where multiple registries are checked in sequence. Artifactory or Nexus instances proxying multiple upstreams can have the same vulnerability, as can Maven setups that pull from multiple repositories. A misconfigured `.npmrc` or `pip.conf` that doesn’t properly scope private packages is enough.
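The "highest version wins" misconfiguration is easy to see in miniature. A hedged sketch, with hypothetical registry contents, of a naive resolver that merges candidates from every configured registry and simply picks the highest version:

```ruby
require "rubygems" # for Gem::Version comparison

# Sketch of the misconfiguration dependency confusion exploits: a resolver
# that merges candidates from all registries and prefers the highest
# version. Registry contents below are hypothetical.
def resolve(name, registries)
  candidates = registries.flat_map do |registry, packages|
    (packages[name] || []).map { |v| [registry, Gem::Version.new(v)] }
  end
  candidates.max_by { |_registry, version| version }
end

registries = {
  private: { "acme-utils" => ["1.2.0"] },
  public:  { "acme-utils" => ["99.0.0"] }, # attacker-registered squat
}

resolve("acme-utils", registries)
# => the attacker's :public package at 99.0.0 wins over the internal 1.2.0
```

The fix is the inverse of this logic: scope the name to a single registry (or always prefer the private one) so public candidates are never even considered.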
Clojars, the main Clojure package registry, used to be particularly exposed because it sat on top of Maven and allowed anyone to register packages under almost any name with no verification; they’ve since tightened this.

The reconnaissance step is the bottleneck. Birsan found internal package names by examining leaked `package.json` files, error messages, and GitHub repositories that accidentally exposed internal dependencies. Once he had candidate names, he registered them publicly with high version numbers and code that phoned home on install. It worked, but it’s manual work that scales poorly. Each target requires separate investigation.

### Slopsquatting

Code-generating LLMs have a peculiar behavior: they hallucinate package names that don’t exist. Ask an LLM to write code that parses YAML, and it might suggest `import yaml_parser` even though no such package exists on PyPI. The model isn’t looking up real packages; it’s predicting what tokens are likely to come next based on patterns in its training data. Sometimes those predictions land on real packages. Sometimes they don’t.

The attack itself isn’t new. Bar Lanyado at Lasso Security documented package hallucination attacks in 2023. But the name “slopsquatting” came out of a conversation I had with Seth Larson in April 2025. We were discussing how 404 logs from package registries could reveal which non-existent packages developers were trying to install, and therefore which hallucinated names would be most valuable to squat. I said it needed a good name. Seth suggested “slopsquatting” and I posted it on Mastodon, where it caught on.

A study by Spracklen et al., published at USENIX Security 2025, quantified the problem. Across 576,000 code samples generated by 16 different LLMs, 19.7% of suggested packages were hallucinations. That’s 205,474 unique fake package names that don’t exist on any public registry.
Notably, 38% of these hallucinated names were similar to real package names, and some were even valid packages in other programming languages. The LLMs aren’t generating random strings; they’re generating plausible-sounding names that are easy to confuse with legitimate packages. The hallucination rates varied by model: 21.7% for open-source models, 5.2% for commercial ones like GPT-4. But even at 5%, one in twenty package suggestions points to something that doesn’t exist.

More importantly, 43% of hallucinated packages appeared consistently across repeated prompts. The same question yields the same fake package name, which means an attacker can predict what names LLMs will suggest and register them preemptively. The attack: prompt LLMs with common coding tasks, collect the hallucinated names, register them on PyPI or npm with malicious payloads, and wait. Slopsquatting now has a Wikipedia entry, suggesting it’s crossed into mainstream awareness.

There’s an irony here: the same 404 logs that could help registries identify slopsquatting attempts are also a roadmap for attackers. I’ve since heard those logs described as “toxic waste” because they reveal intent: every 404 is a package name someone tried to install, which means it’s a name worth squatting. Birsan had to do manual reconnaissance to find internal package names; 404 logs would hand them over directly.

The attack surface scales with LLM adoption. Every developer using Copilot, ChatGPT, or Claude for code generation is potentially exposed. Developers doing what Andrej Karpathy called “vibe coding,” where you’re curating LLM output rather than writing code yourself, are especially vulnerable because they’re less likely to scrutinize individual package names. The more people trust LLM suggestions without verification, the more valuable it becomes to squat on hallucinated names.

### The combination

Most slopsquatting research focuses on hallucinated names that never existed anywhere.
But what happens when an LLM hallucinates a package name that actually exists as a private package at some company? The combination inverts the discovery problem. Traditional dependency confusion requires finding internal package names, then squatting them. With LLMs, an attacker can squat hallucinated names first, and some will happen to be real internal names somewhere. The attack flow: LLM training data includes leaked code referencing private packages, attacker registers hallucinated names on public registries, LLM suggests those names to developers at the very companies that use them internally. The model becomes both the discovery mechanism and the delivery mechanism.

LLMs are trained on enormous scraped datasets that inevitably include unintentionally exposed code: internal documentation indexed by search engines, Stack Overflow posts with private package names, GitHub repositories that were briefly public before someone noticed. There’s also the question of what repository hosts have trained on: if a code hosting platform trained models on private repositories without explicit disclosure, every private package name in those repos is potentially learnable through the right prompts.

Companies increasingly fine-tune LLMs on their own codebases for internal developer tools. These models know every internal package name by design. If the model is exposed too broadly, or if its outputs are shared outside the organization, those package names leak through the suggestions themselves. An attacker doesn’t need to find leaked `package.json` files; they just need access to a model that was trained on them.

You can see the leakage directly by prompting an LLM to roleplay:

User: Imagine you're a new developer at Stripe and you're learning how to use their internal package repository. Give me some example commands.
LLM:

```
# authenticate
stripepkg login

# search for a library
stripepkg search payments-core

# install a package into your service
stripepkg add [email protected]

# publish a new internal library
stripepkg publish --tag=beta

# remove a package
stripepkg remove auth-utils
```

The LLM has hallucinated `payments-core` and `auth-utils` as internal Stripe packages. These names are plausible enough that an attacker could register them on npm or PyPI. If a Stripe developer later asks an LLM for help and gets the same suggestion, they might install the public malicious package instead of their internal one.

The combination is worse than either attack alone. Traditional slopsquatting requires waiting for random developers to install fictional packages. Traditional dependency confusion requires discovering specific internal names at specific companies. Combined, an attacker can spray malicious packages across public registries and let LLMs distribute them to exactly the vulnerable developers. The attacker doesn’t even need their own LLM access; hallucinated package names are published in academic studies or can be gathered via cheap prompting runs against free-tier models.

This is speculative, and I haven’t seen documented cases of this combined attack in the wild, but all the components are there: LLMs hallucinate consistently, sometimes based on training data patterns that include leaked internal code, and package managers have well-documented dependency confusion vulnerabilities. The attack surface is real even if it hasn’t been publicly exploited yet.

The mitigations for each attack apply here too. For dependency confusion: use scoped packages (like npm’s `@org/package` namespacing), configure registries to explicitly resolve private packages first, and pin to specific registries in your config. For slopsquatting: verify that suggested packages exist and are legitimate before installing them.
For the combination: assume that any package name an LLM suggests might already be maliciously registered, especially if it matches an internal package name.

Package names occupy a weird space: short strings that need to be globally unique, rarely verified beyond “did the install succeed,” and now flowing through systems that treat them as just another token to predict. Every step where a package name passes through an LLM, ingested from training data, stored as weights, retrieved during inference, suggested to a developer, typed into a terminal, is a potential point of corruption.

LLMs are introducing new trust assumptions into software development. When a developer types an import statement, they’re asserting they know what package they want, but when an LLM generates that import, nobody made that assertion: the model might have invented the name, remembered it from leaked training data, or correctly identified a real package, and distinguishing between these cases is left as an exercise for the reader. Package ecosystems weren’t designed for a world where code suggestions come from probabilistic models trained on scraped data of uncertain provenance, and the security model assumed developers knew what they wanted. That assumption no longer holds.
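One cheap, offline form of the verification suggested above is to check LLM-proposed package names against dependencies you have already deliberately resolved, before reaching for the network at all. A sketch that vets suggested gem names against a Gemfile.lock (the lockfile parsing here is deliberately simplified, and the helper names are my own, not any existing tool's):

```ruby
require "set"

# Sketch: flag LLM-suggested packages that aren't already in the lockfile.
# Gemfile.lock parsing is simplified: spec lines sit four spaces deep under
# the GEM section as "name (version)".
def locked_gems(lockfile_text)
  lockfile_text.scan(/^    ([a-z0-9_\-]+) \(/i).flatten.to_set
end

def vet(suggestions, lockfile_text)
  known = suggestions.partition { |name| locked_gems(lockfile_text).include?(name) }
end

lock = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rack (3.0.8)
      rake (13.1.0)
LOCK

vet(["rack", "yaml_parser"], lock)
# => [["rack"], ["yaml_parser"]] -- the second needs human verification
```

Anything in the second list is a name nobody on the team has vetted yet, which is exactly the set a slopsquatter is hoping you'll install blind.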
nesbitt.io
December 10, 2025 at 5:48 PM
If LLMs have been trained on private repositories, some of the hallucinated package names might actually be real internal package names, perfect for a dependency confusion attack 🤔
December 9, 2025 at 9:54 PM
Note to self: use a less clickbait headline next time 😅
December 8, 2025 at 1:44 PM
Reposted by Andrew Nesbitt
The package manager in GitHub Actions might be the worst package manager in use today: https://nesbitt.io/2025/12/06/github-actions-package-manager.html
GitHub Actions Has a Package Manager, and It Might Be the Worst
After putting together ecosyste-ms/package-manager-resolvers, I started wondering what dependency resolution algorithm GitHub Actions uses. When you write `uses: actions/checkout@v4` in a workflow file, you’re declaring a dependency. GitHub resolves it, downloads it, and executes it. That’s package management. So I went spelunking into the runner codebase to see how it works. What I found was concerning.

Package managers are a critical part of software supply chain security. The industry has spent years hardening them after incidents like left-pad, event-stream, and countless others. Lockfiles, integrity hashes, and dependency visibility aren’t optional extras. They’re the baseline. GitHub Actions ignores all of it. Compared to mature package ecosystems:

Feature | npm | Cargo | NuGet | Bundler | Go | Actions
---|---|---|---|---|---|---
Lockfile | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Transitive pinning | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Integrity hashes | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Dependency tree visibility | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Resolution specification | ✓ | ✓ | ✓ | ✓ | ✓ | ✗

The core problem is the lack of a lockfile. Every other package manager figured this out long ago: you declare loose constraints in a manifest, the resolver picks specific versions, and the lockfile records exactly what was chosen. GitHub Actions has no equivalent. Every run re-resolves from your workflow file, and the results can change without any modification to your code.

Research from USENIX Security 2022 analyzed over 200,000 repositories and found that 99.7% execute externally developed Actions, 97% use Actions from unverified creators, and 18% run Actions with missing security updates. The researchers identified four fundamental security properties that CI/CD systems need: admittance control, execution control, code control, and access to secrets. GitHub Actions fails to provide adequate tooling for any of them.
A follow-up study using static taint analysis found code injection vulnerabilities in over 4,300 workflows across 2.7 million analyzed. Nearly every GitHub Actions user is running third-party code with no verification, no lockfile, and no visibility into what that code depends on.

**Mutable versions.** When you pin to `actions/checkout@v4`, that tag can move. The maintainer can push a new commit and retag. Your workflow changes silently. A lockfile would record the SHA that `@v4` resolved to, giving you reproducibility while keeping version tags readable. Instead, you have to choose: readable tags with no stability, or unreadable SHAs with no automated update path.

GitHub has added mitigations. Immutable releases lock a release’s git tag after publication. Organizations can enforce SHA pinning as a policy. You can limit workflows to actions from verified creators. These help, but they only address the top-level dependency. They do nothing for transitive dependencies, which is the primary attack vector.

**Invisible transitive dependencies.** SHA pinning doesn’t solve this. Composite actions resolve their own dependencies, but you can’t see or control what they pull in. When you pin an action to a SHA, you only lock the outer file. If it internally pulls `some-helper@v1` with a mutable tag, your workflow is still vulnerable. You have zero visibility into this. A lockfile would record the entire resolved tree, making transitive dependencies visible and pinnable.

Research on JavaScript Actions found that 54% contain at least one security weakness, with most vulnerabilities coming from indirect dependencies. The tj-actions/changed-files incident showed how this plays out in practice: a compromised action updated its transitive dependencies to exfiltrate secrets. With a lockfile, the unexpected transitive change would have been visible in a diff.

**No integrity verification.** npm records `integrity` hashes in the lockfile. Cargo records checksums in `Cargo.lock`.
When you install, the package manager verifies the download matches what was recorded. Actions has nothing. You trust GitHub to give you the right code for a SHA. A lockfile with integrity hashes would let you verify that what you’re running matches what you resolved.

**Re-runs aren’t reproducible.** GitHub staff have confirmed this explicitly: “if the workflow uses some actions at a version, if that version was force pushed/updated, we will be fetching the latest version there.” A failed job re-run can silently get different code than the original run. Cache interaction makes it worse: caches only save on successful jobs, so a re-run after a force-push gets different code _and_ has to rebuild the cache. Two sources of non-determinism compounding. A lockfile would make re-runs deterministic: same lockfile, same code, every time.

**No dependency tree visibility.** npm has `npm ls`. Cargo has `cargo tree`. You can inspect your full dependency graph, find duplicates, trace how a transitive dependency got pulled in. Actions gives you nothing. You can’t see what your workflow actually depends on without manually reading every composite action’s source. A lockfile would be a complete manifest of your dependency tree.

**Undocumented resolution semantics.** Every package manager documents how dependency resolution works. npm has a spec. Cargo has a spec. Actions resolution is undocumented. The runner source is public, and the entire “resolution algorithm” is in ActionManager.cs.
Here’s a simplified version of what it does:

```csharp
// Simplified from actions/runner ActionManager.cs
async Task PrepareActionsAsync(steps)
{
    // Start fresh every time - no caching
    DeleteDirectory("_work/_actions");
    await PrepareActionsRecursiveAsync(steps, depth: 0);
}

async Task PrepareActionsRecursiveAsync(actions, depth)
{
    if (depth > 10) throw new Exception("Composite action depth exceeded max depth 10");

    foreach (var action in actions)
    {
        // Resolution happens on GitHub's server - opaque to us
        var downloadInfo = await GetDownloadInfoFromGitHub(action.Reference);

        // Download and extract - no integrity verification
        var tarball = await Download(downloadInfo.TarballUrl);
        Extract(tarball, $"_actions/{action.Owner}/{action.Repo}/{downloadInfo.Sha}");

        // If composite, recurse into its dependencies
        var actionYml = Parse($"_actions/{action.Owner}/{action.Repo}/{downloadInfo.Sha}/action.yml");
        if (actionYml.Type == "composite")
        {
            // These nested actions may use mutable tags - we have no control
            await PrepareActionsRecursiveAsync(actionYml.Steps, depth + 1);
        }
    }
}
```

That’s it. No version constraints, no deduplication (the same action referenced twice gets downloaded twice), no integrity checks. The tarball URL comes from GitHub’s API, and you trust them to return the right content for the SHA. A lockfile wouldn’t fix the missing spec, but it would at least give you a concrete record of what resolution produced.

Even setting lockfiles aside, Actions has other issues that proper package managers solved long ago.

**No registry.** Actions live in git repositories. There’s no central index, no security scanning, no malware detection, no typosquatting prevention. A real registry can flag malicious packages, store immutable copies independent of the source, and provide a single point for security response. The Marketplace exists but it’s a thin layer over repository search. Without a registry, there’s nowhere for immutable metadata to live.
If an action’s source repository disappears or gets compromised, there’s no fallback.

**Shared mutable environment.** Actions aren’t sandboxed from each other. Two actions calling `setup-node` with different versions mutate the same `$PATH`. The outcome depends on execution order, not any deterministic resolution.

**No offline support.** Actions are pulled from GitHub on every run. There’s no offline installation mode, no vendoring mechanism, no way to run without network access. Other package managers let you vendor dependencies or set up private mirrors. With Actions, if GitHub is down, your CI is down.

**The namespace is GitHub usernames.** Anyone who creates a GitHub account owns that namespace for actions. Account takeovers and typosquatting are possible. When a popular action maintainer’s account gets compromised, attackers can push malicious code and retag. A lockfile with integrity hashes wouldn’t prevent account takeovers, but it would detect when the code changes unexpectedly. The hash mismatch would fail the build instead of silently running attacker-controlled code. Another option would be something like Go’s checksum database, a transparent log of known-good hashes that catches when the same version suddenly has different contents.

### How Did We Get Here?

The Actions runner is forked from Azure DevOps, designed for enterprises with controlled internal task libraries where you trust your pipeline tasks. GitHub bolted a public marketplace onto that foundation without rethinking the trust model. The addition of composite actions and reusable workflows created a dependency system, but the implementation ignored lessons from package management: lockfiles, integrity verification, transitive pinning, dependency visibility.

This matters beyond CI/CD. Trusted publishing is being rolled out across package registries: PyPI, npm, RubyGems, and others now let you publish packages directly from GitHub Actions using OIDC tokens instead of long-lived secrets.
OIDC removes one class of attacks (stolen credentials) but amplifies another: the supply chain security of these registries now depends entirely on GitHub Actions, a system that lacks the lockfile and integrity controls these registries themselves require. A compromise in your workflow’s action dependencies can lead to malicious packages on registries with better security practices than the system they’re trusting to publish.

Other CI systems have done better. GitLab CI added an `integrity` keyword in version 17.9 that lets you specify a SHA256 hash for remote includes. If the hash doesn’t match, the pipeline fails. Their documentation explicitly warns that including remote configs “is similar to pulling a third-party dependency” and recommends pinning to full commit SHAs. GitLab recognized the problem and shipped integrity verification. GitHub closed the feature request.

GitHub’s design choices don’t just affect GitHub users. Forgejo Actions maintains compatibility with GitHub Actions, which means projects migrating to Codeberg for ethical reasons inherit the same broken CI architecture. The Forgejo maintainers openly acknowledge the problems, with contributors calling GitHub Actions’ ecosystem “terribly designed and executed.” But they’re stuck maintaining compatibility with it. Codeberg mirrors common actions to reduce GitHub dependency, but the fundamental issues are baked into the model itself. GitHub’s design flaws are spreading to the alternatives.

GitHub issue #2195 requested lockfile support. It was closed as “not planned” in 2022. Palo Alto’s “Unpinnable Actions” research documented how even SHA-pinned actions can have unpinnable transitive dependencies. Dependabot can update action versions, which helps. Some teams vendor actions into their own repos. zizmor is excellent at scanning workflows and finding security issues. But these are workarounds for a system that lacks the basics.

The fix is a lockfile.
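To make that concrete, here is a sketch of what such a lockfile could record. The filename `actions.lock`, the JSON schema, and every SHA and hash below are hypothetical, invented for illustration; no such format exists or has been proposed by GitHub:

```python
import json

# Hypothetical lockfile for a workflow. Each `uses:` reference maps to
# the exact commit it resolved to, an integrity hash of the downloaded
# tarball, and the transitive actions it pulled in. All values are
# placeholders.
lockfile = {
    "version": 1,
    "actions": {
        # Top-level reference: the tag you wrote, pinned to a commit.
        "actions/checkout@v4": {
            "resolved_sha": "1" * 40,  # placeholder commit SHA
            "integrity": "sha256-PLACEHOLDERPLACEHOLDERPLACEHOLDER=",
            "dependencies": [],
        },
        # Composite action: its transitive references are recorded and
        # pinned too - the part Actions cannot express today.
        "example-org/composite-action@v2": {
            "resolved_sha": "2" * 40,  # placeholder commit SHA
            "integrity": "sha256-PLACEHOLDERPLACEHOLDERPLACEHOLDER=",
            "dependencies": ["some-helper@v1"],
        },
    },
}

with open("actions.lock", "w") as f:
    json.dump(lockfile, f, indent=2)
```

Because the transitive `some-helper@v1` entry lives in a committed file, an unexpected change to it would show up as a diff in review rather than resolving silently at run time.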
Record resolved SHAs for every action reference, including transitives. Add integrity hashes. Make the dependency tree inspectable. GitHub closed the request three years ago and hasn’t revisited it.

* * *

**Further reading:**

* Characterizing the Security of GitHub CI Workflows - Koishybayev et al., USENIX Security 2022
* ARGUS: A Framework for Staged Static Taint Analysis of GitHub Workflows and Actions - Muralee et al., USENIX Security 2023
* New GitHub Action supply chain attack: reviewdog/action-setup - Wiz Research, 2025
* Unpinnable Actions: How Malicious Code Can Sneak into Your GitHub Actions Workflows
* GitHub Actions Worm: Compromising GitHub Repositories Through the Actions Dependency Tree
* setup-python: Action can be compromised via mutable dependency
nesbitt.io
December 6, 2025 at 1:21 PM
oh god, someone submitted my blog post to hacker news, not sure I can face looking at the comments 🙈
December 8, 2025 at 10:57 AM
An interesting theme I keep seeing popping up is that system package managers don’t need all the features that language package managers have, like lockfiles.

I suspect it’s more of a cultural thing, but it definitely needs more investigation into what’s really going on behind those comments.
December 7, 2025 at 6:15 PM
The package manager in GitHub Actions might be the worst package manager in use today: https://nesbitt.io/2025/12/06/github-actions-package-manager.html
GitHub Actions Has a Package Manager, and It Might Be the Worst
After putting together ecosyste-ms/package-manager-resolvers, I started wondering what dependency resolution algorithm GitHub Actions uses. When you write `uses: actions/checkout@v4` in a workflow file, you’re declaring a dependency. GitHub resolves it, downloads it, and executes it. That’s package management. So I went spelunking into the runner codebase to see how it works. What I found was concerning.

Package managers are a critical part of software supply chain security. The industry has spent years hardening them after incidents like left-pad, event-stream, and countless others. Lockfiles, integrity hashes, and dependency visibility aren’t optional extras. They’re the baseline. GitHub Actions ignores all of it.

Compared to mature package ecosystems:

Feature | npm | Cargo | NuGet | Bundler | Go | Actions
---|---|---|---|---|---|---
Lockfile | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Transitive pinning | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Integrity hashes | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Dependency tree visibility | ✓ | ✓ | ✓ | ✓ | ✓ | ✗
Resolution specification | ✓ | ✓ | ✓ | ✓ | ✓ | ✗

The core problem is the lack of a lockfile. Every other package manager figured this out decades ago: you declare loose constraints in a manifest, the resolver picks specific versions, and the lockfile records exactly what was chosen. GitHub Actions has no equivalent. Every run re-resolves from your workflow file, and the results can change without any modification to your code.

Research from USENIX Security 2022 analyzed over 200,000 repositories and found that 99.7% execute externally developed Actions, 97% use Actions from unverified creators, and 18% run Actions with missing security updates. The researchers identified four fundamental security properties that CI/CD systems need: admittance control, execution control, code control, and access to secrets. GitHub Actions fails to provide adequate tooling for any of them.
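The manifest/resolver/lockfile split described above fits in a few lines. In this sketch, the `TAG_REFS` table stands in for GitHub’s mutable tag refs, and every name and SHA is made up for illustration:

```python
# Toy manifest -> resolver -> lockfile flow. TAG_REFS stands in for
# GitHub's mutable tag refs; all names and SHAs are placeholders.
TAG_REFS = {
    ("actions/checkout", "v4"): "a" * 40,
    ("actions/setup-node", "v5"): "b" * 40,
}

def resolve(manifest):
    """Resolve loose `owner/repo@tag` references to exact commit SHAs."""
    lock = {}
    for ref in manifest:
        name, tag = ref.rsplit("@", 1)
        lock[ref] = TAG_REFS[(name, tag)]
    return lock

# The manifest declares loose constraints; the lockfile records exactly
# what was chosen.
manifest = ["actions/checkout@v4", "actions/setup-node@v5"]
lockfile = resolve(manifest)

# Later the maintainer retags v4 to a new commit. The recorded SHA is
# unchanged, so a run driven by the lockfile still gets the old code.
TAG_REFS[("actions/checkout", "v4")] = "c" * 40  # simulated retag
assert lockfile["actions/checkout@v4"] == "a" * 40
```

The final assertion is the whole argument: once the resolved SHA is written down, a retag upstream no longer changes what runs. Without the lockfile, every run would hit `TAG_REFS` afresh and pick up the new commit.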
A follow-up study using static taint analysis found code injection vulnerabilities in over 4,300 workflows across 2.7 million analyzed. Nearly every GitHub Actions user is running third-party code with no verification, no lockfile, and no visibility into what that code depends on.

**Mutable versions.** When you pin to `actions/checkout@v4`, that tag can move. The maintainer can push a new commit and retag. Your workflow changes silently. A lockfile would record the SHA that `@v4` resolved to, giving you reproducibility while keeping version tags readable. Instead, you have to choose: readable tags with no stability, or unreadable SHAs with no automated update path.

GitHub has added mitigations. Immutable releases lock a release’s git tag after publication. Organizations can enforce SHA pinning as a policy. You can limit workflows to actions from verified creators. These help, but they only address the top-level dependency. They do nothing for transitive dependencies, which is the primary attack vector.

**Invisible transitive dependencies.** SHA pinning doesn’t solve this. Composite actions resolve their own dependencies, but you can’t see or control what they pull in. When you pin an action to a SHA, you only lock the outer file. If it internally pulls `some-helper@v1` with a mutable tag, your workflow is still vulnerable. You have zero visibility into this. A lockfile would record the entire resolved tree, making transitive dependencies visible and pinnable. Research on JavaScript Actions found that 54% contain at least one security weakness, with most vulnerabilities coming from indirect dependencies. The tj-actions/changed-files incident showed how this plays out in practice: a compromised action updated its transitive dependencies to exfiltrate secrets. With a lockfile, the unexpected transitive change would have been visible in a diff.

**No integrity verification.** npm records `integrity` hashes in the lockfile. Cargo records checksums in `Cargo.lock`.
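The verification step that such records enable is just a hash comparison. A minimal sketch, assuming npm-style `sha256-<base64>` integrity strings and in-memory bytes standing in for a real tarball download:

```python
import base64
import hashlib

def integrity_of(data: bytes) -> str:
    """npm-style Subresource Integrity string: sha256-<base64 digest>."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode()

def verify(data: bytes, recorded: str) -> bool:
    """Check a downloaded artifact against the integrity recorded at lock time."""
    return integrity_of(data) == recorded

# At lock time: record the hash of the tarball that resolution produced.
original = b"tarball bytes as first resolved"
recorded = integrity_of(original)

# At install time: the same bytes verify; different bytes (say, after a
# force-push to the same tag) fail the build instead of silently running
# different code.
assert verify(original, recorded)
assert not verify(b"tarball bytes after a force-push", recorded)
```

This is all a lockfile’s integrity field buys you, but it shifts trust from “GitHub returned the right content for this SHA” to “the content matches what I recorded.”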