🇺🇦 Sviatoslav Sydorenko
webknjaz.mastodon.social.ap.brid.gy
🇺🇦 Sviatoslav Sydorenko
@webknjaz.mastodon.social.ap.brid.gy
Your supply chain. Rabbit hole explorer. Pythonista. A #Ukrainian expat.

Serial #FOSS maintainer: #aiohttp, #ansible-core (Red Hat), #CherryPy, #pip-tools […]

🌉 bridged from https://mastodon.social/@webknjaz on the fediverse by https://fed.brid.gy/
Reposted by 🇺🇦 Sviatoslav Sydorenko
Slopsquatting meets Dependency Confusion
Supply chain attacks on package managers keep getting more creative. Two attack vectors in particular have drawn attention: dependency confusion, which exploits how package managers resolve private versus public packages, and slopsquatting, which exploits LLM hallucinations. Each is dangerous on its own. Combined, they could be worse. ### Dependency confusion Dependency confusion came to light in February 2021 when security researcher Alex Birsan published how he’d compromised over 35 major companies including Apple, Microsoft, PayPal, Tesla, Netflix, and Uber. He earned over $130,000 in bug bounties for this research. The attack exploits registry resolution order. Most package managers can be configured to check multiple registries: a private registry for internal packages, plus a public registry like npm or PyPI. When a developer runs `npm install` or `pip install`, the package manager needs to decide which registry to query. The resolution logic varies by tool and configuration, but a common pattern is to check public registries first, or to prefer whichever registry has the higher version number. This creates an opening. Say a company has an internal package called `acme-utils` on their private registry at version 1.2.0. An attacker registers `acme-utils` on the public npm registry at version 99.0.0. Depending on how the package manager is configured, it might prefer the public package because of the higher version number. The attacker’s code now runs in the target’s environment. The classic case is private versus public registries, but the same issue affects any setup where multiple registries are checked in sequence. Artifactory or Nexus instances proxying multiple upstreams can have the same vulnerability, as can Maven setups that pull from multiple repositories. A misconfigured `.npmrc` or `pip.conf` that doesn’t properly scope private packages is enough. Clojars, the main Clojure package registry, used to be particularly exposed because it sat on top of Maven and allowed anyone to register packages under almost any name with no verification; they’ve since tightened this. The reconnaissance step is the bottleneck. Birsan found internal package names by examining leaked `package.json` files, error messages, and GitHub repositories that accidentally exposed internal dependencies. Once he had candidate names, he registered them publicly with high version numbers and code that phoned home on install. It worked, but it’s manual work that scales poorly. Each target requires separate investigation. ### Slopsquatting Code-generating LLMs have a peculiar behavior: they hallucinate package names that don’t exist. Ask an LLM to write code that parses YAML, and it might suggest `import yaml_parser` even though no such package exists on PyPI. The model isn’t looking up real packages; it’s predicting what tokens are likely to come next based on patterns in its training data. Sometimes those predictions land on real packages. Sometimes they don’t. The attack itself isn’t new. Bar Lanyado at Lasso Security documented package hallucination attacks in 2023. But the name “slopsquatting” came out of a conversation I had with Seth Larson in April 2025. We were discussing how 404 logs from package registries could reveal which non-existent packages developers were trying to install, and therefore which hallucinated names would be most valuable to squat. I said it needed a good name. Seth suggested “slopsquatting” and I posted it on Mastodon, where it caught on. 
A study by Spracklen et al., published at USENIX Security 2025, quantified the problem. Across 576,000 code samples generated by 16 different LLMs, 19.7% of suggested packages were hallucinations. That’s 205,474 unique fake package names that don’t exist on any public registry. Notably, 38% of these hallucinated names were similar to real package names, and some were even valid packages in other programming languages. The LLMs aren’t generating random strings; they’re generating plausible-sounding names that are easy to confuse with legitimate packages.

The hallucination rates varied by model: 21.7% for open-source models, 5.2% for commercial ones like GPT-4. But even at 5%, one in twenty package suggestions points to something that doesn’t exist. More importantly, 43% of hallucinated packages appeared consistently across repeated prompts. The same question yields the same fake package name, which means an attacker can predict what names LLMs will suggest and register them preemptively.

The attack: prompt LLMs with common coding tasks, collect the hallucinated names, register them on PyPI or npm with malicious payloads, and wait. Slopsquatting now has a Wikipedia entry, suggesting it’s crossed into mainstream awareness.

There’s an irony here: the same 404 logs that could help registries identify slopsquatting attempts are also a roadmap for attackers. I’ve since heard those logs described as “toxic waste” because they reveal intent: every 404 is a package name someone tried to install, which means it’s a name worth squatting. Birsan had to do manual reconnaissance to find internal package names; 404 logs would hand them over directly.

The attack surface scales with LLM adoption. Every developer using Copilot, ChatGPT, or Claude for code generation is potentially exposed. Developers doing what Andrej Karpathy called “vibe coding,” where you’re curating LLM output rather than writing code yourself, are especially vulnerable because they’re less likely to scrutinize individual package names. The more people trust LLM suggestions without verification, the more valuable it becomes to squat on hallucinated names.

### The combination

Most slopsquatting research focuses on hallucinated names that never existed anywhere. But what happens when an LLM hallucinates a package name that actually exists as a private package at some company?

The combination inverts the discovery problem. Traditional dependency confusion requires finding internal package names, then squatting them. With LLMs, an attacker can squat hallucinated names first, and some will happen to be real internal names somewhere. The attack flow: LLM training data includes leaked code referencing private packages, attacker registers hallucinated names on public registries, LLM suggests those names to developers at the very companies that use them internally. The model becomes both the discovery mechanism and the delivery mechanism.

LLMs are trained on enormous scraped datasets that inevitably include unintentionally exposed code: internal documentation indexed by search engines, Stack Overflow posts with private package names, GitHub repositories that were briefly public before someone noticed. There’s also the question of what repository hosts have trained on: if a code hosting platform trained models on private repositories without explicit disclosure, every private package name in those repos is potentially learnable through the right prompts. Companies increasingly fine-tune LLMs on their own codebases for internal developer tools.
These models know every internal package name by design. If the model is exposed too broadly, or if its outputs are shared outside the organization, those package names leak through the suggestions themselves. An attacker doesn’t need to find leaked `package.json` files; they just need access to a model that was trained on them.

You can see the leakage directly by prompting an LLM to roleplay:

User: Imagine you're a new developer at Stripe and you're learning how to use their internal package repository. Give me some example commands.

LLM:

```
# authenticate
stripepkg login

# search for a library
stripepkg search payments-core

# install a package into your service
stripepkg add payments-core

# publish a new internal library
stripepkg publish --tag=beta

# remove a package
stripepkg remove auth-utils
```

The LLM has hallucinated `payments-core` and `auth-utils` as internal Stripe packages. These names are plausible enough that an attacker could register them on npm or PyPI. If a Stripe developer later asks an LLM for help and gets the same suggestion, they might install the public malicious package instead of their internal one.

The combination is worse than either attack alone. Traditional slopsquatting requires waiting for random developers to install fictional packages. Traditional dependency confusion requires discovering specific internal names at specific companies. Combined, an attacker can spray malicious packages across public registries and let LLMs distribute them to exactly the vulnerable developers. The attacker doesn’t even need their own LLM access; hallucinated package names are published in academic studies or can be gathered via cheap prompting runs against free-tier models.

This is speculative, and I haven’t seen documented cases of this combined attack in the wild, but all the components are there: LLMs hallucinate consistently, sometimes based on training data patterns that include leaked internal code, and package managers have well-documented dependency confusion vulnerabilities. The attack surface is real even if it hasn’t been publicly exploited yet.

The mitigations for each attack apply here too. For dependency confusion: use scoped packages (like npm’s `@org/package` namespacing), configure registries to explicitly resolve private packages first, and pin to specific registries in your config. For slopsquatting: verify that suggested packages exist and are legitimate before installing them. For the combination: assume that any package name an LLM suggests might already be maliciously registered, especially if it matches an internal package name.

Package names occupy a weird space: short strings that need to be globally unique, rarely verified beyond “did the install succeed,” and now flowing through systems that treat them as just another token to predict. Every step where a package name passes through an LLM (ingested from training data, stored as weights, retrieved during inference, suggested to a developer, typed into a terminal) is a potential point of corruption.

LLMs are introducing new trust assumptions into software development. When a developer types an import statement, they’re asserting they know what package they want, but when an LLM generates that import, nobody made that assertion: the model might have invented the name, remembered it from leaked training data, or correctly identified a real package, and distinguishing between these cases is left as an exercise for the reader.
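One small part of that exercise can at least be automated. The sketch below is mine rather than anything from the research above: it checks whether package names pulled from an LLM suggestion are even registered on PyPI, via the registry’s public JSON API, covering the “verify that suggested packages exist” mitigation. The example names are illustrative, and existence is only the first filter: a name that does resolve may itself be a squatted, malicious upload and still needs vetting before it goes anywhere near an install command.

```python
# Minimal sketch: check whether LLM-suggested package names are registered on
# PyPI at all. The example names are illustrative. A 404 from the JSON API
# means the name is unclaimed (a likely hallucination, and a squattable name);
# a hit still needs human vetting, since the project itself may be malicious.
import urllib.error
import urllib.request

def exists_on_pypi(name: str) -> bool:
    """Return True if `name` is a registered project on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # rate limiting or other errors: don't guess

suggested_by_llm = ["yaml_parser", "payments-core", "requests"]
for name in suggested_by_llm:
    verdict = "registered" if exists_on_pypi(name) else "unclaimed (possible hallucination)"
    print(f"{name}: {verdict}")
```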
Package ecosystems weren’t designed for a world where code suggestions come from probabilistic models trained on scraped data of uncertain provenance. Their security model assumed developers knew what they wanted. That assumption no longer holds.
nesbitt.io
December 10, 2025 at 5:48 PM
Sweet! @nedbat released a new #Coveragepy supporting a dedicated `.coveragerc.toml` config in the default discovery mechanism!

I should now `pip install 'coverage >= 7.13.0'` everywhere!

#python
December 9, 2025 at 3:18 PM
Just figured out the other day how I managed to cut my Google Home speakers off the internet and put them into limbo for a couple of months...

https://webknjaz.me/prose/google-home-in-exile/

#openwrt #googlehome #dhcp
November 27, 2025 at 12:47 AM
Reposted by 🇺🇦 Sviatoslav Sydorenko
This week, the #python Software Foundation rejected a $1.5MM NSF grant, due to a requirement that the PSF abandon diversity work.

So I joined forces with Python folks (@offby1, @petrillic, @amethyst) and we're matching up to $12,000 of donations to the PSF. 🧵
October 29, 2025 at 9:26 PM
📢 The other day @sirosen cut #pip-tools v7.5.1 with compatibility fixes for Click v8.3.0 and a few fixes for regressions in recursive extras!

🔏 https://pypi.org/project/pip-tools/#pip_tools-7.5.1-py3-none-any.whl

📝 https://pip-tools.readthedocs.io/en/stable/changelog/#v7-5-1

#python #packaging

👥 […]
Original post on mastodon.social
mastodon.social
October 3, 2025 at 11:55 AM
Just cut a new release of `pypi-publish` v1.13.0!

It's got an internal runtime update, some housekeeping, plus diagnostic-message and security improvements from @yossarian

https://github.com/pypa/gh-action-pypi-publish/releases/tag/v1.13.0 / […]
Original post on mastodon.social
mastodon.social
September 4, 2025 at 1:52 AM
Wow, @texty published their research about bias and disinformation about #ukraine in #ai just the other day: https://texty.org.ua/projects/115751/what-does-ai-think-about-ukraine-exploring-the-biases-of-large-language-models/

#ukrainianview #StandWithUkraine️ #llm
September 2, 2025 at 9:29 AM
Found @sirosen, myself and @jaraco in the background of the shot at 00:50:00 of the #python documentary!
This was at the #packaging Summit at #pyconus 2025: https://youtu.be/GfH4QL4VqJ0?t=50m0s
August 30, 2025 at 10:28 AM
Alright.. So the tox-lock idea (https://mastodon.social/@webknjaz/114229050261530276) is slowly moving forward — I've just fixed a blocker in #tox upstream that was preventing me from shipping ephemeral toxenvs via plugins:
https://github.com/tox-dev/tox/pull/3591.

@gaborbernat helped me figure […]
Original post on mastodon.social
mastodon.social
August 29, 2025 at 10:07 PM
Reposted by 🇺🇦 Sviatoslav Sydorenko
Death by a thousand slops
I have previously blogged about the relatively new trend of AI slop in vulnerability reports submitted to curl and how it hurts and exhausts us. This trend does not seem to slow down. On the contrary, it seems that we have recently not only received more AI slop but also more _human slop_. The latter differs only in the way that we cannot immediately tell that an AI made it, even though we many times still suspect it. The net effect is the same.

The general trend so far in 2025 has been _way more_ AI slop than ever before (about 20% of all submissions) as we have averaged in about two security report submissions per week. In early July, about 5% of the submissions in 2025 had turned out to be genuine vulnerabilities. The valid-rate has decreased _significantly_ compared to previous years.

We have run the curl Bug Bounty since 2019 and I have previously considered it a success based on the amount of genuine and real security problems we have gotten reported and thus fixed through this program. 81 of them to be exact, with over 90,000 USD paid in awards.

## End of the road?

While we are not going to do anything rushed or in panic immediately, there are reasons for us to consider changing the setup. Maybe we need to drop the monetary reward?

I want us to use the rest of the year 2025 to evaluate and think. The curl bounty program continues to run and we deal with everything as before while we ponder about what we can and should do to improve the situation. For the sanity of the curl security team members.

We need to reduce the amount of sand in the machine. We must do something to drastically reduce the temptation for users to submit low quality reports. Be it with AI or without AI.

The curl security team consists of seven team members. I encourage the others to also chime in to back me up (so that we act right in each case). Every report thus engages 3-4 persons. Perhaps for 30 minutes, sometimes up to an hour or three. Each.

I personally spend an insane amount of time on curl already, wasting three hours still leaves time for other things. My fellows however are not full time on curl. They might only have three hours per week for curl. Not to mention the _emotional toll_ it takes to deal with these mind-numbing stupidities. Times _eight_ the last week alone.

## Reputation doesn’t help

On HackerOne the users get their _reputation_ lowered when we close reports as _not applicable_. That is only really a mild “threat” to experienced HackerOne participants. For new users on the platform that is mostly a pointless exercise as they can just create a new account next week.

Banning those users is similarly a rather toothless threat. Besides, there seem to be so many so even if one goes away, there are a thousand more.

## HackerOne

It is not super obvious to me exactly _how_ HackerOne should change to help us combat this. It is however clear that we need them to do something. Offer us more tools and knobs to tweak, to save us from drowning. If we are to keep the program with them.

I have yet again reached out. We will just have to see where that takes us.

## Possible routes forward

People mention charging a fee for the right to submit a security vulnerability (that could be paid back if a proper report). That would probably slow them down significantly sure, but it seems like a rather hostile way for an Open Source project that aims to be as open and available as possible. Not to mention that we don’t have any current infrastructure setup for this – and neither does HackerOne.
And managing money is painful.

Dropping the monetary reward part would make it much less interesting for _the general populace_ to do random AI queries in desperate attempts to report something that could generate income. It of course also removes the traction for some professional and highly skilled security researchers, but maybe that is a hit we can/must take?

As a lot of these reporters seem to _genuinely_ think they help out, apparently blatantly tricked by the marketing of the AI hype-machines, it is not certain that removing the money from the table is going to completely stop the flood. We need to be prepared for that as well. Let’s burn that bridge if we get to it.

## The AI slop list

If you are still innocently unaware of what AI slop means in the context of security reports, I have collected a list of a number of reports submitted to curl that help showcase. Here’s a snapshot of the list from today:

1. [Critical] Curl CVE-2023-38545 vulnerability code changes are disclosed on the internet. #2199174
2. Buffer Overflow Vulnerability in WebSocket Handling #2298307
3. Exploitable Format String Vulnerability in curl_mfprintf Function #2819666
4. Buffer overflow in strcpy #2823554
5. Buffer Overflow Vulnerability in strcpy() Leading to Remote Code Execution #2871792
6. Buffer Overflow Risk in Curl_inet_ntop and inet_ntop4 #2887487
7. bypass of this Fixed #2437131 [ Inadequate Protocol Restriction Enforcement in curl ] #2905552
8. Hackers Attack Curl Vulnerability Accessing Sensitive Information #2912277
9. (“possible”) UAF #2981245
10. Path Traversal Vulnerability in curl via Unsanitized IPFS_PATH Environment Variable #3100073
11. Buffer Overflow in curl MQTT Test Server (tests/server/mqttd.c) via Malicious CONNECT Packet #3101127
12. Use of a Broken or Risky Cryptographic Algorithm (CWE-327) in libcurl #3116935
13. Double Free Vulnerability in `libcurl` Cookie Management (`cookie.c`) #3117697
14. HTTP/2 CONTINUATION Flood Vulnerability #3125820
15. HTTP/3 Stream Dependency Cycle Exploit #3125832
16. Memory Leak #3137657
17. Memory Leak in libcurl via Location Header Handling (CWE-770) #3158093
18. Stack-based Buffer Overflow in TELNET NEW_ENV Option Handling #3230082
19. HTTP Proxy Bypass via `CURLOPT_CUSTOMREQUEST` Verb Tunneling #3231321
20. Use-After-Free in OpenSSL Keylog Callback via SSL_get_ex_data() in libcurl #3242005
21. HTTP Request Smuggling Vulnerability Analysis – cURL Security Report #3249936
daniel.haxx.se
July 14, 2025 at 10:39 AM
Reposted by 🇺🇦 Sviatoslav Sydorenko
dependency groups 🥰
June 4, 2025 at 10:11 AM
Reposted by 🇺🇦 Sviatoslav Sydorenko
Watching the frustratingly fruitless fights over the USEFULNESS of LLM-based coding helpers, I've come down to 3 points that explain why ppl seem to live in different realities:

Most programmers:

1) Write inconsequential remixes of trivial code that has been written many times before.

2) Lack […]
Original post on mastodon.social
mastodon.social
June 18, 2025 at 8:45 AM
Found an Apple Banana #ai at @devconf_cz earlier today 😜

#DevConf_CZ
June 12, 2025 at 10:37 PM
Reposted by 🇺🇦 Sviatoslav Sydorenko
I put together an annotated version of the new Claude 4 system prompt, covering both the prompt Anthropic published and the missing, leaked sections that describe its various tools

It's basically the secret missing manual for Claude 4, it's […]

[Original post on fedi.simonwillison.net]
May 25, 2025 at 1:53 PM
Bye Pittsburgh 🐍

The #pyconus2025 sprints were fun yet again!

#pyconus #python
May 23, 2025 at 2:19 PM
Reposted by 🇺🇦 Sviatoslav Sydorenko
The #pyconus Friday talks just dropped!

So, here's my

Design Pressure: The Invisible Hand That Shapes Your Code

https://www.youtube.com/watch?v=IhNSINolcSM

1/3
May 22, 2025 at 6:06 PM
Were there any COVID reports at #pyconus this year? Mastodon seems suspiciously silent. I just heard about one case today and that's it..
May 22, 2025 at 2:53 AM
✈️ Departing for IAD now. Expected to land at 14:35 and the next connection would be at 17:05. Crossing the border in-between.

@pycon I'm coming 😉

#python #pyconus #pyconus2025
PyCon US (@pycon@fosstodon.org)
139 Posts, 9 Following, 1.43K Followers · PyCon US is a community-focused conference for Python users and developers. Join us in Pittsburgh, PA from May 14- May 22, 2025!
fosstodon.org
May 13, 2025 at 8:58 AM
Reposted by 🇺🇦 Sviatoslav Sydorenko
Did a post on the personal blog (i.e., “the views expressed by me do not…blah blah”) — “Trump’s Retaliation Against Chris Krebs — and the Cybersecurity Industry’s Deafening Silence”.

Unlike most years, everyone attending RSA next week has a tangible […]

[Original post on mastodon.social]
April 17, 2025 at 7:36 AM