Thibaud Colas
thibaudcolas.bsky.social
President @djangoproject.com, core team @wagtail.org, building things @torchbox.com. Accessibility, AI, climate action w/ climateaction.tech
TL;DR: rewrite it in #Rust! 😛 Just kidding, there are lots of things we can do as #Django / #Python tech people to build leaner. More nuance coming up in the full podcast.
November 11, 2025 at 2:24 PM
This is just from the Books3 dataset (included in The Pile). Obvs the problem goes well beyond this dataset, it’s just unique because we can name exactly which torrented books were part of the training data [3/3]
October 14, 2025 at 10:41 AM
And more!

- Beginning django CMS
- Django By Example
- Django Design Patterns and Best Practices
- Django: Web Development with Python
- Learning Django Web Development
- Mastering Django: Core
- Pro Django

[2/3]
October 14, 2025 at 10:41 AM
Another option I’ve seen occasionally is "ask people to run your analysis script on their projects and share results". Yet another type of bias in the data so not suitable for much of anything. Anyway, I think "top 8000 PyPI" is a pretty good pick for this specific analysis! ⭐️ ty for the context
September 24, 2025 at 3:39 PM
re corporate codebases, I guess the simplest I’d have thought is open source code on GitHub? Select repos on GitHub based on number of stars or activity levels. I have a small dataset of @wagtail.org projects for that reason.
September 24, 2025 at 3:39 PM
ty! yeah my assumption is packages probs have more scrutiny on avg than apps, so if your analysis shows a clear enough problem with top packages, it’d only be worse with other packages, and with apps. No hard data to validate my assumption but seems intuitive enough.
September 24, 2025 at 3:34 PM
I really liked your talk, hope it’s ok to ask a bonus question! I wanted to ask how/why you selected the code of the "top 8000 PyPI packages" as a dataset to analyze? Why PyPI packages and why top 8000? Feels valid but skewed towards packages, and specifically ones that have more scrutiny than avg
September 24, 2025 at 12:05 PM
one of the questions Jake got started with "this isn’t a question more of a comment" 💯💥 someone knows what they’re doing
September 24, 2025 at 12:04 PM
Ticket #36389 gets a shout-out!
September 24, 2025 at 12:04 PM