Python, data preparation, ML, tabular learning.
ORCID: 0000-0002-4448-2959
Hoshiyomi ☄️
https://www.riccardocappuzzo.com
https://github.com/rcap107
I haven't looked into marimo's features, so maybe I'm missing out on things I can't do from VSCode.
Also people laughed at the memes which is the most important thing, obviously
You can find it here, if you want to check it out 👀
www.youtube.com/watch?v=k9MN...
...
"this will be hard to debug"
Big thumbs up for the sklearn team & the maintainer of this package
If you want to contribute to skrub, we will also have a sprint on Thursday.
See you there!
"Skrub: machine learning for dataframes", by Guillaume Lemaitre, Jérôme Dockès and @riccardocappuzzo.com.
@skrub-data.bsky.social
📜 Talk info: pretalx.com/pydata-paris-2025/talk/T9KTPU
📅 Schedule: pydata.org/paris2025/schedule
🎟 Tickets: pydata.org/paris2025/tickets
www.cloudflare.com/learning/ssl...
Then you might want to try the skrub SquashingScaler. The SquashingScaler behaves like scikit-learn RobustScaler, but smoothly clips outliers to predefined boundaries.
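To make "robust scaling plus smooth clipping" concrete, here is a minimal pure-Python sketch of the idea. It is not skrub's implementation: the function name `squash_scale`, the `max_abs` parameter and the specific soft-clipping formula are assumptions picked for illustration.

```python
import math
import statistics


def squash_scale(values, max_abs=3.0):
    """Robust-scale values, then smoothly squash them into (-max_abs, max_abs).

    Hypothetical helper for illustration only, not the skrub API.
    """
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid dividing by zero on constant columns
    # Step 1: center on the median and scale by the interquartile range,
    # which is the same idea RobustScaler uses.
    scaled = [(v - median) / iqr for v in values]
    # Step 2: soft clipping (assumed formula). Close to the identity near
    # zero, and saturating smoothly towards +/- max_abs for extreme values.
    return [z / math.sqrt(1.0 + (z / max_abs) ** 2) for z in scaled]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]  # last value is an outlier
squashed = squash_scale(data)
```

With plain robust scaling the outlier would land far away from the rest of the column; here it stays just under the chosen bound while the ordering of the values is preserved.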
We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.
probabl-ai.github.io/calibration-...
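For readers wondering what "degenerate" looks like in practice, here is a small sketch on made-up data. The labels, scores and the `precision_recall` helper are all hypothetical and are not taken from the linked material. It only shows one way such values can arise: on an imbalanced problem, the default 0.5 cut-off may produce no positive predictions at all.

```python
def precision_recall(y_true, y_score, threshold):
    """Return (precision, recall) at a threshold; None marks an undefined value."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Precision is undefined when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Toy imbalanced problem: 8 negatives, 2 positives, all scores below 0.5.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.02, 0.05, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.25, 0.40]

default = precision_recall(y_true, y_score, threshold=0.5)
lowered = precision_recall(y_true, y_score, threshold=0.2)
```

At 0.5 the recall is zero and the precision is undefined, even though the scores do rank the positives near the top, as the second call shows.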
Here is the repo with the material for the tutorial: github.com/skrub-data/E...
There is an active phishing attack targeting PyPI users.
• Threat: Emails from noreply@pypj.org (with a 'j') link to a fake login page.
• Action: Do not click any links. If you already did, change your PyPI password ASAP.
• Note: PyPI itself has not been breached.
I really think DataOps are a game changer, and I can't wait to see what people build with them.
I also ended up rewriting most of the user guide, hopefully improving it along the way 😂
🚀 Major update! Skrub DataOps, various improvements for the TableReport, new tools for applying transformers to the columns, and a new robust transformer for numerical features are only some of the features included in this release.
This time we will focus on how expressions can simplify the construction of complex hyperparameter grids.
🔍 It gives a uniform representation of null values, converting those represented as strings (such as "N/A")
🗑️ It drops columns that contain too many null values (according to a user-defined threshold)
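The two steps above can be sketched in plain Python on a dict of columns. This is only an illustration of the behaviour described, not the library's code: the `NULL_STRINGS` set, both function names and the 0.5 threshold are assumptions.

```python
# Assumed list of string placeholders that stand for "missing".
NULL_STRINGS = {"", "n/a", "na", "nan", "null", "none", "?"}


def normalize_nulls(table):
    """Replace string placeholders for missing values with None."""
    return {
        col: [
            None if isinstance(v, str) and v.strip().lower() in NULL_STRINGS else v
            for v in values
        ]
        for col, values in table.items()
    }


def drop_mostly_null(table, threshold=0.5):
    """Drop columns whose fraction of None values exceeds the threshold."""
    kept = {}
    for col, values in table.items():
        null_fraction = sum(v is None for v in values) / len(values) if values else 0.0
        if null_fraction <= threshold:
            kept[col] = values
    return kept


raw = {
    "city": ["Paris", "N/A", "Rome", "Oslo"],
    "notes": ["N/A", "null", None, "ok"],
}
clean = drop_mostly_null(normalize_nulls(raw), threshold=0.5)
```

Normalizing first matters: the "notes" column only crosses the threshold once "N/A" and "null" are counted as missing.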
If it wasn't clear, don't do this. If you *really* have to, I used the @obsidian.md canvas for this.
Our paper, "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes", has been published in TMLR!
In this work, we created YADL (a semi-synthetic data lake), and we benchmarked methods for augmenting user-provided tables given information found in data lakes.
1/