Python, data preparation, ML, tabular learning.
ORCID: 0000-0002-4448-2959
Hoshiyomi ☄️
https://www.riccardocappuzzo.com
https://github.com/rcap107
...
"this will be hard to debug"
Big thumbs up for the sklearn team & the maintainer of this package
If you want to contribute to skrub, we will also have a sprint on Thursday.
See you there!
"Skrub: machine learning for dataframes", by Guillaume Lemaitre, Jérôme Dockès and @riccardocappuzzo.com.
@skrub-data.bsky.social
📜 Talk info: pretalx.com/pydata-paris-2025/talk/T9KTPU
📅 Schedule: pydata.org/paris2025/schedule
🎟 Tickets: pydata.org/paris2025/tickets
www.cloudflare.com/learning/ssl...
Then you might want to try the skrub SquashingScaler. The SquashingScaler behaves like scikit-learn RobustScaler, but smoothly clips outliers to predefined boundaries.
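To make "RobustScaler-like, but smoothly clipped" concrete, here is a minimal standard-library sketch of the idea. It is not skrub's implementation: the function name `squash_scale`, the `max_abs` parameter, and the soft-clipping formula `z / sqrt(1 + (z / max_abs)**2)` are assumptions chosen for illustration.

```python
import math
import statistics


def squash_scale(values, max_abs=3.0):
    """Center and scale with median/IQR, then softly clip into (-max_abs, max_abs).

    Hypothetical helper for illustration only, not the skrub API.
    """
    median = statistics.median(values)
    # First and third quartiles; their difference is the interquartile range.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    scale = iqr if iqr > 0 else 1.0  # avoid dividing by zero on constant data
    scaled = []
    for value in values:
        z = (value - median) / scale  # robust centering and scaling step
        # Smooth clipping: close to z for small |z|, approaches +/- max_abs for large |z|.
        scaled.append(z / math.sqrt(1.0 + (z / max_abs) ** 2))
    return scaled


data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
scaled = squash_scale(data)
```

The inliers stay close to their robust-scaled values, while the outlier `1000.0` ends up just below the boundary instead of dominating the range.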
We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.
probabl-ai.github.io/calibration-...
Here is the repo with the material for the tutorial: github.com/skrub-data/E...
There is an active phishing attack targeting PyPI users.
• Threat: Emails from noreply@pypj.org (with a 'j') link to a fake login page.
• Action: Do not click any links. If you already did, change your PyPI password ASAP.
• Note: PyPI itself has not been breached.
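The lookalike domain differs from the real one by a single letter, so an exact comparison is the only safe check. A minimal sketch, assuming you have the raw `From` header at hand; `is_from_trusted_domain` and `TRUSTED_DOMAIN` are hypothetical names. A `From` header can be forged, so this only catches lookalike domains and is not a defence on its own.

```python
from email.utils import parseaddr

TRUSTED_DOMAIN = "pypi.org"  # the genuine domain; the phishing mails use "pypj.org"


def is_from_trusted_domain(from_header, trusted=TRUSTED_DOMAIN):
    """Return True only if the sender's domain matches the trusted one exactly."""
    _, address = parseaddr(from_header)  # strips any display name
    domain = address.rpartition("@")[2].lower()
    return domain == trusted


suspicious = is_from_trusted_domain("noreply@pypj.org")
genuine = is_from_trusted_domain("PyPI <noreply@pypi.org>")
```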
I really think DataOps are a game changer, and I can't wait to see what people build with them.
I also ended up rewriting most of the user guide, hopefully improving it along the way 😂
🚀 Major update! Skrub DataOps, various improvements for the TableReport, new tools for applying transformers to the columns, and a new robust transformer for numerical features are only some of the features included in this release.
This time we will focus on how expressions can simplify the construction of complex hyperparameter grids.
🔍 It gives a uniform representation of null values, converting those represented as strings (such as "N/A")
🗑️ It drops columns that contain too many null values (according to a user-defined threshold)
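The two steps above can be sketched in plain Python on a dict of columns. This is an illustration of the behaviour described, not the library's code: `clean_table`, `drop_null_threshold`, and the `NULL_STRINGS` set are all assumed names and values.

```python
# Assumed set of strings treated as null; the real list may differ.
NULL_STRINGS = {"", "n/a", "na", "nan", "null", "none"}


def clean_table(table, drop_null_threshold=0.5):
    """Normalize null-like strings to None, then drop columns that are mostly null.

    Hypothetical helper for illustration only.
    """
    cleaned = {}
    for name, column in table.items():
        # Step 1: give every null a single representation (None).
        normalized = [
            None
            if value is None
            or (isinstance(value, str) and value.strip().lower() in NULL_STRINGS)
            else value
            for value in column
        ]
        # Step 2: keep the column only if its null fraction is within the threshold.
        null_fraction = normalized.count(None) / len(normalized) if normalized else 0.0
        if null_fraction <= drop_null_threshold:
            cleaned[name] = normalized
    return cleaned


table = {
    "city": ["Paris", "N/A", "Rome", "Oslo"],
    "notes": ["N/A", None, "null", "ok"],
}
result = clean_table(table)
```

With the default threshold, `city` is kept with its `"N/A"` turned into `None`, and `notes` is dropped because three of its four values are null.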
If it wasn't clear, don't do this. If you *really* have to, I used the @obsidian.md canvas for this.
Our paper, "Retrieve, Merge, Predict: Augmenting Tables with Data Lakes", has been published in TMLR!
In this work, we created YADL (a semi-synthetic data lake), and we benchmarked methods for augmenting user-provided tables given information found in data lakes.
1/
I had no idea how much of a difference changing fonts and background color could make
It just blew my mind by autocompleting the dictionary "release_dates" with the correct dates for Muse albums based on the fact I am looking at data about Muse in the script.
wow