Sebastian
@mersault.bsky.social
Professor at BIFOLD & TU Berlin, research on data engineering for ML. Previously at UvA, NYU, Amazon, Twitter. Opinions are my own.
https://deem.berlin
Reposted by Sebastian
What a banger is skrub @skrub-data.bsky.social !
Big thumbs up for the sklearn team & the maintainer of this package
October 1, 2025 at 8:24 AM
Reposted by Sebastian
I don't know what to say. You dream about it for so long and then when it finally happens you're in shock. I'm so proud of you Larry. www.theguardian.com/technology/2...
Larry Ellison overtakes Elon Musk as world’s richest person
Oracle co-founder’s shares rose by 40% in early trading, valuing his fortune at $393bn, just ahead of Musk’s $384bn
www.theguardian.com
September 10, 2025 at 7:52 PM
Reposted by Sebastian
It looks like a data frame, but Skrub stores the whole transformation pipeline in the magic skb attribute!
September 2, 2025 at 7:10 PM
Reposted by Sebastian
🔥CAN YOU BUILD AI MODELS that give you (verifiable) uncertainty estimates in their outputs? Cool talk on ML, classifiers, + calibration www.youtube.com/watch?v=SI6b... by scikit-learn architect @gaelvaroquaux.bsky.social
*with ninja-level modeling of variance you probably didn't know existed !
September 1, 2025 at 12:08 AM
Reposted by Sebastian
✨ Excited to present our workshop paper at DataWorld at #ICML2025 tomorrow 🇨🇦
We introduce the problem of detecting cross-modal errors in tabular data that originate from other modalities.
Visit our poster:
📅 Saturday, July 19, 10:05 AM - 11:20 AM
📍 West Meeting Room 208-209
July 18, 2025 at 5:24 PM
Reposted by Sebastian
Thanks to my supervisor
@mersault.bsky.social!
Paper: openreview.net/pdf?id=JJYHb...
Code: github.com/OlgaOvcharen...
openreview.net
July 18, 2025 at 5:24 PM
The DEEM Lab is at ICML this week for the first time, with two contributions!
(1/3)
July 14, 2025 at 6:10 AM
Reposted by Sebastian
Our paper "Towards Cross-Modal Error Detection with Tables and Images" was accepted for the DataWorld workshop at ICML'25! 🥳
Thanks to @mersault.bsky.social!
June 10, 2025 at 2:10 PM
Reposted by Sebastian
Our demo "mlidea: Interactively Improving ML Data Preparation Code via 'Shadow Pipelines'" was accepted at VLDB! 🥳
We demo suggestions for ML pipelines, similar to IntelliJ code inspections or Grammarly suggestions
youtu.be/ePGm1J6S2qk
Joint work w/ @mersault.bsky.social @p-groth.bsky.social
May 30, 2025 at 7:09 PM
Reposted by Sebastian
📢 We are hosting a DuckDB meetup in Berlin during the week of the SIGMOD conference.
📍 The meetup will take place on June 26 (Thursday) south of the Tiergarten and will feature talks by Amine Mhedhbi, David Justen and dltHub!
📝 If you plan to attend, please register at duckdb.org/events/2025/...
May 28, 2025 at 3:15 PM
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation pipelines that optimize ML models along responsibility objectives.
This is a fully-funded position at @bifold.berlin, co-supervised by Julia Stoyanovich from NYU.
Details: deem.berlin#jobs-17725
May 26, 2025 at 4:05 AM
Reposted by Sebastian
One more week to apply to this exciting position... and another position on #CausalRepresentationLearning and #ReinforcementLearning for learning provably correct #concepts from raw data opening up soon!
Exciting new PhD position at Utrecht University on the #causal effects of communication in #multi-agent #RL with Shihan Wang, Mehdi Dastani and me 🎉
This is part of www.hybrid-intelligence-centre.nl, which aims at combining human and machine intelligence.
Deadline 20 May
www.uu.nl/en/organisat...
PhD Position on Causal Effects of Communication in MARL
Join us to work on the causal effects of communication in multi-agent reinforcement learning (MARL).
www.uu.nl
May 12, 2025 at 9:21 AM
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation for ML/AI systems.
This is a fully-funded position with salary level E13 at the newly founded DEEM Lab, as part of @bifold.berlin .
Details available at deem.berlin#jobs-2225
May 12, 2025 at 3:33 AM
Reposted by Sebastian
📢 Our extended benchmark on self-supervised learning for single-cell data, scSSL-Bench 🧬, is now accepted at ICML (spotlight)!
Thanks to all collaborators from @bifold.berlin and @ethzurich.bsky.social!
📢 Our benchmark on self-supervised learning for single-cell data🧬 is accepted at the #NeurIPS2024 SSL workshop. We take a first step towards establishing best practices for SSL methods for single-cell data, and benchmark 8 SSL methods on 3 downstream tasks across 8 datasets.
May 1, 2025 at 10:34 PM
I was invited to review for the "Journal of Pipeline Systems Engineering and Practice", seems our work on ML pipelines is finally recognised by other communities as well ;D
April 19, 2025 at 9:19 AM
@recsys.bsky.social Quick question, is the full list of accepted workshops already published somewhere? I am looking for a target venue for the early work of a student of mine. Thx.
April 8, 2025 at 8:09 AM
We have openings for student assistants in the DEEM Lab at @bifold.berlin. This is a great opportunity to work with PhD students, implement cool stuff, gather research experience and become a co-author of scientific publications :)
deem.berlin#jobs-193487
April 8, 2025 at 7:18 AM
Reposted by Sebastian
Our vision "Towards Regaining Control over Messy ML Pipelines" was accepted for the DAIS workshop at ICDE! 🥳
Initial experiments show LLMs are promising for extracting declarative query plans from messy ML code.
Joint work w/ @guangchen811.bsky.social @oovcharenko.bsky.social @mersault.bsky.social
March 7, 2025 at 1:56 PM
Reposted by Sebastian
The Data Management for End-to-End Machine Learning workshop (@deem-workshop.bsky.social) will be back at #SIGMOD2025! ✨
🔗 Check out the CfP: deem-workshop.github.io
📝 Submission deadline: March 21
📢 Notifications: April 25
Join us for the 9th edition in Berlin!
#DEEM2025
DEEM - The 9th Workshop on End-to-End Data Management is also co-located with SIGMOD/PODS 2025. The deadline for papers is March 21st. For more details, check out the website
deem-workshop.github.io
DEEM: Workshop on Data Management for End-to-End Machine Learning @ ACM SIGMOD 2024
deem-workshop.github.io
February 7, 2025 at 8:58 PM
We have a **Postdoc opening** in Berlin on Responsible Data Engineering!
This is a fully-funded position with salary level E14 at the newly founded DEEM Lab, as part of @bifold.berlin .
Details available at deem.berlin#jobs-57624
February 5, 2025 at 8:31 AM
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on efficiently creating, maintaining and evaluating datasets and pipelines for ML use cases.
This is a fully-funded position at the newly founded DEEM Lab, as part of @bifold.berlin .
deem.berlin#jobs-2225
February 3, 2025 at 8:08 AM
Reposted by Sebastian
✨ Work for the DEEM Lab, supervised by Sebastian Schelter, and the Data Integration and Data Preparation group, supervised by Ziawasch Abedjan
- Salary Level E11
- Apply until: February 14, 2025
www.jobs.tu-berlin.de/en/job-posti...
@ziamurai.bsky.social @mersault.bsky.social
January 29, 2025 at 1:24 PM