Aécio Santos
aeciosan.bsky.social
Research Engineer at New York University. Interested in dataset search & discovery, sketching, data management, nlp, and information retrieval.
Our new paper "Magneto: Combining Small and Large Language Models for Schema Matching" has just been published in the new issue of #PVLDB! The paper introduces a new framework that combines both small and large language models for effective schema matching.

www.vldb.org/pvldb/vol18/...
August 7, 2025 at 6:03 AM
📢 Tomorrow, I'll be presenting our new paper on LLM-based agents for interactive data integration at the #SIGMOD2025 NOVAS workshop. I'll also be in Berlin for the whole week, so please reach out if you'd like to chat or hang out!

Paper: arxiv.org/abs/2502.07132
Interactive Data Harmonization with LLM Agents
Data harmonization is an essential task that entails integrating datasets from diverse sources. Despite years of research in this area, it remains a time-consuming and challenging task due to schema m...
arxiv.org
June 21, 2025 at 3:53 PM
Reposted by Aécio Santos
Transformer-based neural networks achieve impressive performance on coding, math & reasoning tasks that require keeping track of variables and their values. But how can they do that without explicit memory?

📄 Our new ICML paper investigates this in a synthetic setting!
🎥 youtu.be/Ux8iNcXNEhw
🧵 1/13
How Do Transformers Learn Variable Binding in Symbolic Programs?
YouTube video by Raphaël Millière
youtu.be
June 3, 2025 at 1:19 PM
Reposted by Aécio Santos
The Data Management for End-to-End Machine Learning workshop (@deem-workshop.bsky.social) will be back at #SIGMOD2025! ✨

🔗 Check out the CfP: deem-workshop.github.io
📝 Submission deadline: March 21
📢 Notifications: April 25

Join us for the 9th edition in Berlin!

#DEEM2025
DEEM - The 9th Workshop on End-to-End Data Management is also co-located with SIGMOD/PODS 2025. The deadline for papers is March 21st. For more details, check out the website.
deem-workshop.github.io
DEEM: Workshop on Data Management for End-to-End Machine Learning @ ACM SIGMOD 2024
deem-workshop.github.io
February 7, 2025 at 8:58 PM
Reposted by Aécio Santos
Slides for "Table Foundation Models"

I explain why these models can strongly outperform tree-based models and the intuitions behind them,
hopefully pointing to ways forward for further improvement

speakerdeck.com/gaelvaroquau...
Table foundation models for analytics
Deep-learning typically does not outperform tree-based models on tabular data. Often this may be explained by the small size of such datasets. For image…
speakerdeck.com
December 15, 2024 at 10:43 PM
@madelonhulsebos.bsky.social kicking off the 3rd Table Representation Learning workshop (@trl-research.bsky.social) at NeurIPS 2024. First keynote by @gaelvaroquaux.bsky.social.
December 14, 2024 at 4:56 PM