ewuuu
eugenewu.net
@eugenewu.net
I agree as a unit of progress. But it is still integral as a mechanism to learn how to organize thoughts and arguments, and communicate effectively.
October 3, 2025 at 7:25 PM
6/ Got takes?
📩 DM us or email ewu@cs.columbia.edu or raulcf@uchicago.edu

#SIGMOD2025 #VLDB2025 #databases #AI #academicresearch #CSResearch
July 7, 2025 at 2:25 PM
5/ 📄 Full writeup: wp.sigmod.org?p=3828

🎤 A VLDB 2025 sequel panel is coming, with new voices and a deeper focus on trade-offs: Shreya Shankar, Gustavo Alonso, Natacha Crooks, Jiannan Wang & Divesh Srivastava.
Panel Summary: Where Does Academic Database Research Go From Here? – ACM SIGMOD Blog
July 7, 2025 at 2:25 PM
4/ Panelists’ spicy ideas: 🌶️

• 8-page SIGMOD papers + a YouTube explainer
• More industry internships to understand "the real world"
• Host “Data + X” workshops in other fields—NOT at SIGMOD
July 7, 2025 at 2:25 PM
3/ Academia enables “research as play.” But how do we balance that with:

• Real funding constraints
• Recruiting top students

But how to compete with 600-person industry teams? By picking the right problems.
July 7, 2025 at 2:25 PM
2/ DB folks still have a massive edge:

• Declarative thinking
• Separation of logical + physical
• Deep systems chops

But we’re not always in the rooms where AI decisions are made. That needs to change.
July 7, 2025 at 2:25 PM
Last but not least, the first workshop on data management, process mining, and LLMs! The founder of process mining, Wil van der Aalst, as well as Ernesto Damiani, gave exciting keynotes. dbpmworkshop.github.io
Databases x Process Mining Workshop at SIGMOD 2025
July 2, 2025 at 6:49 PM
Eugene Wu
eugenewu.net
July 2, 2025 at 2:31 PM
Along with Raul Castro Fernandez, I also co-organized a lively SIGMOD panel exploring the future of academic database research.
July 2, 2025 at 2:31 PM
I was delighted to deliver a keynote, "A Decade of Systems for Human Data Interaction", at the HILDA Workshop. I argued that HILDA is about providing guarantees for **user-facing experiences** -- critical to support the rise of vibe coding and LLMs that are eating UX.
July 2, 2025 at 2:31 PM
Physical Visualization Design (PVD), by Yiru Chen et al., lets interface designers focus purely on the visual experience and user interaction of visualization interfaces. PVD automatically assesses whether a design can meet interactivity goals and synthesizes the client-server architecture to deploy it.
July 2, 2025 at 2:31 PM
Haneen had a strong presence at ProvWeek: high-performance lineage compression and an in-depth comparison of capture methods. She demo'd FADE, which runs >1M what-if qs/sec in DuckDB to power interactive what-if analytics. It's a DuckDB extension! github.com/haneensa/lin...
GitHub - haneensa/lineage: row-level lineage
July 2, 2025 at 2:31 PM
Highlights:

John and Luis received the 10-year Test-of-Time Award for their 2015 paper, k-Shape.

Lampros, myself, & friends at Celonis won DEEM Best Paper for Hierarchical Text Segmentation using LLMs & Best Industry Demo for semantic taxonomy-driven prompt optimization.
July 2, 2025 at 2:31 PM
yes! the tldr is: compute the sufficient statistics of the desired model for each distinct value of a candidate join key. The provider or an LLM identifies the join keys offline. arxiv.org/abs/2305.10419
Kitana: Efficient Data Augmentation Search for AutoML
AutoML services provide a way for non-expert users to benefit from high-quality ML models without worrying about model design and deployment, in exchange for a charge per hour ($21.252 for VertexAI). ...
May 27, 2025 at 11:40 PM
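To make the tldr above concrete, here is a minimal sketch of the per-join-key idea for the simplest possible case: a one-feature linear regression y ~ a + b*x, where y lives in the user's table and x in a candidate table. This is an illustration under those assumptions, not Kitana's actual implementation; the function names `key_stats` and `fit_over_join` are hypothetical. The point is that each side only needs (count, sum, sum of squares) per distinct key value, and the regression over the join can be fit from those without materializing the join.

```python
from collections import defaultdict


def key_stats(rows):
    """Per-join-key sufficient statistics for one numeric column.

    rows: iterable of (key, value) pairs.
    Returns {key: [count, sum, sum_of_squares]}.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for key, value in rows:
        s = stats[key]
        s[0] += 1
        s[1] += value
        s[2] += value * value
    return dict(stats)


def fit_over_join(user_stats, cand_stats):
    """Least-squares fit of y ~ a + b*x over the inner join on the key.

    user_stats holds per-key stats of y, cand_stats per-key stats of x.
    Every matching (y, x) pair within a key counts once, so the joined
    totals are products of the per-key stats on each side.
    """
    n = sx = sy = sxx = sxy = 0.0
    for key, (cy, y1, _y2) in user_stats.items():
        if key not in cand_stats:
            continue  # inner join: unmatched keys contribute nothing
        cx, x1, x2 = cand_stats[key]
        n += cy * cx      # number of joined rows for this key
        sx += cy * x1     # sum of x over the join
        sy += y1 * cx     # sum of y over the join
        sxx += cy * x2    # sum of x^2 over the join
        sxy += y1 * x1    # sum of x*y over the join
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x is constant (or the join is empty); slope undefined")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b
```

In this sketch the candidate-side statistics can be computed once, offline, and reused for every incoming request; only the small per-key dictionaries are combined at search time.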
Making use of existing structured data is super valuable, but how to do it is still an open question. We've been working on closing the loop on table search in large data repos for a while now. The key is to search via the data task rather than keywords or a table.

Overview slides: www.dropbox.com/scl/fi/unpng...
May 27, 2025 at 1:39 PM