Andrew Lamb
andrewlamb1111.bsky.social
Apache {DataFusion PMC}, Database Internals
Latest Apache DataFusion blog: more efficient plans and how to efficiently contribute: datafusion.apache.org/blog/output/...
December 20, 2025 at 12:37 PM
Qiwei Huang explains how we use Late Materialization (LM) in the Rust Apache Parquet reader to accelerate filtering. LM can describe several techniques, but this is a core one (it also applies to joins, Top-K, etc.)

arrow.apache.org/blog/2025/12...
December 12, 2025 at 11:40 AM
Thanks to funnel.io, we are hosting a DataFusion meetup in Stockholm
Date: Thursday, March 5, 2026, 17:30-20:00
Signup: luma.com/ctqtiqap
December 10, 2025 at 2:06 PM
There is some crazy (good) activity on the Apache Parquet mailing list for new encodings. A sample: PFOR, FSST, ALP, Strings and Cascaded Encodings. 🤯 Huge kudos to Arnav Balyan, Prateek Gaur, and Micah Kornfield for driving this.

lists.apache.org/list.html?de...
December 8, 2025 at 2:37 PM
Here are the slides and recordings from our Boston DataFusion Meetup in September:

Youtube: youtu.be/wCAud478Dg8
Slides (pdf): drive.google.com/file/d/18KGH...
Apache DataFusion Boston Meetup: September 12, 2025
YouTube video by Andrew Lamb
youtu.be
December 4, 2025 at 11:51 AM
DataFusion 51.0.0 release blog: datafusion.apache.org/blog/2025/11...
November 25, 2025 at 8:50 PM
Reposted by Andrew Lamb
Why rebuild the wheel? @olimpiupop.bsky.social talks with @andrewlamb1111.bsky.social about how Apache Arrow, Parquet, and the FDAP stack are letting database teams focus on innovation instead of reinventing the basics. youtu.be/Gd-mhbiy8Vo?...
Building Modern Databases with the FDAP Stack • Andrew Lamb & Olimpiu Pop • GOTO 2025
YouTube video by GOTO Conferences
youtu.be
November 24, 2025 at 2:05 PM
Building Modern Databases with the FDAP Stack • Andrew Lamb & Olimpiu Pop • GOTO 2025

www.youtube.com/watch?v=Gd-m...
November 24, 2025 at 2:21 PM
Does anyone know a good academic or industrial overview of how to implement (not use) LATERAL joins in SQL? It keeps coming up in DataFusion and I need some solid background on it. github.com/apache/dataf...
November 23, 2025 at 1:05 PM
Save the date: Wednesday, July 22, 2026, for the first Apache DataFusion meetup in Denver: luma.com/jsu6faie
Denver Apache DataFusion Meetup · Luma
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. We will…
luma.com
November 23, 2025 at 11:14 AM
One fun nugget from the Boston @apachedatafusion.bsky.social meetup on Wednesday: Datadog reports they run 68+ million queries per hour with DataFusion
November 14, 2025 at 6:18 PM
Here is a nice examination of the benefits of building new systems using the extensibility of @apachedatafusion.bsky.social vs other systems. www.bauplanlabs.com/post/duck-hu...
Duck Hunt: moving Bauplan from DuckDB to DataFusion
Bauplan's journey from DuckDB to Apache DataFusion: how switching SQL engines doubled query performance on Iceberg lakehouses while enabling greater hackability
www.bauplanlabs.com
November 11, 2025 at 3:41 PM
Reposted by Andrew Lamb
Excited to be one of the attendees and present our work on the DataFusion-powered SedonaDB alongside a great lineup of talks! If you're in the Boston area come and say hi!
November 4, 2025 at 6:33 PM
"If you want to go fast, go alone; if you want to go far, go together."
New Apache Parquet Community page is up: parquet.apache.org/community/
November 7, 2025 at 8:06 PM
We are holding the next Apache DataFusion meetup on Wednesday, Nov 12, in Boston: lu.ma/w9pw5rce
Boston Apache DataFusion Meetup · Luma
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. This…
lu.ma
November 4, 2025 at 6:05 PM
If anyone wants to know why Xiangpeng Hao is a great mentor, they can read this response: github.com/XiangpengHao...
November 3, 2025 at 8:16 PM
A new version of Rust Apache Arrow and Apache Parquet is out. It includes a new metadata parser, a new Avro reader, and geometry and variant support 🤯 arrow.apache.org/blog/2025/10...
Apache Arrow Rust 57.0.0 Release
The Apache Arrow team is pleased to announce that the v57.0.0 release of Apache Arrow Rust is now available on crates.io (arrow and parquet) and as source download. See the 57.0.0 changelog for a full...
arrow.apache.org
October 31, 2025 at 10:26 AM
I have heard from three people/projects in the last three days that they are considering forking iceberg-rust. I filed a ticket to see if we can figure out how to consolidate efforts: github.com/apache/icebe...
October 28, 2025 at 5:50 PM
Apache DataFusion's policy for AI-assisted contributions:

AI is great, but AI dumps are not: maintainers could finish the task faster by using AI directly, and submitters gain little knowledge when acting as a pass-through AI proxy.

datafusion.apache.org/contributor-...
Introduction — Apache DataFusion documentation
datafusion.apache.org
October 27, 2025 at 12:51 PM
Some Apache Parquet nerd humor for Friday afternoon

lists.apache.org/thread/36rdg...
October 24, 2025 at 8:24 PM
We made Apache Parquet metadata parsing 3x-9x faster in the latest release of the Rust implementation
arrow.apache.org/blog/2025/10...
October 24, 2025 at 9:55 AM
Reposted by Andrew Lamb
Today's Future Data Systems Seminar Speaker: Ian Cook (@ian.columnar.tech) will present @columnar.tech's work on Apache Arrow's database connectivity API (ADBC). ADBC is available in modern DBMSs. The Zoom talk is open to the public at 4:30pm ET. YouTube video available afterward: db.cs.cmu.edu/events/futur...
[Future Data] Where We're Going, We Don't Need Rows: Columnar Data Connectivity with ADBC - Carnegie Mellon Database Group
ADBC (Arrow Database Connectivity) is Apache Arrow’s answer to ODBC and JDBC:...
db.cs.cmu.edu
October 20, 2025 at 11:38 AM
More products built with Apache DataFusion: Palantir Foundry's Pipeline Builder

www.palantir.com/docs/foundry...
October 21, 2025 at 7:52 PM