#DataFusion
ICE and CBP agents used facial-recognition technology on Chicago streets to verify citizenship, marking DHS’s first confirmed use of biometric policing in urban areas. #Trump #DHS #ICE #CBP #FacialRecognition #Surveillance #CivilRights #DataFusion #InstitutionalAccountability #404Media
ICE and CBP Agents Are Scanning Peoples’ Faces on the Street To Verify Citizenship
Videos on social media show officers from ICE and CBP using facial recognition technology on people in the field. One expert described the practice as “pure dystopian creep.”
https://www.404media.co/ice-and-cbp-agents-are-scanning-peoples-faces-on-the-street-to-verify-citizenship/
November 9, 2025 at 10:12 PM
We are holding the next Apache DataFusion meetup next Wednesday Nov 12 in Boston. lu.ma/w9pw5rce
Boston Apache DataFusion Meetup · Luma
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. This…
lu.ma
November 4, 2025 at 6:05 PM
Apache DataFusion 50 is released. Read all about it here: datafusion.apache.org/blog/2025/09...
September 29, 2025 at 1:47 PM
Cloudflare's distributed R2 SQL engine is a pretty good exemplar of how to build a serverless database to process petabytes in seconds using Apache DataFusion and Apache Parquet

blog.cloudflare.com/r2-sql-deep-...
R2 SQL: a deep dive into our new distributed query engine
R2 SQL provides a built-in, serverless way to run ad-hoc analytic queries against your R2 Data Catalog. This post dives deep under the Iceberg into how we built this distributed engine, from its metad...
blog.cloudflare.com
September 26, 2025 at 10:29 AM
It’s really cool to see what’s getting built using the components of the composable data stack like Apache DataFusion, Apache Arrow, and Apache Iceberg. Cloudflare’s R2 SQL is just another example of what’s possible. Hey @columnar.tech where is the driver for this?
R2 SQL: a deep dive into our new distributed query engine
R2 SQL provides a built-in, serverless way to run ad-hoc analytic queries against your R2 Data Catalog. This post dives deep under the Iceberg into how we built this distributed engine, from its metad...
blog.cloudflare.com
September 25, 2025 at 8:31 PM
I cannot say enough about DataFusion...in order to build an engine that considers spatial types at every level we needed to customize types, functions, optimizer rules, joins, Parquet pruning, and more. DataFusion not only made this possible but documented even the most obscure bits. So cool!
September 25, 2025 at 1:35 AM
"Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen"

Built in Rust with Apache DataFusion

sedona.apache.org/latest/blog/...
Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen - Apache Sedona
Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark, Apache Flink, and Snowflake, with a set of...
sedona.apache.org
September 24, 2025 at 9:20 PM
I'm beyond excited for this! Built on Apache DataFusion and Apache Arrow, a composable single-node analytical database with spatial types as first-class citizens. GeoPandas IO, KNN joins, CRS support, R/Python bindings, and top-notch lazy GeoParquet reads.

sedona.apache.org/latest/blog/...
Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen - Apache Sedona
Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark, Apache Flink, and Snowflake, with a set of...
sedona.apache.org
September 24, 2025 at 7:03 PM
Today, in the Data Streaming Journey, I'm sharing my experience of building data streaming products with the RAD stack: Rust, Arrow, DataFusion.

www.streamingdata.tech/p/streaming-...
Streaming and the RAD Stack
RAD: Rust, Arrow, DataFusion
www.streamingdata.tech
September 22, 2025 at 3:58 PM
New Lonboard release and new demo! Integrating marimo and Apache DataFusion to visualize the NYC taxi dataset. developmentseed.org/lonboard/lat...
September 18, 2025 at 8:02 PM
Introducing Iron Vector: native, columnar, vectorized, high-performance accelerator for Apache Flink SQL and Table API built on top of Rust, Arrow and DataFusion.

Reduce your Flink compute cost by up to 2x or handle 2x more data with the same infrastructure.
September 15, 2025 at 4:04 PM
LanceDB is a facade over Lance ( github.com/lancedb/lance ), Apache Arrow ( github.com/apache/arrow ), and Apache DataFusion ( github.com/apache/dataf... ), so understanding it means understanding all of these together. That much, at least, I have now understood.
September 15, 2025 at 12:23 PM
Awesome reading list about Apache DataFusion. I started diving into it lately, hacking an extension to have Elasticsearch as a data source (aka a TableProvider). It's a wonderful piece of software and an impressive ecosystem. datafusion.apache.org/user-guide/c...
November 14, 2024 at 7:55 AM
DataFusion is an amazing and seriously cool project.
January 8, 2025 at 10:41 PM
A database that works with event sourcing has been obsessing me for a bit. I’ve tried several event sourcing implementations in Rust but have yet to figure out a good starting point for a true db. Apache stuff can definitely help - DataFusion for SQL, Arrow, Parquet…
September 9, 2025 at 5:55 AM
apache / datafusion-comet: Apache DataFusion Comet Spark Accelerator ★390 https://github.com/apache/datafusion-comet
apache / datafusion-comet
Apache DataFusion Comet Spark Accelerator
github.com
April 27, 2024 at 10:38 PM
I've been building something similar:

- Iceberg as table format
- DataFusion as query engine
- Airbyte for ingestion

Check out my presentation here:
youtu.be/tksTFG2YoZM?...
Dashtool - A data build tool designed for using Iceberg Materialized Views for data | OSA COM
YouTube video by The Open Source Analytics Community (OSA COM)
youtu.be
December 9, 2024 at 12:39 PM
Discover why startups like Flarion, LakeSail, and major companies are betting on Apache DataFusion — the Rust-based query engine that's reshaping data analytics.
Why Startups Are Betting Everything on Apache DataFusion
bit.ly
July 22, 2025 at 7:31 PM
datafusion-postgres: postgres protocol adapter for datafusion query engine

github.com
October 2, 2025 at 6:05 PM
DataFusion v43 has seen a lot of performance work, especially around reading Parquet, and the numbers are very nice! From the ClickBench benchmark on the same hardware type:
November 15, 2024 at 4:17 PM
Correction: @glaredb.bsky.social is moving *away* from DataFusion! Their talk discusses the problems with building a DBMS using off-the-shelf parts. Like @duckdb.org, the new GlareDB rewrite borrows ideas from the Germans' HyPer system but it's written in Rust: www.youtube.com/watch?v=Sor3...
November 20, 2024 at 11:14 AM
I'm excited to announce that InfluxDB 3 Open Source is now in public alpha under the MIT/Apache 2 license: www.influxdata.com/blog/influxd...

We're also releasing an alpha of InfluxDB 3 Enterprise at the same time.

This builds on years of effort with Apache Arrow, DataFusion, and Parquet.
InfluxDB 3 Open Source Now in Public Alpha Under MIT/Apache 2 License
Announcing the alpha release of InfluxDB 3 Core and InfluxDB 3 Enterprise. InfluxDB 3 Core is a recent-data engine for time series and event data. InfluxDB 3 Enterprise adds historical query capabilit...
www.influxdata.com
January 13, 2025 at 3:09 PM
Boring Data Tool (bdt) has now moved to the datafusion-contrib GitHub org. I think this is a nice example of building CLI data tools with Apache Arrow and DataFusion.

github.com/datafusion-c...
GitHub - datafusion-contrib/bdt: Boring Data Tool
Boring Data Tool. Contribute to datafusion-contrib/bdt development by creating an account on GitHub.
github.com
February 19, 2024 at 4:41 PM
The latest update for #InfluxDB includes "Optimizing SQL (and DataFrames) in DataFusion: Part 2" and "Simplifying Multi-Node Setups with InfluxDB 3 Enterprise Modes".

#monitoring #devops #timeseries https://opsmtrs.com/2W5CAx0
InfluxData
InfluxData provides the leading time series platform to instrument, observe, learn and automate any system, application and business process across a variety of use cases.
opsmtrs.com
April 4, 2025 at 4:04 AM
Transforming Logistics with AQUA DataFusion: The Future of Data Utilization Services Starting June 30! #Japan #AI_Technology #YE_DIGITAL #Kitakyushu #AQUA_DataFusion
Transforming Logistics with AQUA DataFusion: The Future of Data Utilization Services Starting June 30!
YE DIGITAL is set to launch AQUA DataFusion, a data utilization service aimed at logistics and manufacturing sectors on June 30. Discover its features!
third-news.com
June 20, 2025 at 1:38 AM