Andy Grove
@andygrove.io
Apache Arrow & DataFusion PMC Member. Original creator of Apache DataFusion.
On behalf of the DataFusion PMC, I'm excited to announce the release of version 0.11.0 of the Comet accelerator for Apache Spark!

datafusion.apache.org/blog/2025/10...
Apache DataFusion Comet 0.11.0 Release - Apache DataFusion Blog
datafusion.apache.org
October 22, 2025 at 2:21 PM
It’s steak night tonight and our dog is patiently waiting for her share.
October 6, 2025 at 2:14 AM
I like the name “RAD stack” for this.
Today, in the Data Streaming Journey, I'm sharing my experience of building data streaming products with the RAD stack: Rust, Arrow, DataFusion.

www.streamingdata.tech/p/streaming-...
Streaming and the RAD Stack
RAD: Rust, Arrow, DataFusion
www.streamingdata.tech
September 24, 2025 at 1:16 PM
Check out the latest release of the Comet accelerator for Apache Spark

datafusion.apache.org/blog/2025/09...
Apache DataFusion Comet 0.10.0 Release - Apache DataFusion Blog
datafusion.apache.org
September 18, 2025 at 3:06 AM
Reposted by Andy Grove
Introducing Iron Vector: native, columnar, vectorized, high-performance accelerator for Apache Flink SQL and Table API built on top of Rust, Arrow and DataFusion.

Reduce your Flink compute cost by up to 2x or handle 2x more data with the same infrastructure.
September 15, 2025 at 4:04 PM
Reposted by Andy Grove
We received reports of a phishing campaign targeting crates.io users. Do not click on links asking to authenticate to protect your account. More information: blog.rust-lang.org/2025/09/12/c...
crates.io phishing campaign | Rust Blog
Empowering everyone to build reliable and efficient software.
blog.rust-lang.org
September 12, 2025 at 2:22 PM
Reposted by Andy Grove
Thanks to @clflushopt.bsky.social, make massive TPCH datasets with tpchgen-cli 2.0:

SF1000 (1TB raw, 220GB in @ApacheParquet) in less than 10 mins (6m45s) on an aging laptop

Try it now:

pip install tpchgen-cli
tpchgen-cli --scale-factor 1000 --parts 100 --format=parquet

github.com/clflushopt/t...
September 4, 2025 at 12:51 PM
Reposted by Andy Grove
I've been helping our analytics team integrate our DataFusion-based query engine for Postgres into EDB Postgres Distributed and finally here's an end-to-end demo.

You get HA Postgres plus seamless replication and DataFusion-based queries. This query turned out 6x faster than PG.
September 4, 2025 at 4:16 PM
How my day is going
August 22, 2025 at 7:46 PM
We now have a roadmap section in the Comet contributor guide, in case anyone was wondering what we are focusing on lately and what features will be arriving in future releases.

datafusion.apache.org/comet/contri...
Comet Roadmap — Apache DataFusion Comet documentation
datafusion.apache.org
August 20, 2025 at 9:12 PM
Reposted by Andy Grove
The Cassandra team at Apple is searching for a fresh grad or someone early in their career to join our ranks in SF/Bay Area!

Come work on super interesting problems with a world-class team. Help us build better Cassandra!

Ping me if you’re interested!

jobs.apple.com/en-us/detail...
Software Engineer, ASE Cassandra Storage - Jobs - Careers at Apple
jobs.apple.com
July 18, 2025 at 9:02 PM
It took me a really long time to understand the flow of execution between JVM and native code during query execution in Comet. I wish I had thought about adding a tracing capability earlier.

github.com/apache/dataf...
perf: Add performance tracing capability by andygrove · Pull Request #1706 · apache/datafusion-comet
Which issue does this PR close? Closes #1705 Rationale for this change This feature makes it possible to visualize the flow of calls during query execution. What changes are included in this PR?...
github.com
May 2, 2025 at 3:08 PM
Reposted by Andy Grove
We're pleased to announce that Apache DataFusion in Python 46.0.0 is released! Since the last announcement post we've had a lot of great features and new contributors. Please check out the blog post with details.

datafusion.apache.org/blog/2025/03...

#DataFusion #Python #DataFrame #PyData #Apache
Apache DataFusion Python 46.0.0 Released - Apache DataFusion Blog
datafusion.apache.org
April 7, 2025 at 12:27 PM
We have a position open in the Spark team at Apple, in our Cupertino, CA office. The role would include working on Apache DataFusion Comet.

jobs.apple.com/en-us/detail...
Senior Software Development Engineer (Apache Spark) - Apple Data Platform - Jobs - Careers at Apple
jobs.apple.com
April 2, 2025 at 5:28 PM
Here's the blog post announcing Comet 0.7.0

datafusion.apache.org/blog/2025/03...
March 21, 2025 at 12:32 AM
DataFusion Comet 0.7.0 is now available in Maven. We'll be publishing a blog post next week with all the details.

The repo has been updated with the latest benchmark results. For single executor TPC-H @ 100 GB, we now see a 2.2x speedup over Spark (up from 2x in 0.6.0).

github.com/apache/dataf...
GitHub - apache/datafusion-comet: Apache DataFusion Comet Spark Accelerator
github.com
March 19, 2025 at 5:11 PM
One month on, and I have zero regrets about quitting Facebook & Instagram.

I have replaced the scrolling time with listening to podcasts.

I now stay in touch with family overseas via email and photo sharing, and I use Snapchat for sharing photos with immediate family, privately. Works great.
February 18, 2025 at 5:55 PM
Reposted by Andy Grove
Chris Riccomini (@chris.blue) shares his thoughts on Open Source foundations: Apache, CNCF, Commonhaus. He also explains why Commonhaus is a better fit for SlateDB.

cnr.sh/posts/compar...
Comparing Apache, CNCF, and Commonhaus | cnr.sh
I've used open source projects for over 30 years and contributed for about 20 of those. My first interaction with an open source foundation was with Apache when I began working with Apache Hadoop ...
cnr.sh
February 18, 2025 at 12:49 PM
Comet 0.6.0 has been released. This is a smaller release than usual now that we have moved to an approximately monthly release cadence to match core DataFusion.

datafusion.apache.org/blog/2025/02...
Apache DataFusion Comet 0.6.0 Release - Apache DataFusion Blog
datafusion.apache.org
February 18, 2025 at 5:29 PM
Ballista 43.0.0 has been released, and now provides seamless integration with DataFusion.

datafusion.apache.org/blog/2025/02...
Apache DataFusion Ballista 43.0.0 Released - Apache DataFusion Blog
datafusion.apache.org
February 12, 2025 at 5:49 PM
Check out this excellent presentation from @robtandy.bsky.social on his work with the DataFusion Ray project from last week's DataFusion community meetup.

It is a great overview of how to build a distributed system on top of DataFusion.

www.youtube.com/watch?v=ceTo...
Apache DataFusion Community Meeting 2025/01/22 08:57 MST - Recording
YouTube video by Datadog
www.youtube.com
January 29, 2025 at 2:48 PM
I've finally decided to quit using Facebook. My feed is overwhelmed with nonsense content that I am not interested in and cannot seem to block.

It is a real shame, though, because it was a good way to stay connected with family.

Is there a viable alternative? What are others using instead?
January 18, 2025 at 5:45 PM
This week in DataFusion Comet (Jan 18).

Inspired by @andrewlamb1111.bsky.social's weekly updates in DataFusion core, I am going to start doing the same in Comet to help keep the community updated on current events.

github.com/apache/dataf...
This week in Comet (Jan 18) · Issue #1305 · apache/datafusion-comet
Introduction Inspired by @alamb's weekly updates in DataFusion, I thought it would be a good idea to do something similar in Comet to keep contributors updated on what is happening in the project. ...
github.com
January 18, 2025 at 4:37 PM
DataFusion Comet 0.5.0 has been released. See blog post for details.

datafusion.apache.org/blog/2025/01...
Apache DataFusion Comet 0.5.0 Release - Apache DataFusion Blog
datafusion.apache.org
January 17, 2025 at 8:51 PM