Anders
dataders.bsky.social
DX @ dbt
new rule: talk about Iceberg without mentioning Hive or ACID properties.

the vast majority of SQL users don't care (and shouldn't)! it's like explaining Dropbox starting w/ bringing up libfuse

If you're new to iceberg lmk if you get something from this!
roundup.getdbt.com/p/iceberg-gi...
#databs
Iceberg?? Give it a REST!
The new abstraction that changes nothing... and everything
roundup.getdbt.com
April 1, 2025 at 6:57 PM
lots of juicy looking papers in the agenda for EDBT/ICDT 2025 happening this year in Barcelona! definitely going to be diving in more later today
edbticdt2025.upc.edu?contents=det...
#databs #edbt2025 #icdt2025
EDBT/ICDT 2025 Joint Conference - 25th March - 28th March, 2025 - Barcelona, Spain
edbticdt2025.upc.edu
March 24, 2025 at 4:33 PM
this is the clearest case for Arrow that I've ever seen. I love that it's high-level but also doesn't shy away from details when it's important.

this should be a #databs canon text imho. thanks @ianmcook.bsky.social !
2025 is shaping up to be a breakout year for fast query result transfer with Apache Arrow. But what exactly makes it so fast? David Li, Matt Topol, and I break it down in this new blog post: arrow.apache.org/blog/2025/01...
How the Apache Arrow Format Accelerates Query Result Transfer
Arrow speeds up query result transfer by slashing (de)serialization overheads. We outline five key attributes of the Arrow format that enable this.
arrow.apache.org
March 17, 2025 at 2:44 PM
TIL about relationalplayground.com

makes relational algebra accessible to a SQL monkey like me. much better than in a dry textbook where it's normally found.

#databs
Relational Playground
An exploration of relational algebra. Compare SQL queries with relational algebra expressions along with intermediate results.
relationalplayground.com
March 13, 2025 at 3:28 PM
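For anyone who wants to see the SQL-to-relational-algebra comparison from the post above outside the browser, here is a minimal sketch. It is not how relationalplayground.com works internally; the sample relations and the helper names (`select`, `project`, `natural_join`) are made up for illustration. It builds a query out of three algebra operators, keeps the intermediate results, and checks the answer against the same query run as SQL in SQLite.

```python
import sqlite3

# Hypothetical sample relations, modelled as lists of dict rows.
employees = [
    {"name": "Ada", "dept_id": 1, "salary": 120},
    {"name": "Grace", "dept_id": 2, "salary": 110},
    {"name": "Edgar", "dept_id": 1, "salary": 90},
]
departments = [
    {"dept_id": 1, "dept": "Engineering"},
    {"dept_id": 2, "dept": "Research"},
]


def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection (pi): keep only the named columns, dropping duplicates."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


def natural_join(left, right):
    """Natural join: pair rows that agree on every shared column."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                out.append({**l_row, **r_row})
    return out


# pi_{name, dept}( sigma_{salary > 100}( employees JOIN departments ) )
joined = natural_join(employees, departments)            # intermediate result 1
filtered = select(joined, lambda r: r["salary"] > 100)   # intermediate result 2
algebra_result = project(filtered, ["name", "dept"])     # final result

# The same query expressed in SQL, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER, salary INTEGER)")
conn.execute("CREATE TABLE departments (dept_id INTEGER, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (:name, :dept_id, :salary)", employees)
conn.executemany("INSERT INTO departments VALUES (:dept_id, :dept)", departments)
sql_result = conn.execute(
    "SELECT DISTINCT e.name, d.dept FROM employees e "
    "JOIN departments d ON e.dept_id = d.dept_id "
    "WHERE e.salary > 100 ORDER BY e.name"
).fetchall()

print(algebra_result)
print(sql_result)
```

Printing `joined` and `filtered` along the way gives the same kind of step-by-step view the site shows for each operator.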
imho, the clearest case Databricks has made in public on their Iceberg future post-Tabular acquisition.

I agree with this vision of the future and it's nice to see DBRX sharing how they see themselves participating in it.

worth clicking through the deck! #databs
speakerdeck.com/databricksja...
Iceberg Meetup Japan #1 : Iceberg and Databricks
These are the slides used at Iceberg Meetup #1, held on February 21. They introduce the catalogs involved when using Databricks with Iceberg.
speakerdeck.com
March 3, 2025 at 3:42 PM
1) my top-level takeaways on DeepSeek's smallpond: a distributed data processing framework used for training LLMs #databs
March 3, 2025 at 2:57 PM
Reposted by Anders
Mark Zuckerberg messages Facebook engineer

April 5, 2012
February 8, 2025 at 8:24 PM
Reposted by Anders
Just finished watching the dbt team's webinar introducing SDF. After seeing SDF in action, I have to admit that I am really looking forward to the future of the dbt engine. I was wondering when dbt was going to bring in notable changes to the developer experience, and this might be it.

#databs
February 1, 2025 at 4:00 PM
Reposted by Anders
Building Query Compilers by Guido Moerkotte (695 pages)

Note: This is a repost of @emresevinc.bsky.social's post on X

Link: pi3.informatik.uni-mannheim.de/~moer/queryc...
January 26, 2025 at 6:27 PM
post two! The key technologies behind SQL Comprehension

#databs, do not fear compiler concepts -- embrace them and the new world order they enable for us! read the great blog (& pretty diagrams!)

@daveconnors3.bsky.social, truly a masterpiece
docs.getdbt.com/blog/sql-com...
January 24, 2025 at 6:22 PM
post one of a new series kicking off today whose larger thrust is effectively: understanding SQL is not just a job for the database! Post 1 lays out what the levels of understanding are
docs.getdbt.com/blog/the-lev...
#databs
January 23, 2025 at 7:39 PM
Look forward to these every year. Always love Andy's candor and insight on our little corner of the world. #databs
Buckle up because we're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowback, Databricks vs. Snowflake gangwar, @duckdb.org's shotgun weddings, and buying a quarterback to impress your lover: www.cs.cmu.edu/~pavlo/blog/...
Databases in 2024: A Year in Review
Andy rises from the ashes of his dead startup and discusses what happened in 2024 in the database game.
www.cs.cmu.edu
January 4, 2025 at 1:42 AM
#ghostty dropping today is the GitHub equivalent of a Beyoncé album. My whole feed is everyone following it. Now I gotta at least try it right?
github.com/ghostty-org/...
December 27, 2024 at 1:35 AM
my experience installing MSFT's ODBC driver on an M-series macbook re-stokes my long-burning ire for Simba drivers and the company behind them. One might think MSFT is to blame for a less-than-mediocre database driver, but in a way they're victims here too, held captive by Simba. #databs
December 19, 2024 at 3:04 AM
does anyone use the 1Password CLI? I feel it has potential to be very valuable for managing environment variables (esp for teams). but i'm hesitant to adopt bc it requires that every command you run consume the output of `op`
developer.1password.com/docs/cli/get...
Get started with 1Password CLI | 1Password Developer
Learn how to install and sign in to 1Password CLI, then get started with commands and scripts to manage users, vaults, and items on the command line.
developer.1password.com
December 18, 2024 at 3:08 PM
admittedly, I'm triggered by SFTP, but it's a good analogy! What if there was an SFTP where you didn't have to parse random-delimited text files, but could just query the tables inside?
@chris.blue writes a post-length version of “Iceberg on S3 is the new SFTP.”

If you want to understand why SFTP is crucial to orgs, Chris has some great examples of how and why SFTP is used. All of his examples matched my experience in healthcare.

#dataBS

open.substack.com/pub/material...
S3 Is the New SFTP
Customers want their data. A customer data lake is a great way to give it to them.
open.substack.com
December 17, 2024 at 2:53 PM
art history major = business stakeholder? 😜 #databs
December 9, 2024 at 6:14 PM
Reposted by Anders
Great breakdown of the new S3 Tables feature that leverages Apache Iceberg. Including an explanation of the costs... which are complicated. #dataBS
bigdata.2minutestreaming.com/p/meet-your-...
meet your new data lakehouse: S3 Iceberg Tables
S3 Tables and S3 Metadata are two brutal new features that compete with common Apache Iceberg Lakehouse architectures
bigdata.2minutestreaming.com
December 6, 2024 at 8:28 AM
in AWS Glue Data Catalog, you can now read and write Iceberg tables using the Iceberg REST API.
aws.amazon.com/blogs/big-da...

previously platforms had to write a custom integration to read/write Glue Iceberg tables, now you can use the REST API!
#databs #apacheiceberg #aws
Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark | Amazon Web Services
In this post, we will explore how to harness the power of Open source Apache Spark and configure a third-party engine to work with AWS Glue Iceberg REST Catalog. The post will include details on how t...
aws.amazon.com
December 5, 2024 at 5:32 PM
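A rough sketch of what "just use the REST API" could look like from a client. The linked AWS post uses Apache Spark; this swaps in PyIceberg instead, which is my substitution and not something the post covers. The endpoint pattern, the use of the AWS account ID as the warehouse, and the SigV4 property names reflect my understanding of the AWS and PyIceberg docs, so treat them as assumptions to verify. The function names are hypothetical.

```python
def glue_rest_catalog_properties(region: str, account_id: str) -> dict:
    """Build Iceberg REST catalog properties pointing at AWS Glue.

    Assumptions (check against current AWS and PyIceberg documentation):
    - Glue serves the Iceberg REST API at https://glue.<region>.amazonaws.com/iceberg
    - the warehouse identifier is the AWS account ID
    - requests are SigV4-signed with the service name "glue"
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": account_id,
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def load_glue_catalog(region: str, account_id: str):
    """Open the catalog with PyIceberg (third-party; needs AWS credentials)."""
    # Imported lazily so the property builder above works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("glue_rest", **glue_rest_catalog_properties(region, account_id))


# Placeholder region and account ID, for illustration only.
props = glue_rest_catalog_properties("us-east-1", "123456789012")
print(props["uri"])
```

The point of the design is that nothing here is Glue-specific client code: any engine that speaks the Iceberg REST catalog protocol only needs an endpoint and signing settings.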
Reposted by Anders
S3 (Iceberg) Tables is everything I dreamt of, and more. I blogged some long-form thoughts: meltware.com/2024/12/04/s...

I think we're about to see an explosion of data tools (@materialize.com, @clickhouse.com, @duckdb.org, et al.) learn to write Iceberg tables via S3 table buckets.

#databs
A First Look at S3 (Iceberg) Tables
AWS announced S3 Tables today, which brings native support for Apache Iceberg to S3. It’s hard to overstate how exciting this is for the data analytics ecosystem. This post is a quick rundown of my th...
meltware.com
December 4, 2024 at 10:02 AM
S3 tables?!? #databs #apacheiceberg
The launch of S3 Tables, a new bucket type that stores Apache Iceberg tables.
December 3, 2024 at 5:51 PM
Reposted by Anders
Need a holiday gift for the nerd in your life? How about a copy of my new book, Platform Engineering: A Guide for Technical, Product, and People Leaders!
amzn.to/4eUz5zB
Platform Engineering: A Guide for Technical, Product, and People Leaders
Amazon.com: Platform Engineering: A Guide for Technical, Product, and People Leaders eBook : Fournier, Camille, Nowland, Ian: Kindle Store
amzn.to
November 26, 2024 at 2:12 PM
anyone have a good term to collectively refer to the data "languages" like #SQL, #polars, #pandas, #dplyr, #malloy, #preql #numpy etc?

"data transformation APIs" (DTAPIs) is the best I've got so far but I'm dreaming of a better word #databs
November 26, 2024 at 3:16 PM
Reposted by Anders
Hey #dataBS and #datasky folks,

Our new post about "how understanding Big O Notation & Execution Plans can optimize SQL queries" has just been posted.

Check it out if you're interested, and we'd love to hear your thoughts! @hopefanhe.bsky.social
open.substack.com/pub/pipeline...
SQL Behind the Curtain: How Are Queries Executed?
Explore the journey of your SQL query guided by execution plans
open.substack.com
November 19, 2024 at 10:45 AM
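To make the execution-plan idea from the post above concrete, here is a small sketch using SQLite's `EXPLAIN QUERY PLAN` from Python's standard library. It is not taken from the linked article; the `orders` table, the index name, and the `plan` helper are invented for illustration. The same lookup is planned twice, once before and once after adding an index, which is roughly the difference between a linear scan of the table and a logarithmic index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# Hypothetical data: 10,000 orders spread across 1,000 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = 42"


def plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Without an index, SQLite has to scan every row of the table.
plan_without_index = plan(conn, QUERY)

# With an index on the filter column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a scan of `orders` and the second a search using `idx_orders_customer`.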