Burak
buremba.bsky.social
Data Engineer - Cooking https://github.com/buremba/universql 🐥
Soon to be, it looks like: youtu.be/zeonmOO9jm4?...
Otherwise, there is no point in using Parquet instead of their native DuckDB format. I'm glad they didn't ignore the "industry standards".
Introducing DuckLake
YouTube video by DuckDB
May 27, 2025 at 3:31 PM
Is there any plan to support compacting the inlined data into the data lake when data inlining is used?
May 27, 2025 at 3:30 PM
I was worried about Iceberg being ignored in favor of DuckLake, but it looks like you fixed Iceberg's biggest problems and still kept the compatibility. Super exciting!
May 27, 2025 at 3:20 PM
Turns out the implementation wasn't WAL-based; instead, they had a new Iceberg-compatible data lake extension. I like the direction they're going!
May 27, 2025 at 3:15 PM
I know of this one, but they might soon have a public extension that uses the WAL to keep the data in sync with the data lake: github.com/duckdb/duckd...
Implement WALReader by adsharma · Pull Request #17247 · duckdb/duckdb
This could be useful to external replication tools to read WAL records similar to how wal2json (Postgres) and binlog (MySQL) work. Translation to externally consumable format is not included.
May 21, 2025 at 3:42 PM
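A toy sketch of the WAL-based sync idea, in the spirit of wal2json/binlog consumers: the replicator remembers the last log position it applied and replays only newer records into a downstream copy. `LogRecord`, `ToyWAL`, and `Replicator` are hypothetical names; this is not the WALReader API from the PR, and a real WAL is binary and needs decoding first.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical WAL entry: a monotonically increasing position plus one change."""
    lsn: int
    op: str  # "insert" (upsert) or "delete"
    key: str
    value: object = None


@dataclass
class ToyWAL:
    """Append-only in-memory log standing in for the database's write-ahead log."""
    records: list = field(default_factory=list)

    def append(self, op: str, key: str, value: object = None) -> LogRecord:
        rec = LogRecord(lsn=len(self.records) + 1, op=op, key=key, value=value)
        self.records.append(rec)
        return rec

    def read_after(self, lsn: int) -> list:
        # Only the records written after the given position.
        return [r for r in self.records if r.lsn > lsn]


@dataclass
class Replicator:
    """Keeps a downstream copy (think: a data lake table) in sync by replaying the log."""
    target: dict = field(default_factory=dict)
    applied_lsn: int = 0

    def sync(self, wal: ToyWAL) -> int:
        new = wal.read_after(self.applied_lsn)
        for rec in new:
            if rec.op == "insert":
                self.target[rec.key] = rec.value
            elif rec.op == "delete":
                self.target.pop(rec.key, None)
            self.applied_lsn = rec.lsn  # checkpoint progress after each applied record
        return len(new)


wal = ToyWAL()
wal.append("insert", "a", 1)
wal.append("insert", "b", 2)
rep = Replicator()
rep.sync(wal)  # applies both inserts
wal.append("delete", "a")
rep.sync(wal)  # applies only the new delete
print(rep.target)  # {'b': 2}
```

Tracking the applied position is what lets the replicator resume after a crash without re-reading the whole log.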
That's a good analogy; I might steal it. :) However, when the destination path is not clear (which is usually the case, since you need to experiment and iterate anyway), smashing can help you find the destination faster, as you learn where not to go.
May 5, 2025 at 4:14 PM
Ironically, the number of stale documents in our company has increased dramatically thanks to LLMs.
April 26, 2025 at 4:21 PM
Oh, I've lost count of how much time I've wasted trying to infer the column names of random CSV files without a header. This is very handy!
March 17, 2025 at 6:49 PM
Exactly! I think Flight will get more popular over time as it's the most efficient implementation, but this approach can help existing RESTful apps adopt SQL integrations before switching over to gRPC.
February 15, 2025 at 6:54 PM
Pretty common, but if one of these languages is the "main" one, it might be better to generate JSON Schema from Pydantic/TS and then generate the models for the other language from that JSON Schema. It's more about where you want the source of truth to be.
February 10, 2025 at 8:52 PM
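A minimal sketch of the "one source of truth" flow, assuming Python is the main language. To stay dependency-free it uses a stdlib dataclass and a hand-rolled type mapping instead of Pydantic (with Pydantic v2 the first step would be `Model.model_json_schema()`). `User`, `to_json_schema`, and `to_typescript` are hypothetical, and only flat models with primitive fields are handled.

```python
import json
from dataclasses import dataclass, fields

# Python type -> JSON Schema type (primitives only in this sketch).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}
# JSON Schema type -> TypeScript type.
JSON_TO_TS = {"string": "string", "integer": "number", "number": "number", "boolean": "boolean"}


@dataclass
class User:
    """The source of truth lives here, in the "main" language."""
    id: int
    email: str
    active: bool


def to_json_schema(model) -> dict:
    """Derive a JSON Schema object from a flat dataclass."""
    props = {f.name: {"type": PY_TO_JSON[f.type]} for f in fields(model)}
    return {"title": model.__name__, "type": "object",
            "properties": props, "required": list(props)}


def to_typescript(schema: dict) -> str:
    """Generate a TypeScript interface from the JSON Schema, never from Python directly."""
    lines = [f"interface {schema['title']} {{"]
    for name, spec in schema["properties"].items():
        optional = "" if name in schema["required"] else "?"
        lines.append(f"  {name}{optional}: {JSON_TO_TS[spec['type']]};")
    lines.append("}")
    return "\n".join(lines)


schema = to_json_schema(User)
print(json.dumps(schema, indent=2))
print(to_typescript(schema))
```

Because the other language only ever sees the JSON Schema, switching the generator (or adding a third language) doesn't touch the Python model.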
I had the exact same thought.
February 6, 2025 at 6:46 PM
Thanks. I'm also a fan of your creative extensions! Quackpipe was one of the inspirations. :)
January 29, 2025 at 2:04 AM
One here! 🍻
January 28, 2025 at 6:17 PM
I couldn't figure out how to insert a table into an S3 Table without Spark. I tried to use the API, but it requires me to create the files and update the metadata myself. PyIceberg can't write to S3 Tables through its S3 integration yet, so I had to stick with Spark. boto3.amazonaws.com/v1/documenta...
update_table_metadata_location - Boto3 1.35.99 documentation
January 14, 2025 at 10:41 PM
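A toy, in-memory sketch of the client-side work described above: write the data file, write a new metadata file that references it, then move the table's pointer with a compare-and-swap on a version token. In S3 Tables, the boto3 `update_table_metadata_location` call linked above plays the role of `commit` (I believe it also takes a version token). `ToyCatalog`, `append_rows`, and `CommitConflict` are hypothetical, and real Iceberg metadata (snapshots, manifest lists, manifests, Parquet files) is collapsed here into a flat list of JSON files.

```python
import json
import uuid
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer moved the table pointer first."""


@dataclass
class ToyCatalog:
    """In-memory stand-in: object storage plus a catalog that only stores a pointer per table."""
    files: dict = field(default_factory=dict)     # path -> file contents ("object storage")
    pointers: dict = field(default_factory=dict)  # table -> (version_token, metadata_path)

    def get_table(self, table: str) -> tuple:
        return self.pointers[table]

    def commit(self, table: str, expected_token: str, new_metadata_path: str) -> str:
        # Compare-and-swap: only move the pointer if nobody committed in between.
        # (Not thread-safe; a real catalog performs this check server-side.)
        current_token, _ = self.pointers[table]
        if current_token != expected_token:
            raise CommitConflict(table)
        new_token = uuid.uuid4().hex
        self.pointers[table] = (new_token, new_metadata_path)
        return new_token


def append_rows(catalog: ToyCatalog, table: str, rows: list) -> str:
    token, meta_path = catalog.get_table(table)
    metadata = json.loads(catalog.files[meta_path])
    # 1. Write a new data file (a real engine would write Parquet here).
    data_path = f"data/{uuid.uuid4().hex}.json"
    catalog.files[data_path] = json.dumps(rows)
    # 2. Write a new metadata file referencing the old data files plus the new one.
    new_meta = {"data_files": metadata["data_files"] + [data_path]}
    new_meta_path = f"metadata/{uuid.uuid4().hex}.metadata.json"
    catalog.files[new_meta_path] = json.dumps(new_meta)
    # 3. Point the table at the new metadata file, guarded by the version token.
    return catalog.commit(table, token, new_meta_path)


cat = ToyCatalog()
cat.files["metadata/v0.metadata.json"] = json.dumps({"data_files": []})
cat.pointers["events"] = ("t0", "metadata/v0.metadata.json")
append_rows(cat, "events", [{"id": 1}])
_, current = cat.get_table("events")
print(len(json.loads(cat.files[current])["data_files"]))  # 1
```

This is the part Spark (or any Iceberg writer) normally hides, which is why doing it by hand against the raw API is painful.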
If AWS is serious about S3 Tables, they should support the Iceberg REST Catalog for it. Right now we can only create tables with Spark.
January 14, 2025 at 8:27 PM
Qlik's Upsolver acquisition shows how adopting new technologies can make a company a potential acquisition target for bigger companies. It's a 10-year-old company, and they raised a ton, so I'm not sure how good the deal was for the co-founders.
January 14, 2025 at 5:43 PM