Paul Laurence
prlaurence.com
Paul Laurence
@prlaurence.com
Co-founder @crunchydata.com
Reposted by Paul Laurence
Postgres is increasingly becoming a versatile data platform, instead of just an operational database.

Using pg_parquet you can trivially export data to S3, and using Crunchy Data Warehouse you can just as easily query or import Parquet files from PostgreSQL.
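A minimal sketch of that round trip, assuming the pg_parquet extension is installed and S3 credentials are configured; the bucket and table names are placeholders:

```sql
-- export a query result to S3 as Parquet (pg_parquet extends COPY)
COPY (SELECT * FROM orders WHERE created_at >= '2025-01-01')
TO 's3://my-bucket/exports/orders.parquet'
WITH (format 'parquet');

-- and read it back into another table
COPY orders_copy FROM 's3://my-bucket/exports/orders.parquet'
WITH (format 'parquet');
```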
February 7, 2025 at 11:11 AM
Reposted by Paul Laurence
Access a Canada-wide elevation model (Mr DEM!) directly inside PostgreSQL using PostGIS raster support for Cloud Optimized GeoTIFF www.crunchydata.com/blog/using-c...
Using Cloud Rasters with PostGIS | Crunchy Data Blog
Paul shows you how to access raster data stored in the cloud or object storage for PostGIS using cloud optimized GeoTIFF (aka COG) files. He also includes some functions for working with raster elevat...
www.crunchydata.com
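One way to follow the approach in the post is to register the COG as an out-of-db raster, so PostGIS fetches tiles over HTTP only as needed; the URL, table name, and coordinates below are placeholders:

```sql
-- metadata-only load of a remote COG, e.g. via:
--   raster2pgsql -R -t 256x256 /vsicurl/https://example.com/mrdem.tif public.dem | psql

-- then read an elevation at a point (assumes the point SRID matches the raster)
SELECT ST_Value(rast, ST_SetSRID(ST_MakePoint(-75.70, 45.42), 4326)) AS elevation
FROM dem
WHERE ST_Intersects(rast, ST_SetSRID(ST_MakePoint(-75.70, 45.42), 4326));
```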
February 7, 2025 at 5:30 PM
Reposted by Paul Laurence
I wrote up a summary of resources for sharpening your PostgreSQL, SQL, and relational database skills.

The common denominator for all of these is that you have to be *doing* (modeling, writing SQL, and wrangling data) to get better.

still.visualmode.dev/blogmarks/6
January 24, 2025 at 10:27 PM
Reposted by Paul Laurence
Congratulations to the Postgres community on PostgreSQL once again being named the DBMS of the year in 2024, for the second year in a row. db-engines.com/en/blog_post...
PostgreSQL is the Database Management System of the Year 2024
db-engines.com
January 14, 2025 at 1:54 PM
Reposted by Paul Laurence
Today we ran into the dreaded Postgres primary key integer overflow error so if anyone knows Jesse Soyland from Crunchy Data can you please thank him for writing this article that got us back up and running almost immediately: www.crunchydata.com/blog/the-int...
The Integer at the End of the Universe: Integer Overflow in Postgres | Crunchy Data Blog
Integer overflow can happen if you have a sequencing data type exceeding integer limits. Jesse has a query to help you spot it and recommendations for a short term and long term fix.
www.crunchydata.com
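The general shape of the diagnosis and fix (not Jesse's exact query; table and column names are illustrative):

```sql
-- spot sequences approaching the int4 ceiling (2147483647)
SELECT schemaname, sequencename, last_value,
       round(100.0 * last_value / 2147483647, 1) AS pct_of_int4
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY pct_of_int4 DESC;

-- long-term fix: widen the key to bigint (takes a heavy lock; plan a window)
ALTER TABLE events ALTER COLUMN id TYPE bigint;
```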
January 7, 2025 at 6:50 PM
Reposted by Paul Laurence
The submissions are already rolling in on the #icebergSummit Call for Papers, but... that's weird... it looks like we're missing YOURS! 🫵
Iceberg Summit 2025: Call for Speakers
Iceberg Summit 2025 is the second edition of the Iceberg Summit, an event sanctioned by the Apache Software Foundation (ASF) with oversight from the A...
sessionize.com
January 9, 2025 at 5:53 PM
Reposted by Paul Laurence
A lot of great recommendations on tuning PostgreSQL for analytical queries by @karenhjex.bsky.social

www.crunchydata.com/blog/postgre...
Postgres Tuning & Performance for Analytics Data | Crunchy Data Blog
Karen digs into Postgres strategies for working with large analytical data sets. She reviews tuning, strategies for pre-compiling data, and other analytics systems.
www.crunchydata.com
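A few of the knobs this kind of tuning advice usually covers, shown as session-level settings with placeholder values rather than recommendations:

```sql
-- give sorts and hashes more memory for big aggregations (applies per operation)
SET work_mem = '256MB';

-- allow more parallel workers for large scans
SET max_parallel_workers_per_gather = 4;

-- verify the plan actually changed
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('day', created_at), count(*)
FROM events
GROUP BY 1;
```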
January 9, 2025 at 7:37 PM
Reposted by Paul Laurence
In the piece I cover:

- the partnership between @cncf.io and Andela
- the release of Akamai App Platform cc/ @mikemaney.bsky.social
- the rebranding of Akka
- the partnership between @gitlab.com and AWS
- the release of Crunchy Data Warehouse cc/ @craigkerstiens.com @crunchydata.com
December 19, 2024 at 6:52 PM
Reposted by Paul Laurence
Well well well: www.crunchydata.com/blog/pg_incr...

Incremental pipelines come to Postgres via Crunchy Data! This is like "dbt incremental", not true incremental view maintenance like @materialize.com or Snowflake's dynamic tables, but it's a neat step towards IVM.
December 19, 2024 at 5:08 AM
Reposted by Paul Laurence
End-to-end demo of the new pg_incremental extension.

There's a raw events table and a summary table containing view counts.

You then define a pipeline using an INSERT .. SELECT command, and pg_incremental keeps running it in the background to do fast, reliable, incremental processing.
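A sketch of that demo based on my reading of the pg_incremental extension; the function name, parameter placeholders, and summary query are assumptions and may differ from the actual demo:

```sql
CREATE TABLE events (
  event_id bigint GENERATED ALWAYS AS IDENTITY,
  page text,
  created_at timestamptz DEFAULT now()
);

CREATE TABLE view_counts (
  page text PRIMARY KEY,
  views bigint
);

-- process new events by sequence range; $1/$2 are filled in per run
SELECT incremental.create_sequence_pipeline('view-count-rollup', 'events', $$
  INSERT INTO view_counts
  SELECT page, count(*) FROM events
  WHERE event_id BETWEEN $1 AND $2
  GROUP BY page
  ON CONFLICT (page) DO UPDATE SET views = view_counts.views + EXCLUDED.views
$$);
```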
December 17, 2024 at 5:35 PM
Reposted by Paul Laurence
There are many incremental processing solutions, but they seem to never quite do what I need.

I decided to build an extension that just keeps running the same command in Postgres with different parameters to do fast, reliable incremental data processing.

That's pg_incremental.

1/n
December 17, 2024 at 5:10 PM
Reposted by Paul Laurence
We’re excited to release pg_incremental today - a new extension for automated incremental updates. pg_incremental is like a supercharged pg_cron that runs data pipelines, data syncs, rollups, imports and exports.

www.crunchydata.com/blog/pg_incr...
pg_incremental: Incremental Data Processing in Postgres | Crunchy Data Blog
We are excited to release a new open source extension called pg_incremental. pg_incremental works with pg_cron to do incremental batch processing for data aggregations, data transformations, or import...
www.crunchydata.com
December 17, 2024 at 4:39 PM
Reposted by Paul Laurence
if you use PostgreSQL a lot like I do, @crunchydata.com is indispensable.

TIL about the ROLLUP keyword, which goes after GROUP BY and adds subtotal rows to aggregated data!
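For anyone else meeting it for the first time, ROLLUP adds subtotal and grand-total rows to a grouped query; the table and columns here are made up:

```sql
SELECT region, product, sum(amount) AS total
FROM sales
GROUP BY ROLLUP (region, product);
-- yields one row per (region, product), a subtotal row per region
-- (with product NULL), and a grand-total row (both NULL)
```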
December 12, 2024 at 9:58 PM
Reposted by Paul Laurence
And the winner is: Apache Iceberg!

Please commence the flame wars in the comments below.

www.bigdatawire.com/2024/12/03/h...
How Apache Iceberg Won the Open Table Wars
Apache Iceberg has recently emerged as the de facto open-table standard for large-scale datasets, with a thriving community and support from many of the
www.bigdatawire.com
December 4, 2024 at 6:34 PM
Reposted by Paul Laurence
Many big Postgres databases today use partitioning. Do you have a default partition? If not, you probably should. Default partitions are super important because they let you catch inconsistent data or bugs in your application code.

www.crunchydata.com/blog/postgre...
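The pattern in one sketch (illustrative names): a range-partitioned table with a DEFAULT partition that catches any row matching no child, instead of erroring:

```sql
CREATE TABLE measurements (
  id bigint,
  recorded_at timestamptz NOT NULL
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2024_12 PARTITION OF measurements
  FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');

-- rows outside every defined range land here instead of failing the insert
CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;

-- monitor it: the default should normally stay empty
SELECT count(*) FROM measurements_default;
```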
Postgres Partitioning with a Default Partition | Crunchy Data Blog
Keith discusses the importance of having a default partition, how to monitor the default, and how to move rows to new child tables.
www.crunchydata.com
December 6, 2024 at 4:47 PM
Reposted by Paul Laurence
Everything gets simpler once you have transactions on Iceberg tables.

A useful pattern is to record loaded files in a Postgres table in the same transaction that loads the file into Iceberg, so that each file is loaded exactly once.

www.crunchydata.com/blog/iceberg...
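The bookkeeping pattern described above, sketched with placeholder names; the COPY-from-S3 syntax is an assumption about Crunchy Data Warehouse and may differ:

```sql
CREATE TABLE loaded_files (
  filename text PRIMARY KEY,  -- unique: a second load of the same file fails
  loaded_at timestamptz DEFAULT now()
);

BEGIN;
INSERT INTO loaded_files (filename) VALUES ('s3://bucket/ais/2024-12-04.csv');
COPY shipping_events FROM 's3://bucket/ais/2024-12-04.csv' WITH (format 'csv');
COMMIT;
-- if either statement fails, neither the bookkeeping row nor the Iceberg
-- data is committed, so each file is loaded exactly once
```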
Iceberg ahead! Analyzing Shipping Data in Postgres | Crunchy Data Blog
Marco shows off how to work with Iceberg and Postgres together in Crunchy Data Warehouse. He creates an Iceberg data set from AIS shipping data, batch loads public data daily, and then runs reports and m...
www.crunchydata.com
December 5, 2024 at 3:57 PM
Reposted by Paul Laurence
Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
https://www.crunchydata.com/blog/crunchy-data-warehouse-postgres-with-iceberg-for-high-performance-analytics
December 4, 2024 at 11:03 PM
Reposted by Paul Laurence
Exciting announcement. Played with this a bit today.

It's like an Iceberg starter kit that bundles storage with the catalog and compaction. Systems looking to support Iceberg writes will have a relatively easy time targeting the S3 Tables catalog, because they don't need to add compaction.

1/n
The launch of S3 Tables, a new bucket type that stores Apache Iceberg tables.
December 4, 2024 at 10:12 PM
Reposted by Paul Laurence
We have been talking to customers a lot about the medallion architecture with data. This is a pattern we see for data lakehouses.
December 3, 2024 at 7:39 PM
Reposted by Paul Laurence
When Amazon began shipping Postgres, demand was already pent up, but the availability from AWS was a big accelerator to adoption.

Will be interesting to see if the same occurs with awareness around Iceberg. Having been pitching it for a year now, I find people are still largely unaware of Iceberg.
December 3, 2024 at 5:30 PM
Reposted by Paul Laurence
Love @crunchydata.com 's kids activity book 🥰
Managed to dig up a few of the pages as well
December 3, 2024 at 4:38 PM
Reposted by Paul Laurence
Let's talk about #apacheIceberg. It's been making waves in the #dataEngineering space, so you've likely heard of it. But what is it?

Iceberg is a high-performance, open table format designed for managing large-scale data workloads in a #dataLake. Now, why does that matter? 🧵
December 2, 2024 at 3:28 PM