Paul Laurence
prlaurence.com
Paul Laurence
@prlaurence.com
Co-founder @crunchydata.com
Reposted by Paul Laurence
Postgres is increasingly becoming a versatile data platform, instead of just an operational database.

Using pg_parquet you can trivially export data to S3, and using Crunchy Data Warehouse you can just as easily query or import Parquet files from PostgreSQL.
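A minimal sketch of that round trip, assuming the pg_parquet extension is installed and S3 credentials are configured; the bucket and table names are placeholders:

```sql
-- export a query result to S3 as Parquet (pg_parquet extends COPY)
COPY (SELECT * FROM orders WHERE created_at >= '2025-01-01')
TO 's3://my-bucket/exports/orders.parquet'
WITH (format 'parquet');

-- and read it back into another table
COPY orders_copy FROM 's3://my-bucket/exports/orders.parquet'
WITH (format 'parquet');
```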
February 7, 2025 at 11:11 AM
Reposted by Paul Laurence
Access a Canada-wide elevation model (Mr DEM!) directly inside PostgreSQL using PostGIS raster support for Cloud Optimized GeoTIFF www.crunchydata.com/blog/using-c...
Using Cloud Rasters with PostGIS | Crunchy Data Blog
Paul shows you how to access raster data stored in the cloud or object storage for PostGIS using cloud optimized GeoTIFF (aka COG) files. He also includes some functions for working with raster elevat...
www.crunchydata.com
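One way to follow the approach in the post is to register the COG as an out-of-db raster, so PostGIS fetches tiles over HTTP only as needed; the URL, table name, and coordinates below are placeholders:

```sql
-- metadata-only load of a remote COG, e.g. via:
--   raster2pgsql -R -t 256x256 /vsicurl/https://example.com/mrdem.tif public.dem | psql

-- then read an elevation at a point (assumes the point SRID matches the raster)
SELECT ST_Value(rast, ST_SetSRID(ST_MakePoint(-75.70, 45.42), 4326)) AS elevation
FROM dem
WHERE ST_Intersects(rast, ST_SetSRID(ST_MakePoint(-75.70, 45.42), 4326));
```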
February 7, 2025 at 5:30 PM
Reposted by Paul Laurence
I wrote up a summary of resources for sharpening your PostgreSQL, SQL, and relational database skills.

The common denominator for all of these is that you have to be *doing* (modeling, writing SQL, and wrangling data) to get better.

still.visualmode.dev/blogmarks/6
January 24, 2025 at 10:27 PM
Reposted by Paul Laurence
Congratulations to the Postgres community on PostgreSQL once again being named the DBMS of the year in 2024, for the second year in a row. db-engines.com/en/blog_post...
PostgreSQL is the Database Management System of the Year 2024
db-engines.com
January 14, 2025 at 1:54 PM
Reposted by Paul Laurence
Today we ran into the dreaded Postgres primary key integer overflow error so if anyone knows Jesse Soyland from Crunchy Data can you please thank him for writing this article that got us back up and running almost immediately: www.crunchydata.com/blog/the-int...
The Integer at the End of the Universe: Integer Overflow in Postgres | Crunchy Data Blog
Integer overflow can happen if you have a sequencing data type exceeding integer limits. Jesse has a query to help you spot it and recommendations for a short term and long term fix.
www.crunchydata.com
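The general shape of the diagnosis and fix (not Jesse's exact query; table and column names are illustrative):

```sql
-- spot sequences approaching the int4 ceiling (2147483647)
SELECT schemaname, sequencename, last_value,
       round(100.0 * last_value / 2147483647, 1) AS pct_of_int4
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY pct_of_int4 DESC;

-- long-term fix: widen the key to bigint (takes a heavy lock; plan a window)
ALTER TABLE events ALTER COLUMN id TYPE bigint;
```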
January 7, 2025 at 6:50 PM
Reposted by Paul Laurence
The submissions are already rolling in on the #icebergSummit Call for Papers, but... that's weird... it looks like we're missing YOURS! 🫵
Iceberg Summit 2025: Call for Speakers
Iceberg Summit 2025 is the second edition of the Iceberg Summit, an event sanctioned by the Apache Software Foundation (ASF) with oversight from the A...
sessionize.com
January 9, 2025 at 5:53 PM
Reposted by Paul Laurence
A lot of great recommendations on tuning PostgreSQL for analytical queries by @karenhjex.bsky.social

www.crunchydata.com/blog/postgre...
Postgres Tuning & Performance for Analytics Data | Crunchy Data Blog
Karen digs into Postgres strategies for working with large analytical data sets. She reviews tuning, strategies for pre-compiling data, and other analytics systems.
www.crunchydata.com
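A few of the knobs this kind of tuning advice usually covers, shown as session-level settings with placeholder values rather than recommendations:

```sql
-- give sorts and hashes more memory for big aggregations (applies per operation)
SET work_mem = '256MB';

-- allow more parallel workers for large scans
SET max_parallel_workers_per_gather = 4;

-- verify the plan actually changed
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('day', created_at), count(*)
FROM events
GROUP BY 1;
```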
January 9, 2025 at 7:37 PM
Reposted by Paul Laurence
In the piece I cover:

- the partnership between @cncf.io and Andela
- the release of Akamai App Platform cc/ @mikemaney.bsky.social
- the rebranding of Akka
- the partnership between @gitlab.com and AWS
- the release of Crunchy Data Warehouse cc/ @craigkerstiens.com @crunchydata.com
December 19, 2024 at 6:52 PM
Reposted by Paul Laurence
Well well well: www.crunchydata.com/blog/pg_incr...

Incremental pipelines come to Postgres via Crunchy Data! This is like "dbt incremental", not true incremental view maintenance like @materialize.com or Snowflake's dynamic tables, but it's a neat step towards IVM.
December 19, 2024 at 5:08 AM
Reposted by Paul Laurence
End-to-end demo of the new pg_incremental extension.

There's a raw events table and a summary table containing view counts.

You then define a pipeline using an INSERT .. SELECT command, and pg_incremental keeps running it in the background to do fast, reliable, incremental processing.
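A sketch of that demo based on my reading of the pg_incremental extension; the function name, parameter placeholders, and summary query are assumptions and may differ from the actual demo:

```sql
CREATE TABLE events (
  event_id bigint GENERATED ALWAYS AS IDENTITY,
  page text,
  created_at timestamptz DEFAULT now()
);

CREATE TABLE view_counts (
  page text PRIMARY KEY,
  views bigint
);

-- process new events by sequence range; $1/$2 are filled in per run
SELECT incremental.create_sequence_pipeline('view-count-rollup', 'events', $$
  INSERT INTO view_counts
  SELECT page, count(*) FROM events
  WHERE event_id BETWEEN $1 AND $2
  GROUP BY page
  ON CONFLICT (page) DO UPDATE SET views = view_counts.views + EXCLUDED.views
$$);
```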
December 17, 2024 at 5:35 PM
Reposted by Paul Laurence
There are many incremental processing solutions, but they seem to never quite do what I need.

I decided to build an extension that just keeps running the same command in Postgres with different parameters to do fast, reliable incremental data processing.

That's pg_incremental.

1/n
December 17, 2024 at 5:10 PM
Reposted by Paul Laurence
We’re excited to release pg_incremental today - a new extension for automated incremental updates. pg_incremental is like a supercharged pg_cron that runs data pipelines, data syncs, rollups, imports and exports.

www.crunchydata.com/blog/pg_incr...
pg_incremental: Incremental Data Processing in Postgres | Crunchy Data Blog
We are excited to release a new open source extension called pg_incremental. pg_incremental works with pg_cron to do incremental batch processing for data aggregations, data transformations, or import...
www.crunchydata.com
December 17, 2024 at 4:39 PM
Reposted by Paul Laurence
if you use PostgreSQL a lot like I do, @crunchydata.com is indispensable.

TIL about the ROLLUP keyword, which goes after GROUP BY and adds subtotal rows to aggregated data!
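For anyone else meeting it for the first time, ROLLUP adds subtotal and grand-total rows to a grouped query; the table and columns here are made up:

```sql
SELECT region, product, sum(amount) AS total
FROM sales
GROUP BY ROLLUP (region, product);
-- yields one row per (region, product), a subtotal row per region
-- (with product NULL), and a grand-total row (both NULL)
```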
December 12, 2024 at 9:58 PM
Reposted by Paul Laurence
And the winner is: Apache Iceberg!

Please commence the flame wars in the comments below.

www.bigdatawire.com/2024/12/03/h...
How Apache Iceberg Won the Open Table Wars
Apache Iceberg has recently emerged as the de facto open-table standard for large-scale datasets, with a thriving community and support from many of the
www.bigdatawire.com
December 4, 2024 at 6:34 PM
Reposted by Paul Laurence
Many big Postgres databases today use partitioning. Do you have a default partition? If not, you probably should. Default partitions are super important because they let you catch inconsistent data or bugs in your application code.

www.crunchydata.com/blog/postgre...
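The pattern in one sketch (illustrative names): a range-partitioned table with a DEFAULT partition that catches any row matching no child, instead of erroring:

```sql
CREATE TABLE measurements (
  id bigint,
  recorded_at timestamptz NOT NULL
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2024_12 PARTITION OF measurements
  FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');

-- rows outside every defined range land here instead of failing the insert
CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;

-- monitor it: the default should normally stay empty
SELECT count(*) FROM measurements_default;
```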
Postgres Partitioning with a Default Partition | Crunchy Data Blog
Keith discusses the importance of having a default partition, how to monitor the default, and how to move rows to new child tables.
www.crunchydata.com
December 6, 2024 at 4:47 PM
Reposted by Paul Laurence
Everything gets simpler once you have transactions on Iceberg tables.

A useful pattern is to record loaded files in a Postgres table in the same transaction that loads the file into Iceberg, so that each file is loaded exactly once.

www.crunchydata.com/blog/iceberg...
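The bookkeeping pattern described above, sketched with placeholder names; the COPY-from-S3 syntax is an assumption about Crunchy Data Warehouse and may differ:

```sql
CREATE TABLE loaded_files (
  filename text PRIMARY KEY,  -- unique: a second load of the same file fails
  loaded_at timestamptz DEFAULT now()
);

BEGIN;
INSERT INTO loaded_files (filename) VALUES ('s3://bucket/ais/2024-12-04.csv');
COPY shipping_events FROM 's3://bucket/ais/2024-12-04.csv' WITH (format 'csv');
COMMIT;
-- if either statement fails, neither the bookkeeping row nor the Iceberg
-- data is committed, so each file is loaded exactly once
```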
Iceberg ahead! Analyzing Shipping Data in Postgres | Crunchy Data Blog
Marco shows off how to work with Iceberg and Postgres together in Crunchy Data Warehouse. He creates an Iceberg data set from AIS shipping data, batch loads public data daily, and then runs reports and m...
www.crunchydata.com
December 5, 2024 at 3:57 PM
Reposted by Paul Laurence
Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
https://www.crunchydata.com/blog/crunchy-data-warehouse-postgres-with-iceberg-for-high-performance-analytics
December 4, 2024 at 11:03 PM
Reposted by Paul Laurence
Exciting announcement. Played with this a bit today.

It's like an Iceberg starter kit that bundles storage with the catalog and compaction. Systems looking to support Iceberg writes will have a relatively easy time targeting the S3 Tables catalog, because they don't need to add compaction.

1/n
The launch of S3 Tables, a new bucket type that stores Apache Iceberg tables.
December 4, 2024 at 10:12 PM
Reposted by Paul Laurence
We have been talking to customers a lot about the medallion architecture with data. This is a pattern we see for data lakehouses.
December 3, 2024 at 7:39 PM
Reposted by Paul Laurence
When Amazon began shipping Postgres, demand was already pent up, but the availability from AWS was a big accelerator to adoption.

Will be interesting to see if the same occurs with awareness around Iceberg. Having been pitching it for a year now, I find people are still largely unaware of Iceberg.
December 3, 2024 at 5:30 PM
Reposted by Paul Laurence
Love @crunchydata.com 's kids activity book 🥰
Managed to dig up a few of the pages as well
December 3, 2024 at 4:38 PM
Reposted by Paul Laurence
Let's talk about #apacheIceberg. It's been making waves in the #dataEngineering space, so you've likely heard of it. But what is it?

Iceberg is a high-performance, open table format designed for managing large-scale data workloads in a #dataLake. Now, why does that matter? 🧵
December 2, 2024 at 3:28 PM