Next Tuesday @marcoslot.com will be at Postgres Meetup for * to talk about pg_lake - #Postgres for #Iceberg with #DuckDB.
Join us!
www.meetup.com/postgres-mee...
Docs: github.com/Snowflake-La...
pg_lake just went open source! (Apache 2.0)
pg_lake is a set of extensions (from Crunchy Data Warehouse) that add comprehensive Iceberg support and data lake access to Postgres, with @duckdb.org transparently integrated into the query engine.
Announcement blog: www.snowflake.com/en/engineeri...
Safety vs. Flexibility quadchart for different DBMSs. VLDB 2025 https://doi.org/10.14778/3725688.3725719
No system hits the sweet spot of allowing extensibility while maintaining system safety. It would be nice if there were a standard plugin API (think POSIX) that allowed compatibility across systems.
Thanks to @marcoslot.com + @daveandersen.bsky.social for their collaboration on this project
At last @abigalekim.bsky.social's paper is out! It's the most complete evaluation of DB extensions/plugins ever. We analyze PostgreSQL, MySQL, MariaDB, SQLite, DuckDB, Redis.
TLDR: The Postgres extension ecosystem is fraught with footguns. Other DBMSs have fewer extensions but also fewer problems. DuckDB has the cleanest API.
Five years ago I joined @crunchydata.com; shortly after, I wrote about having unfinished business with Postgres. Today, as part of Snowflake, that journey continues. We've built some amazing things, but we're just getting started.
www.crunchydata.com/blog/crunchy...
Recording of my Data Council talk:
www.youtube.com/watch?v=HZAr...
Generative AI comes up with details that would be hilarious, if it weren't so mind-boggling that it can come up with them at all.
Thanks Gunnar!
We generate regular position delete files for merge-on-read, so any Iceberg query engine can read them. Equality deletes would be more CDC friendly, but not supported in most engines.
We have some secret sauce around how we track/know positions, but being Postgres helps a lot there.
And there it is: Native logical replication from any Postgres server to Iceberg managed by Crunchy Data Warehouse.
Speed up Postgres analytical queries 100x with 2 commands.
I gave a talk at the inaugural (and awesome) European Iceberg meetup in Amsterdam last night.
It's an introduction to how and why we used Iceberg and DuckDB to build a Postgres Data Warehouse:
www.youtube.com/watch?v=cEnq...
Move fast and build solid solutions that work across platforms.
You can now use Postgres as a modern Data Warehouse anywhere, using any S3-compatible storage API. Query, import, or export files in your data lake or store data in Iceberg with automatic maintenance and very fast queries.
Excited to announce Crunchy Data Warehouse is now available for Kubernetes and On-premises. Need faster analytics from Postgres? Want a native Postgres data lake experience? Learn more about how it works: www.crunchydata.com/blog/crunchy...
Amazing result
Would be cool if Iceberg/Parquet had support for storing JSON as vectors.
Generally by unwinding the JSON in the insert..select that processes the raw log files. JSON is broken down into columns by default, though nested JSON remains as jsonb. You can either store that directly (stored as a string), unwind it manually, or convert it to a composite type (stored as a struct).
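To illustrate the idea (this is a hedged sketch, not the actual pg_lake implementation — the log format and field names are made up), here's a minimal Python version of unwinding top-level JSON keys into columns while leaving nested JSON serialized as a string, analogous to keeping it as jsonb:

```python
import json

# Hypothetical raw log lines: newline-delimited JSON (an assumption for this sketch)
raw_logs = [
    '{"ts": "2025-01-01T00:00:00Z", "level": "info", "ctx": {"user": "a", "ip": "1.2.3.4"}}',
    '{"ts": "2025-01-01T00:00:01Z", "level": "error", "ctx": {"user": "b", "ip": "5.6.7.8"}}',
]

def unwind(record):
    """Promote top-level JSON keys to columns; serialize nested objects/arrays
    back to JSON strings (the "nested JSON stored as string" case above)."""
    return {k: (json.dumps(v) if isinstance(v, (dict, list)) else v)
            for k, v in record.items()}

# Rough analogue of the insert..select over raw log files
rows = [unwind(json.loads(line)) for line in raw_logs]
```

Converting the nested `ctx` object to a composite type (stored as a struct) would instead keep it as typed fields rather than a string.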
We weren't really thinking of log management as a target use case, but Iceberg is ideal as the final destination for logs, and having transactions & built-in job scheduling & a fast query engine (& laser focus on developer experience) makes things really simple and cost-effective.
I got a number of questions on how we saved $30k a month on cloudwatch by moving logs directly to S3/Iceberg with Postgres so I wrote up how in a bit more detail - www.crunchydata.com/blog/reducin...
Excited to announce built-in maintenance for Iceberg via Postgres.
Now within Crunchy Data Warehouse we will automatically vacuum and continuously optimize your Iceberg data by compacting and cleaning up files.
Dig into the details of how this works www.crunchydata.com/blog/automat...
Imagine your potential customer as a serious company doing serious things, and willing to pay serious money if you can genuinely help them run their business without causing a lot of new problems.
Then go build products for that customer.
This works.
Auto-vacuum for #Iceberg tables is now available in Crunchy Data Warehouse!
We're always aiming for a 0-touch experience where possible, so we went out of our way to make Iceberg compaction & cleanup fully automatic without any configuration.
Still pretty interesting to see a manual vacuum:
A big part of building Crunchy Data Warehouse was ease of use. How easy is it to load data from existing public datasets?
Step 1: Point at your dataset and we'll load it for you
Step 2: Query it
Step 3: Profit
ChatGPT Plus had a good run, but looks like Le Chat is going to be my main assistant now.
I like that it's fast, to the point, and quite clever.
I was impressed with a SQL query it came up with today for finding contiguous ranges of integers. ChatGPT's version was 3x slower.
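The post doesn't show the query, but the classic "gaps and islands" trick for finding contiguous integer ranges is: subtract `ROW_NUMBER()` from each value, and the difference is constant within each run. A self-contained sketch using Python's bundled SQLite (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (n INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (2,), (3,), (7,), (8,), (10,)])

# Within a contiguous run, n - ROW_NUMBER() OVER (ORDER BY n) is constant,
# so grouping by that difference collapses each run into one row.
ranges = conn.execute("""
    SELECT MIN(n) AS range_start, MAX(n) AS range_end
    FROM (SELECT n, n - ROW_NUMBER() OVER (ORDER BY n) AS grp FROM nums)
    GROUP BY grp
    ORDER BY range_start
""").fetchall()
# ranges -> [(1, 3), (7, 8), (10, 10)]
```

The same SQL works in Postgres; a single pass with one window function is typically much faster than correlated-subquery approaches.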
Postgres is increasingly becoming a versatile data platform, instead of just an operational database.
Using pg_parquet you can trivially export data to S3, and using Crunchy Data Warehouse you can just as easily query or import Parquet files from PostgreSQL.
Deepseek R1 in an ollama "container app" on a managed Postgres server, because... why not?
5 years from now, no one's going to want slower, less reliable, or harder to use databases.
pg_documentdb is open source!
I created the initial version with Vinod Sridharan (an absolutely brilliant engineer) at Microsoft a few years ago and it's come a long way since.
It reimplements the MongoDB API with exact semantics in PostgreSQL. Already used by FerretDB!
github.com/microsoft/do...
Impressed by the latest ParadeDB release.
Solving the right problems in the right way is really hard.
1/11. ParadeDB is now integrated with Postgres block storage. As far as we know, no one has integrated a search and analytics engine with Postgres storage before. This is a big deal.
Here's why we did it, how we did it, and why you should care. 🧵
A lot of great recommendations on tuning PostgreSQL for analytical queries by @karenhjex.bsky.social
www.crunchydata.com/blog/postgre...