Highly highly recommend attending if you're in the Vancouver area (or want to travel to it!). Stu will be going deep on how ParadeDB is architected from the ground up.
And just like that, 100 contributors to ParadeDB. Thank you to all who have contributed, and here's to 100 more!
I don't like advertising third-party products, but my testimonial is up on runs-on.com/testimonials/, and it's worth a look. @crohr has made something amazing.
We need an easy way to run benchmarks on bare-metal nodes via GitHub Actions, and nothing else compares. 12/10 product.
4/4. I've long said that search engines and columnar databases have a lot in common... We'll likely write more about this in the future, but in the meantime, I recommend checking out the blog.
And of course don't hesitate to star the project: github.com/paradedb/par...
3/4. To accomplish this, we use two core technologies:
- Columnar storage, for fast, cache-friendly lookups
- Block WAND, which enables early pruning when doing BM25 scoring
Neither is native to Postgres, but ParadeDB supports both in our BM25 index thanks to Tantivy.
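For anyone curious what "early pruning" means here, a toy Python sketch of the block-max idea. This is not ParadeDB's or Tantivy's actual code: the function name, the block layout, and the precomputed per-document scores are all assumptions for illustration. The point is that once K results are in hand, any block whose best possible score cannot beat the current K-th best is skipped entirely.

```python
import heapq
from typing import List, Tuple

Doc = Tuple[int, float]  # (doc_id, score); hypothetical layout for illustration


def top_k_with_block_pruning(blocks: List[List[Doc]], k: int):
    """Return (top-k docs sorted by score descending, number of blocks skipped)."""
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id), size <= k
    skipped = 0
    for block in blocks:
        if not block:
            continue
        # Assumption: a real index stores this upper bound per block ahead of
        # time; it is computed on the fly here only to keep the sketch short.
        block_max = max(score for _, score in block)
        # Early pruning: nothing in this block can enter the current top k.
        if len(heap) == k and block_max <= heap[0][0]:
            skipped += 1
            continue
        for doc_id, score in block:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    top = [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
    return top, skipped


blocks = [
    [(1, 0.9), (2, 0.8)],
    [(3, 0.1), (4, 0.2)],   # skipped: its best score cannot beat the threshold
    [(5, 0.95), (6, 0.3)],
]
top, skipped = top_k_with_block_pruning(blocks, k=2)
```

The min-heap keeps the current K-th best score at the root, so the pruning check is a single comparison per block.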
2/4. Top K queries are the core workload of any search engine. Whether you use Google Search as a consumer or a complex B2B search engine in a SaaS product, the core goal of search is "find me the K most relevant results." That's Top K optimization.
1/4. The simplest questions are often the hardest to answer. In this technical blog, our CTO dives into how we optimized Top K queries in ParadeDB.
I *highly* recommend reading. But if you need some more convincing, here's why this matters 🧵
ParadeDB is at 99 contributors. Who will be the 100th?
Exciting project built with ParadeDB on the front page of HN today! github.com/getomnico/omni
We now support Nix/NixOS thanks to the incredible work of Luc Perkins!
It is now possible to reference the ParadeDB Skill directly via npx:
I've moved to San Francisco! Every time I visit SF, I'm struck by how excited and full of ideas people here always are. The optimism is contagious. It was time to join in on the fun!
If you're around the Bay Area and want to chat databases, hit me up.
Claude Code is now an expert at ParadeDB code. This is the next step in our current push to integrate with the ecosystem.
Most of our largest customers already use this skill to accelerate their development with ParadeDB. Give it a try, feedback welcome.
ParadeDB is hiring someone to help build integrations. ORMs, RAG frameworks, PaaS, etc. Remote within US/Canada timezones, preference for PST. You'll work directly with me.
It's the perfect time to join, just ahead of the 1.0 later this year.
The movement of rewriting old Python tools in Rust is phenomenal for development velocity. pre-commit --> prek, and so many others
If you use LangChain or another RAG framework today, I'd love to ask you a few questions! DMs open
I am blown away at how much work a small, tight-knit and competent team can accomplish nowadays. The world has truly shifted from "time" to "taste" as the limited currency when striving to do great work
High-quality search is more than keyword matching. Personalization is what takes search from good to great.
We're investing a lot in building the "unified retrieval stack" for Postgres this year. Expect lots of announcements.
For now, here's how to build personalization today.
Looking to hire/contract someone experienced in building Kubernetes operators for a project of a few weeks. DMs open
6/6. Highly encourage reading this post; it's a particularly interesting one.
And as always, don't hesitate to give us a ⭐ github.com/paradedb/par.... Back to work!
5/6. Result --> 15X faster aggregates in Postgres, all native. This blog post goes over the bucketing use case, but another one will follow, showcasing how regular COUNT(*) queries get accelerated the same way. Pretty powerful for:
- Search
- Live dashboards
- RAG
- and more!
4/6. To enable our facets to be executed in a single index pass against our columnar index, we leveraged two features of Postgres:
- Planner hooks (to intercept the aggregate before Postgres gets it)
- Custom scans (to re-route it to our columnar index/Tantivy)
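To make "a single index pass" concrete, a minimal Python sketch. It is an illustration only, not how ParadeDB's planner hook or custom scan is implemented; the function name and row layout are hypothetical. The idea is that the top results, the total count, and the facet buckets are all collected in the same loop over matching rows, rather than by running one query per number.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]  # hypothetical row shape for illustration


def search_with_facets(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    facet_key: str,
    limit: int,
) -> Tuple[List[Row], int, Dict[Any, int]]:
    """One pass over rows: first `limit` hits, total match count, facet buckets."""
    hits: List[Row] = []
    total = 0
    buckets: Counter = Counter()
    for row in rows:
        if not predicate(row):
            continue
        total += 1                      # "how many results are there"
        buckets[row[facet_key]] += 1    # "categorize them"
        if len(hits) < limit:           # "give me some results"
            hits.append(row)
    return hits, total, dict(buckets)


rows = [
    {"id": 1, "category": "shoes"},
    {"id": 2, "category": "hats"},
    {"id": 3, "category": "shoes"},
    {"id": 4, "category": "shoes"},
]
hits, total, buckets = search_with_facets(rows, lambda r: r["id"] > 1, "category", limit=2)
```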
3/6. BUT! That's one of the main features of a search engine. "Give me some results, tell me how many results there are, and categorize them." So we had to do something about it.
To solve this, we've leveraged ParadeDB's columnar index. Faceted search is essentially a type of aggregate, after all.
2/6. Some time ago, there was a project from Cybertec about building better faceting in Postgres: github.com/cybertec-pos...
Needless to say, the interest is there. If you've ever run a COUNT query in Postgres, you know it does not scale well *at all*.
1/6. Faceted search (read: aggregates) in Postgres, now at the speed of columnar.
We just released a blog post detailing how we built fast aggregates (COUNT, bucketing, etc.) inside Postgres. This has been one of our most requested features, and we're excited to take you through the journey. 🧵
5/5. Together, these changes have made using ParadeDB (we think) significantly easier. You can check it out at docs.paradedb.com. We'd love to hear your feedback on the new API.
And of course, don't hesitate to give us a star ⭐
4/5. We came up with two ideas:
- Tokenizers as Postgres types. This makes tokenizers a first-class construct in Postgres.
- Custom operators. Inspired by pgvector, we changed a few of our functions to be custom Postgres operators.
3/5. As our adoption grew, we noticed two things:
- Defining tokenizers (a critical component of full-text search) was confusing
- ORM integrations were difficult
Developer experience is critical to any database, and so we had to fix this.
2/5. Our initial API was JSON-based. Inside Postgres, that felt... odd. But there was a reason.
Elastic is JSON-based, and so are Lucene and Tantivy (our underlying search engine). We focused on exposing powerful search features, but not so much on their ergonomics.
1/5. We just released V2 of the ParadeDB API. Elastic-quality search and aggregates, SQL-native, and ORM-friendly. Here's how we got there: