Bogdan Gaza

@b0gdang

co-founder & CTO @datologyai.com

123 Followers
85 Following
4 Posts
Joined 01.11.2024

Latest posts by Bogdan Gaza @b0gdang

A suburban caveman house

Quit freaking out. Remember that in 10,000 B.C., when America had ZERO international trade, a family could afford a house like this on a single income.

04.04.2025 21:25 👍 16542 🔁 1870 💬 446 📌 110

Me showing Claude what I've been working on

20.03.2025 19:02 👍 13 🔁 1 💬 0 📌 0

i am sick of “more monkeys jumping on the bed” discourse. it’s as though these people have no memory of 2017 when one fell off and bumped his head. doctor spoke out against it, mama endorsed doctor’s findings. i’m limiting replies to followers because i do not have the energy for YMMJOTBers today

08.03.2025 20:56 👍 2213 🔁 353 💬 12 📌 3
Databases in 2024: A Year in Review
Andy rises from the ashes of his dead startup and discusses what happened in 2024 in the database game.

Buckle up because we're banging into the new year with my annual retrospective of the last year in databases! Highlights include license change blowback, Databricks vs. Snowflake gangwar, @duckdb.org's shotgun weddings, and buying a quarterback to impress your lover: www.cs.cmu.edu/~pavlo/blog/...

01.01.2025 14:02 👍 199 🔁 64 💬 10 📌 19

The new family Christmas Eve tradition: watching Verandah Santa and The Sign episodes from Bluey!

25.12.2024 03:47 👍 1 🔁 0 💬 0 📌 0

See you at AWS re:Invent next week! If you're in Vegas, happy to catch up on anything data-curation related!

01.12.2024 17:08 👍 4 🔁 0 💬 0 📌 0

Words have no meaning anymore.

26.11.2024 15:44 👍 122 🔁 17 💬 8 📌 0

I am excited about the release of our results on web-scale text data curation @datologyai.com. Our curation pipeline transforms the RedPajama V1 dataset into the DAIT dataset, which outperforms the best publicly available pretraining datasets for training LLMs better, faster, smaller.

25.11.2024 19:46 👍 7 🔁 2 💬 1 📌 0

Tired: Bringing up politics at Thanksgiving

Wired: Bringing up @datologyai.com’s new text curation results at Thanksgiving

That’s right, we applied our data curation pipeline to text pretraining data and the results are hot enough to roast a 🦃
🧵

25.11.2024 17:49 👍 18 🔁 4 💬 1 📌 6
DatologyAI Starter Pack
Join the conversation

If you're interested in Data-Centric AI, follow The DatologyAI Starter Pack for damn-good data memes and occasional data curation insights: go.bsky.app/NJ9sTot

22.11.2024 20:04 👍 2 🔁 1 💬 0 📌 0
Amazon S3 Express One Zone now supports the ability to append data to an object
This is a first for Amazon S3: it is now possible to append data to an existing object in a bucket, where previously the only supported operation was to atomically …

Amazon S3 just grew "append"! It's only available for the more expensive, lower-latency S3 Express One Zone bucket class, but you can now append data to an object up to 10,000 times. Previously you could only atomically replace a whole object with an updated version. simonwillison.net/2024/Nov/22/...

22.11.2024 04:47 👍 193 🔁 32 💬 6 📌 11

This is the most interesting and most impactful data pipeline problem I have ever worked on (and if you know me, you know that’s saying something).

So happy to be able to share this work with the world! And now it’s time for a little vacation. 😅

14.11.2024 19:21 👍 26 🔁 3 💬 0 📌 0

🧵 We’ve spent the last few months at @datologyai.bsky.social building a state-of-the-art data curation pipeline and I’m SO excited to share our first results: we curated image-text pretraining data and massively improved CLIP model quality, training speed, and inference efficiency 🔥🔥🔥

14.11.2024 17:14 👍 28 🔁 6 💬 2 📌 10

Web-Scale Data Curation is a frontier challenge - I'm excited to show the progress we've made in just 6 months @datologyai

tl;dr: we've pretrained the most data-efficient and best-in-class CLIP models!

Read on to see how our product powers multimodal data curation
1/n 🧵

14.11.2024 17:30 👍 3 🔁 1 💬 1 📌 0

Oh, Finagle from Twitter was so good at this!

13.11.2024 20:08 👍 0 🔁 0 💬 0 📌 0

I think AWS can be a great place to get GPUs from. We get ours from them, and we've had a great time. It depends on how many you need and over what time frame.

12.11.2024 19:56 👍 1 🔁 0 💬 1 📌 0