Abhimanyu Hans

@ahans30

PhD Student @umdcs https://ahans30.github.io/

24 Followers, 77 Following, 2 Posts, Joined 24.11.2024

Latest posts by Abhimanyu Hans @ahans30

Let’s sanity check DeepSeek’s claim to train on 2048 GPUs for under 2 months, for a cost of $5.6M. It sort of checks out and sort of doesn't.

The v3 model is an MoE with 37B (out of 671B) active parameters. Let's compare to the cost of a 34B dense model. 🧵

29.01.2025 17:12 👍 10 🔁 2 💬 1 📌 0
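
A rough way to start that sanity check is to turn the quoted figures into GPU-hours and an implied hourly price. The sketch below uses only the numbers from the post (2048 GPUs, $5.6M); treating "under 2 months" as at most 60 days is an assumption, and the function names are purely illustrative.

```python
# Figures quoted in the post.
NUM_GPUS = 2048
CLAIMED_COST_USD = 5.6e6

# Assumption (not from the post): "under 2 months" is capped at 60 days.
MAX_DAYS = 60


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours if every GPU runs around the clock for `days` days."""
    return num_gpus * days * 24


def implied_rate(cost_usd: float, hours: float) -> float:
    """Price per GPU-hour that would produce `cost_usd` over `hours`."""
    return cost_usd / hours


max_hours = gpu_hours(NUM_GPUS, MAX_DAYS)
rate = implied_rate(CLAIMED_COST_USD, max_hours)

print(f"at most {max_hours:,.0f} GPU-hours")
print(f"implied price of at least ${rate:.2f} per GPU-hour")
```

Because the run took less than the assumed 60 days, the real GPU-hour count is lower and the implied hourly price is higher, so the printed rate is a floor rather than an estimate.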

Absolutely!

09.12.2024 05:47 👍 1 🔁 0 💬 0 📌 0

poster sent for print 😮‍💨

are you concerned your LLM might regurgitate exact training data to your users?

join me and my co-authors at #NeurIPS2024 on wed's 1st poster session & learn how goldfish loss can help you.

eager to meet friends from past and future!

p.s. hmu if you're hiring a summer intern!

09.12.2024 04:17 👍 6 🔁 0 💬 1 📌 0