
Michal Wolski

@michalwols

Interested in large vision models. Funemployed; previously founder at bite.ai (acquired by MyFitnessPal), principal ML engineer at MFP, 1st employee at Clarifai, ML at Columbia and NYU Future Labs

26 Followers · 132 Following · 1 Post
Joined 22.11.2024

Latest posts by Michal Wolski @michalwols

๐Ÿณ Some notes on "DeepSeek and export control" Finally took time to go over Dario's essay on DeepSeek and export control and wrote some notes. I mostly disagree and I think it missed the point.

I wrote some reflections on DeepSeek, open-source, AI, US and China, starting from Dario's recent essay calling for stronger export controls.

I mostly disagree with his essay and think it missed the point.

You can read it here: thomwolf.io/blog/deepsee...

01.02.2025 15:07 👍 52 🔁 9 💬 2 📌 0

Yeah I just attempted that a few weeks ago and ended up bricking it. Was fun reinstalling everything and dealing with a bunch of broken drivers.

26.11.2024 13:48 👍 1 🔁 0 💬 0 📌 0

All the things you need to know to pretrain an LLM at home*!

Gave a workshop at Uni Bern: it starts with scaling laws, moves on to web-scale data processing, and finishes with training via 4D parallelism and ZeRO.

*assuming your home includes an H100 cluster

19.11.2024 20:35 👍 77 🔁 9 💬 5 📌 0

Deep global descriptors offer a convenient way to do retrieval, but local descriptors are a game changer for finding needles in a haystack (particular objects in clutter). Due to their high cost, with AMES we optimize the performance/memory trade-off during re-ranking. #ECCV2024

20.11.2024 21:14 👍 32 🔁 8 💬 1 📌 0
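The two-stage pipeline described above (cheap global-descriptor retrieval over the whole database, then expensive local-descriptor re-ranking of a short list) can be sketched roughly as follows. This is illustrative only: AMES itself learns a transformer-based similarity, whereas here the re-ranking score is a crude best-match average:

```python
import numpy as np

def global_retrieve(query_g, db_g, k=5):
    """Stage 1: rank the whole database by cosine similarity of global descriptors."""
    sims = db_g @ query_g / (np.linalg.norm(db_g, axis=1) * np.linalg.norm(query_g) + 1e-9)
    return np.argsort(-sims)[:k]  # indices of the top-k candidates

def local_rerank(query_l, db_l, candidates):
    """Stage 2: re-rank only the short list using sets of local descriptors.

    Score = mean over query locals of their best match in the candidate image
    (a stand-in for a learned matcher like AMES).
    """
    def score(doc_l):
        sim = query_l @ doc_l.T          # (n_query_locals, n_doc_locals)
        return sim.max(axis=1).mean()    # best match per query local, averaged
    return sorted(candidates, key=lambda i: -score(db_l[i]))
```

The memory/performance trade-off the post mentions comes from how many local descriptors you store per image and how many candidates you re-rank; both knobs live entirely in stage 2.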

Nvidia's Hymba - an efficient small language model with hybrid architecture.

Their architecture consists of Hymba hybrid blocks, with Mamba and Attention connected in parallel. They found this design to be more effective in disentangling attention into linear and non-linear components.

22.11.2024 05:41 👍 32 🔁 3 💬 1 📌 1
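The parallel-hybrid idea can be sketched minimally: run softmax attention and a linear state-space recurrence on the same input and fuse their outputs. Everything below is a simplification, not Nvidia's implementation: the attention is non-causal, the SSM is time-invariant (real Mamba uses input-dependent parameters and a parallel scan), and the mean fusion and shapes are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, Wq, Wk, Wv):
    """Standard softmax attention (non-causal, single head, for brevity)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def ssm_head(x, A, B, C):
    """Linear recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t (a simplified, time-invariant SSM)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

def hybrid_block(x, attn_params, ssm_params, Wo):
    """Parallel hybrid: both heads see the same input; outputs are fused, then projected."""
    y_attn = attention_head(x, *attn_params)
    y_ssm = ssm_head(x, *ssm_params)
    fused = (y_attn + y_ssm) / 2  # mean fusion; the paper combines/normalizes differently
    return fused @ Wo
```

The key design point is that the two paths are summed in parallel rather than stacked in sequence, so the attention path can specialize in content-based lookup while the SSM path carries the cheap linear component.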

๐——๐—ผ๐—ฒ๐˜€ ๐—ฎ๐˜‚๐˜๐—ผ๐—ฟ๐—ฒ๐—ด๐—ฟ๐—ฒ๐˜€๐˜€๐—ถ๐˜ƒ๐—ฒ ๐—ฝ๐—ฟ๐—ฒ-๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐˜„๐—ผ๐—ฟ๐—ธ ๐—ณ๐—ผ๐—ฟ ๐˜ƒ๐—ถ๐˜€๐—ถ๐—ผ๐—ป? ๐Ÿค”
Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding ๐Ÿงต

paper: arxiv.org/abs/2411.14402
code: github.com/apple/ml-aim
HF: huggingface.co/collections/...

22.11.2024 08:32 👍 59 🔁 19 💬 3 📌 1
Link preview: Hymba: A Hybrid-head Architecture for Small Language Models. "We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficienc..."

Hymba is a hybrid-head architecture for small language models, integrating transformer and state space mechanisms.

It beats Llama 3B with only 1.5B parameters.

arxiv.org/abs/2411.13676

22.11.2024 11:17 👍 6 🔁 1 💬 0 📌 0