
Michal Wolski

@michalwols

Interested in large vision models. Funemployed; previously founder at bite.ai (acquired by MyFitnessPal), principal ML engineer at MFP, 1st employee at Clarifai, ML at Columbia and NYU Future Labs

26 Followers · 132 Following · 1 Post
Joined 22.11.2024

Latest posts by Michal Wolski @michalwols

๐Ÿณ Some notes on "DeepSeek and export control" Finally took time to go over Dario's essay on DeepSeek and export control and wrote some notes. I mostly disagree and I think it missed the point.

I wrote some reflections on DeepSeek, open-source, AI, US and China, starting from Dario's recent essay calling for stronger export controls.

I mostly disagree with his essay and think it missed the point.

You can read it here: thomwolf.io/blog/deepsee...

01.02.2025 15:07 👍 52 🔁 9 💬 2 📌 0

Yeah I just attempted that a few weeks ago and ended up bricking it. Was fun reinstalling everything and dealing with a bunch of broken drivers.

26.11.2024 13:48 👍 1 🔁 0 💬 0 📌 0

All the things you need to know to pretrain an LLM at home*!

Gave a workshop at Uni Bern: it starts with scaling laws, moves on to web-scale data processing, and finishes with training via 4D parallelism and ZeRO.

*assuming your home includes an H100 cluster

19.11.2024 20:35 👍 77 🔁 9 💬 5 📌 0

Deep global descriptors offer a convenient way to do retrieval, but local descriptors are a game changer for finding needles in a haystack (particular objects in clutter). Due to their high cost, with AMES we optimize the performance/memory trade-off during re-ranking. #ECCV2024

20.11.2024 21:14 👍 32 🔁 8 💬 1 📌 0
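The two-stage pipeline described above (cheap global-descriptor retrieval over the whole database, then expensive local-descriptor re-ranking of a short list) can be sketched roughly as follows. This is illustrative only: AMES itself learns a transformer-based similarity, whereas here the re-ranking score is a crude best-match average:

```python
import numpy as np

def global_retrieve(query_g, db_g, k=5):
    """Stage 1: rank the whole database by cosine similarity of global descriptors."""
    sims = db_g @ query_g / (np.linalg.norm(db_g, axis=1) * np.linalg.norm(query_g) + 1e-9)
    return np.argsort(-sims)[:k]  # indices of the top-k candidates

def local_rerank(query_l, db_l, candidates):
    """Stage 2: re-rank only the short list using sets of local descriptors.

    Score = mean over query locals of their best match in the candidate image
    (a stand-in for a learned matcher like AMES).
    """
    def score(doc_l):
        sim = query_l @ doc_l.T          # (n_query_locals, n_doc_locals)
        return sim.max(axis=1).mean()    # best match per query local, averaged
    return sorted(candidates, key=lambda i: -score(db_l[i]))
```

The memory/performance trade-off the post mentions comes from how many local descriptors you store per image and how many candidates you re-rank; both knobs live entirely in stage 2.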

Nvidia's Hymba - an efficient small language model with hybrid architecture.

Their architecture consists of Hymba hybrid blocks, with Mamba and Attention connected in parallel. They found this design to be more effective in disentangling attention into linear and non-linear components.

22.11.2024 05:41 👍 32 🔁 3 💬 1 📌 1
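The parallel-hybrid idea can be sketched minimally: run softmax attention and a linear state-space recurrence on the same input and fuse their outputs. Everything below is a simplification, not Nvidia's implementation: the attention is non-causal, the SSM is time-invariant (real Mamba uses input-dependent parameters and a parallel scan), and the mean fusion and shapes are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, Wq, Wk, Wv):
    """Standard softmax attention (non-causal, single head, for brevity)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def ssm_head(x, A, B, C):
    """Linear recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t (a simplified, time-invariant SSM)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

def hybrid_block(x, attn_params, ssm_params, Wo):
    """Parallel hybrid: both heads see the same input; outputs are fused, then projected."""
    y_attn = attention_head(x, *attn_params)
    y_ssm = ssm_head(x, *ssm_params)
    fused = (y_attn + y_ssm) / 2  # mean fusion; the paper combines/normalizes differently
    return fused @ Wo
```

The key design point is that the two paths are summed in parallel rather than stacked in sequence, so the attention path can specialize in content-based lookup while the SSM path carries the cheap linear component.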

๐——๐—ผ๐—ฒ๐˜€ ๐—ฎ๐˜‚๐˜๐—ผ๐—ฟ๐—ฒ๐—ด๐—ฟ๐—ฒ๐˜€๐˜€๐—ถ๐˜ƒ๐—ฒ ๐—ฝ๐—ฟ๐—ฒ-๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐˜„๐—ผ๐—ฟ๐—ธ ๐—ณ๐—ผ๐—ฟ ๐˜ƒ๐—ถ๐˜€๐—ถ๐—ผ๐—ป? ๐Ÿค”
Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding ๐Ÿงต

paper: arxiv.org/abs/2411.14402
code: github.com/apple/ml-aim
HF: huggingface.co/collections/...

22.11.2024 08:32 👍 59 🔁 19 💬 3 📌 1
Link preview: Hymba: A Hybrid-head Architecture for Small Language Models. "We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficienc..."

Hymba is a hybrid-head architecture for small language models, integrating transformer and state space mechanisms.

It beats Llama 3B with only 1.5B parameters.

arxiv.org/abs/2411.13676

22.11.2024 11:17 👍 6 🔁 1 💬 0 📌 0