Visualization of the domains to which TD-MPC2 has been applied, including locomotion, manipulation, dexterous hands, humanoids, and autonomous racing.
I finally joined Bluesky 🦋! Some of you may recognize me from other sites. Here's a quick intro for new connections:
I work on RL, world models, and generalization in decision-making. I'm probably best known for my work on "TD-MPC2: Scalable, Robust World Models for Continuous Control" www.tdmpc2.com
21.02.2025 21:11
Small models? Saturating? Where I live we don't know these words.
22.04.2025 18:09
New open-source reasoning model (code, dataset, and model)!
Huginn-0125: Pretraining a Depth-Recurrent Model
A recurrent-depth model trained at scale on 4096 AMD GPUs on Frontier.
10.02.2025 18:35
Zyphra beta-releases Zonos, a highly expressive TTS model with high-fidelity voice cloning.
They release both transformer and SSM-hybrid models under an Apache 2.0 license.
10.02.2025 18:44
Physical Intelligence (π) open-sources π0
They are releasing the code and weights for π0 as part of their experimental openpi repository.
Blog: www.pi.website/blog/openpi
Repo: github.com/Physical-Int...
05.02.2025 07:22
The first foundational model available on @LeRobotHF!
Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior.
It was trained by @physical_int and ported to PyTorch by @m_olbap
🧵
04.02.2025 17:07
When it rains, it pours.
Baichuan releases Baichuan-Omni-1.5, an open-source omni-modal foundation model supporting text, image, video, and audio inputs as well as text and audio outputs.
Both the model ( huggingface.co/baichuan-inc... ) and the base ( huggingface.co/baichuan-inc... ) are available.
26.01.2025 21:14
Latest #AI benchmark results: DeepSeek-R1 (including its distilled variants) outperforms OpenAI's o1-mini and o1-preview models. And the Llama 3 distilled version is now the highest-performing LLM I've tested locally to date.
24.01.2025 12:22
Hugging Face adds GRPO to TRL - the training algorithm behind DeepSeek-R1
- Eliminates the value function from PPO to save boatloads of compute
- Samples N completions per prompt and scores each against the group's average reward
To use it, run:
pip install git+https://github.com/huggingface/trl.git
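The group-relative advantage that replaces PPO's learned value function can be sketched in a few lines (a simplified illustration of the idea, not TRL's internal implementation; the function name is mine):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each completion's reward by the
    mean and std of its group (N completions sampled for one prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# N = 4 completions for one prompt, each scored by a reward function
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline is just the group mean, no value network ever has to be trained or stored, which is where the compute savings come from.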
23.01.2025 03:20
Prime Intellect releases:
- INTELLECT-MATH, a frontier 7B-parameter model for math reasoning that shows that the quality of your SFT initialization strongly impacts reinforcement learning.
Blog: www.primeintellect.ai/blog/intelle... Models: huggingface.co/PrimeIntelle...
22.01.2025 03:20
We've been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December.
Today we're sharing an experimental update with improved performance on math, science, and multimodal reasoning benchmarks:
β’ AIME: 73.3%
β’ GPQA: 74.2%
β’ MMMU: 75.4%
22.01.2025 00:31
SambaNova's EvaByte: an open-weight, tokenizer-free language model. Their 6.5B byte-level LM, EvaByte, matches modern tokenizer-based LMs with 5x less data and 2x faster decoding!
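"Tokenizer-free" here means the model consumes raw UTF-8 bytes, so the vocabulary is just the 256 possible byte values. A generic sketch of that input representation (not EvaByte's actual pipeline):

```python
def bytes_to_ids(text: str) -> list[int]:
    """Byte-level 'tokenization': each UTF-8 byte is its own token ID (0-255)."""
    return list(text.encode("utf-8"))

def ids_to_text(ids: list[int]) -> str:
    """Inverse mapping: reassemble bytes and decode back to text."""
    return bytes(ids).decode("utf-8")

ids = bytes_to_ids("héllo")  # non-ASCII characters expand to multiple IDs
```

The trade-off is longer sequences per character, which is why byte-level models need tricks to keep decoding fast.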
22.01.2025 02:45
ByteDance's UI-TARS, an agent model that can operate your local personal device.
Project: github.com/bytedance/UI...
Desktop: github.com/bytedance/UI...
Browser: github.com/web-infra-de...
Models: huggingface.co/bytedance-re...
Paper: arxiv.org/abs/2501.12326
22.01.2025 06:55
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82-million-parameter text-to-speech model, 100% locally in the browser with WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
npm i kokoro-js
Link to demo (+ sample code) in 🧵
16.01.2025 15:05
DeepSeek-R1 is coming soon.
DeepSeek-R1 (Preview) results: the model performs in the vicinity of o1-medium, providing SOTA reasoning performance on LiveCodeBench.
17.01.2025 19:31
A new step on our journey towards easy-to-use, fully open models.
16.01.2025 10:44
Paper + code release 👇
After 2 years of work, I'm excited to announce our newest paper, MatterGen, has been published in Nature!
www.nature.com/articles/s41...
We are also releasing all the training data, model weights, model code, and evaluation code on GitHub!
github.com/microsoft/ma...
16.01.2025 10:15
TinyBVH has been updated to 1.2.5 on main. New:
TLAS/BLAS construction and traversal for single- and double-precision BVHs, including a brand-new GPU demo: see the attached real-time footage, captured at 1280x720 on an NVIDIA 2070 laptop GPU.
#RTXoff
github.com/jbikker/tiny...
16.01.2025 13:26
InternLM v3
- Performance surpasses models like Llama3.1-8B and Qwen2.5-7B
- Capable of deep reasoning with system prompts
- Trained only on 4T high-quality tokens
huggingface.co/collections/...
15.01.2025 08:24
Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors, @alibehrouz.bsky.social
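"Learns how to memorize at test time" means the memory's parameters are updated by gradient descent on an associative recall loss during inference. A heavily simplified sketch with a linear memory (my toy version; the paper's memory is a deeper module with momentum and forgetting terms):

```python
import numpy as np

def memory_update(M, k, v, lr=0.1):
    """One test-time step: descend the 'surprise' loss ||M k - v||^2.
    The gradient with respect to M is 2 (M k - v) k^T."""
    err = M @ k - v
    return M - lr * 2 * np.outer(err, k)

d = 4
rng = np.random.default_rng(0)
M = np.zeros((d, d))            # memory starts empty
k = rng.normal(size=d)          # key: the cue to remember by
v = rng.normal(size=d)          # value: what should be recalled
for _ in range(200):            # repeated test-time updates
    M = memory_update(M, k, v)
# after enough steps, querying the memory with k recalls v
```

The point of the sketch is only the mechanism: nothing here is trained in advance; the memory adapts while the sequence is being processed.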
13.01.2025 19:53
ViTPose -- the best open-source pose estimation model just landed in @hf.co transformers
Model collection: huggingface.co/collections/...
Notebook on how to use: colab.research.google.com/drive/1e8fcb...
Try it here: huggingface.co/spaces/hysts...
09.01.2025 14:27
Goodbye WinterCG, welcome WinterTC
WinterCG, the Web Interoperable Runtimes Community Group is moving to ECMA as TC55 to be able to publish standards.
Deno is committed to web standards - that's why we co-founded WinterCG two years ago. Today marks the next step in that journey: WinterCG moves to Ecma International as technical committee 55 (TC55).
deno.com/blog/wintertc
10.01.2025 14:06
Screenshot of the dataset on the Hugging Face Hub
Massive human-feedback dataset for text-to-image models from RapidData
- 1.5M human responses from 152K participants
- Evaluates image coherence, style & prompt alignment
- Includes detailed error heatmaps
- Covers DALL-E, Midjourney, Imagen outputs
Available on @hf.co
09.01.2025 14:00
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 under an MIT license
The models are capable of tasks involving vision-language understanding and visual referring (referring segmentation), both for images and videos
09.01.2025 12:00
microsoft/phi-4
phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets.
huggingface.co/microsoft/ph...
08.01.2025 16:33
Thrilled to share the latest work from our team at @Apple, where we achieve interpretable and fine-grained control of LLMs and diffusion models via Activation Transport
Paper: arxiv.org/abs/2410.23054
Code: github.com/apple/ml-act
0/9 🧵
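The transport idea is easiest to see in one dimension, where the optimal transport map between two distributions is just quantile matching: an activation at quantile q of the source distribution is moved to quantile q of the target. A tiny self-contained illustration (my toy example with made-up data, not the ml-act implementation):

```python
import numpy as np

def transport_1d(x, source_samples, target_samples):
    """Monotone 1-D OT map: send x at quantile q of the source
    distribution to the same quantile of the target distribution."""
    q = np.searchsorted(np.sort(source_samples), x) / len(source_samples)
    return np.quantile(target_samples, min(max(q, 0.0), 1.0))

rng = np.random.default_rng(1)
src = rng.normal(0.0, 1.0, 1000)   # stand-in "neutral" activations
tgt = src * 2.0 + 3.0              # stand-in "steered" activations
moved = transport_1d(0.0, src, tgt)  # the source median lands near the target median
```

Applying such a map coordinate-wise to hidden activations shifts them toward the target distribution while preserving their relative ordering, which is the intuition behind transport-based steering.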
10.12.2024 13:09