
@joerocca

Low-alpha lurking/reposting account. Interested in OSS ML, web, XR, EA (esp WAS/WAW), alt proteins, housing, aging, and stuff like that

244
Followers
3,027
Following
49
Posts
12.09.2023
Joined

Latest posts by @joerocca

Visualization of domains for which TD-MPC2 has been applied, including locomotion, manipulation, dexterous hands, humanoids, autonomous racing.


I finally joined πŸ¦‹! Some of you may recognize me from other sites. Here's a quick intro for new connections:

πŸ‘‹ I work on RL, world models, and generalization in decision-making. I'm perhaps most well known for my work on "TD-MPC2: Scalable, Robust World Models for Continuous Control" www.tdmpc2.com

21.02.2025 21:11 πŸ‘ 38 πŸ” 4 πŸ’¬ 4 πŸ“Œ 0
Post image

Small models? Saturating? Where I live we don't know these words.

22.04.2025 18:09 πŸ‘ 21 πŸ” 1 πŸ’¬ 3 πŸ“Œ 0
Post image

New open-source reasoning model (code, dataset, and model)!

Huginn-0125: Pretraining a Depth-Recurrent Model

Trained a recurrent-depth model at scale on 4096 AMD GPUs on Frontier.

10.02.2025 18:35 πŸ‘ 18 πŸ” 4 πŸ’¬ 1 πŸ“Œ 0
Post image

Zyphra beta releases Zonos, a highly expressive TTS model with high fidelity voice cloning.

They release both transformer and SSM-hybrid models under an Apache 2.0 license.

10.02.2025 18:44 πŸ‘ 21 πŸ” 5 πŸ’¬ 2 πŸ“Œ 0
Video thumbnail

Physical Intelligence (Ο€) Open Sourcing Ο€0

They are releasing the code and weights for π0 as part of their experimental openpi repository.

Blog: www.pi.website/blog/openpi
Repo: github.com/Physical-Int...

05.02.2025 07:22 πŸ‘ 23 πŸ” 5 πŸ’¬ 3 πŸ“Œ 0
Video thumbnail

⭐ The first foundational model available on @LeRobotHF ⭐

Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior.

It was trained by @physical_int and ported to pytorch by @m_olbap
πŸ‘‡πŸ§΅

04.02.2025 17:07 πŸ‘ 68 πŸ” 15 πŸ’¬ 5 πŸ“Œ 3
Post image

When it rains, it pours.

Baichuan releases Baichuan-Omni-1.5

Open-source Omni-modal Foundation Model Supporting Text, Image, Video, and Audio Inputs as Well as Text and Audio Outputs.

Both model ( huggingface.co/baichuan-inc... ) and base ( huggingface.co/baichuan-inc... ).

26.01.2025 21:14 πŸ‘ 26 πŸ” 4 πŸ’¬ 2 πŸ“Œ 0
Post image

Latest #AI benchmark results: DeepSeek-R1 (including its distilled variants) outperforms OpenAI's o1-mini and preview models. And the Llama 3 distilled version now holds the title of the highest-performing LLM I've tested locally to date. πŸš€

24.01.2025 12:22 πŸ‘ 4 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Preview
Tsconfig option to disallow features requiring transformations which are not supported by Node.js' --strip-types Β· Issue #59601 Β· microsoft/TypeScript πŸ” Search Terms --strip-types βœ… Viability Checklist This wouldn't be a breaking change in existing TypeScript/JavaScript code This wouldn't change the runtime behavior of existing JavaScript code Th...

TypeScript excitement πŸ˜‰

Thanks to @searyanc.dev for landing the new --erasableSyntaxOnly tsconfig flag. Heading for TS 5.8 Beta next week πŸŽ‰

πŸ”· Guides users away from TS-only runtime features such as enum & namespace

πŸ”· Pairs nicely with Node's recent TypeScript support

github.com/microsoft/Ty...
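Once on a TypeScript version that ships the flag, opting in is a single tsconfig option (a sketch based on the flag named above; check the TS 5.8 release notes for details):

```json
{
  "compilerOptions": {
    "erasableSyntaxOnly": true
  }
}
```

With this set, TS-only runtime constructs like enum and namespace become compile errors, so the remaining code is safe for type-stripping runtimes.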

24.01.2025 10:35 πŸ‘ 118 πŸ” 22 πŸ’¬ 6 πŸ“Œ 5
Post image

Hugging Face adds GRPO to TRL - the training algorithm behind DeepSeek R1

πŸ”‹Eliminates the value function from PPO to save boatloads of compute
πŸ’° Samples N completions per prompt to compute average rewards across a group

To use it, run:

pip install git+https://github.com/huggingface/trl.git
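The two bullets above can be sketched in a few lines: GRPO scores a group of N completions per prompt and uses each reward's deviation from the group mean as the advantage, so no learned value function is needed. A minimal illustrative sketch of the idea, not TRL's actual implementation:

```python
# Illustrative sketch of GRPO's group-relative advantage (not TRL's code).
# For each prompt we score a group of N sampled completions and normalize
# each reward against its own group, removing the need for a value function.

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scores for the N completions of ONE prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions of the same prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Completions above the group mean get a positive advantage, below get negative.
```

The group normalization is what replaces PPO's critic: the "baseline" is simply the average reward of sibling completions.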

23.01.2025 03:20 πŸ‘ 8 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Post image

Prime Intellect releases:

- INTELLECT-MATH, a frontier 7B parameter model for math reasoning that shows that the quality of your SFT initialization strongly impacts reinforcement learning.

Blog: www.primeintellect.ai/blog/intelle... Models: huggingface.co/PrimeIntelle...

22.01.2025 03:20 πŸ‘ 9 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December.

Today we’re sharing an experimental update w/improved performance on math, science, and multimodal reasoning benchmarks πŸ“ˆ:
β€’ AIME: 73.3%
β€’ GPQA: 74.2%
β€’ MMMU: 75.4%

22.01.2025 00:31 πŸ‘ 158 πŸ” 30 πŸ’¬ 8 πŸ“Œ 6
Post image

SambaNova's EvaByte

An open-weight, tokenizer-free language model. Their 6.5B byte-level LM, EvaByte, matches modern tokenizer-based LMs with 5x less data & 2x faster decoding!
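"Tokenizer-free" here means the model consumes raw UTF-8 bytes rather than BPE tokens; a generic illustration of the difference (not EvaByte's code):

```python
# A byte-level LM's "vocabulary" is just the 256 possible byte values,
# so any string (emoji, rare words, typos) encodes with no OOV tokens.
def byte_ids(text: str) -> list[int]:
    return list(text.encode("utf-8"))

ids = byte_ids("héllo")  # 'é' encodes to two bytes in UTF-8
# len(ids) == 6: five characters, one of which takes two bytes
```

The trade-off is longer sequences per character, which is why faster decoding at this scale is notable.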

22.01.2025 02:45 πŸ‘ 14 πŸ” 2 πŸ’¬ 2 πŸ“Œ 0
Post image

ByteDance's UI-TARS, which can operate on your local personal device.

Project: github.com/bytedance/UI...
Desktop: github.com/bytedance/UI...
Browser: github.com/web-infra-de...
Models: huggingface.co/bytedance-re...
Paper: arxiv.org/abs/2501.12326

22.01.2025 06:55 πŸ‘ 32 πŸ” 3 πŸ’¬ 1 πŸ“Œ 3
Video thumbnail

Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by πŸ€— Transformers.js. WebGPU support coming soon!

πŸ‘‰ npm i kokoro-js πŸ‘ˆ

Link to demo (+ sample code) in 🧡

16.01.2025 15:05 πŸ‘ 19 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0
Post image

DeepSeek-R1 is coming soon.

DeepSeek-R1 (Preview) results: the model performs in the vicinity of o1-Medium, providing SOTA reasoning performance on LiveCodeBench.

17.01.2025 19:31 πŸ‘ 21 πŸ” 1 πŸ’¬ 0 πŸ“Œ 1

A new step in our journey towards easy-to-use, fully-open models.

16.01.2025 10:44 πŸ‘ 15 πŸ” 7 πŸ’¬ 0 πŸ“Œ 0

πŸ“’ Paper + code release πŸ“ƒπŸ’»

After 2 years of work, I'm excited to announce our newest paper, MatterGen, has been published in Nature!
www.nature.com/articles/s41...

We are also releasing all the training data, model weights, model code, and evaluation code on GitHub!
github.com/microsoft/ma...

16.01.2025 10:15 πŸ‘ 79 πŸ” 21 πŸ’¬ 2 πŸ“Œ 1
Video thumbnail

TinyBVH has been updated to 1.2.5 on main. New: TLAS/BLAS construction and traversal for single- and double-precision BVHs, plus a brand new GPU demo. See the attached real-time footage, captured at 1280x720 on an NVIDIA 2070 laptop GPU.
#RTXoff
github.com/jbikker/tiny...

16.01.2025 13:26 πŸ‘ 49 πŸ” 8 πŸ’¬ 3 πŸ“Œ 0
Post image

InternLM v3

- Performance surpasses models like Llama3.1-8B and Qwen2.5-7B
- Capable of deep reasoning with system prompts
- Trained only on 4T high-quality tokens

huggingface.co/collections/...

15.01.2025 08:24 πŸ‘ 18 πŸ” 7 πŸ’¬ 2 πŸ“Œ 0
Post image

Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors, @alibehrouz.bsky.social

13.01.2025 19:53 πŸ‘ 70 πŸ” 18 πŸ’¬ 4 πŸ“Œ 5
Video thumbnail

ViTPose -- the best open-source pose estimation model just landed in @hf.co transformers 🕺🏻💃🏻

πŸ”– Model collection: huggingface.co/collections/...

πŸ”– Notebook on how to use: colab.research.google.com/drive/1e8fcb...

πŸ”– Try it here: huggingface.co/spaces/hysts...

09.01.2025 14:27 πŸ‘ 67 πŸ” 8 πŸ’¬ 1 πŸ“Œ 0
Preview
Goodbye WinterCG, welcome WinterTC: WinterCG, the Web Interoperable Runtimes Community Group, is moving to Ecma as TC55 to be able to publish standards.

Deno is committed to web standards - that's why we co-founded WinterCG two years ago. Today marks the next step in that journey: WinterCG moves to Ecma International as technical committee 55 (TC55).

Goodbye WinterCG, welcome WinterTC!

deno.com/blog/wintertc

10.01.2025 14:06 πŸ‘ 160 πŸ” 41 πŸ’¬ 1 πŸ“Œ 4
Screenshot of the dataset on the Hugging Face Hub


πŸ” Massive human feedback dataset for text-to-image models from RapidData
- 1.5M human responses from 152K participants
- Evaluates image coherence, style & prompt alignment
- Includes detailed error heatmaps
- Covers DALL-E, Midjourney, Imagen outputs
Available on @hf.co

09.01.2025 14:00 πŸ‘ 12 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0
Post image

ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license πŸ’—

The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) both for images and videos ⏯️

09.01.2025 12:00 πŸ‘ 59 πŸ” 8 πŸ’¬ 3 πŸ“Œ 2
Post image

microsoft/phi-4

phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets.

huggingface.co/microsoft/ph...

08.01.2025 16:33 πŸ‘ 28 πŸ” 6 πŸ’¬ 2 πŸ“Œ 2
Post image

Thrilled to share the latest work from our team at @Apple where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥

πŸ“„ arxiv.org/abs/2410.23054
πŸ› οΈ github.com/apple/ml-act

0/9 🧡

10.12.2024 13:09 πŸ‘ 47 πŸ” 15 πŸ’¬ 3 πŸ“Œ 5