
@joerocca

Low-alpha lurking/reposting account. Interested in OSS ML, web, XR, EA (esp WAS/WAW), alt proteins, housing, aging, and stuff like that

244
Followers
3,027
Following
49
Posts
12.09.2023
Joined

Latest posts by @joerocca

Visualization of domains for which TD-MPC2 has been applied, including locomotion, manipulation, dexterous hands, humanoids, autonomous racing.


I finally joined πŸ¦‹! Some of you may recognize me from other sites. Here's a quick intro for new connections:

πŸ‘‹ I work on RL, world models, and generalization in decision-making. I'm perhaps most well known for my work on "TD-MPC2: Scalable, Robust World Models for Continuous Control" www.tdmpc2.com

21.02.2025 21:11 πŸ‘ 38 πŸ” 4 πŸ’¬ 4 πŸ“Œ 0
Post image

Small models? Saturating? Where I live we don't know these words.

22.04.2025 18:09 πŸ‘ 21 πŸ” 1 πŸ’¬ 3 πŸ“Œ 0
Post image

New open-source reasoning model (code, dataset, and model)!

Huginn-0125: Pretraining a Depth-Recurrent Model

Trained a recurrent-depth model at scale on 4096 AMD GPUs on Frontier.

10.02.2025 18:35 πŸ‘ 18 πŸ” 4 πŸ’¬ 1 πŸ“Œ 0
Post image

Zyphra beta releases Zonos, a highly expressive TTS model with high fidelity voice cloning.

They release both transformer and SSM-hybrid models under an Apache 2.0 license.

10.02.2025 18:44 πŸ‘ 21 πŸ” 5 πŸ’¬ 2 πŸ“Œ 0
Video thumbnail

Physical Intelligence (Ο€) Open Sourcing Ο€0

They are releasing the code and weights for π0 as part of their experimental openpi repository.

Blog: www.pi.website/blog/openpi
Repo: github.com/Physical-Int...

05.02.2025 07:22 πŸ‘ 23 πŸ” 5 πŸ’¬ 3 πŸ“Œ 0
Video thumbnail

⭐ The first foundational model available on @LeRobotHF ⭐

Pi0 is the most advanced Vision Language Action model. It takes natural language commands as input and directly outputs autonomous behavior.

It was trained by @physical_int and ported to pytorch by @m_olbap
πŸ‘‡πŸ§΅

04.02.2025 17:07 πŸ‘ 68 πŸ” 15 πŸ’¬ 5 πŸ“Œ 3
Post image

When it rains, it pours.

Baichuan releases Baichuan-Omni-1.5

Open-source Omni-modal Foundation Model Supporting Text, Image, Video, and Audio Inputs as Well as Text and Audio Outputs.

Both model ( huggingface.co/baichuan-inc... ) and base ( huggingface.co/baichuan-inc... ).

26.01.2025 21:14 πŸ‘ 26 πŸ” 4 πŸ’¬ 2 πŸ“Œ 0
Post image

Latest #AI benchmark results: DeepSeek-R1 (including its distilled variants) outperforms OpenAI's o1-mini and preview models. And the Llama 3 distilled version now holds the title of the highest-performing LLM I've tested locally to date. πŸš€

24.01.2025 12:22 πŸ‘ 4 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Preview
Tsconfig option to disallow features requiring transformations which are not supported by Node.js' --strip-types Β· Issue #59601 Β· microsoft/TypeScript πŸ” Search Terms --strip-types βœ… Viability Checklist This wouldn't be a breaking change in existing TypeScript/JavaScript code This wouldn't change the runtime behavior of existing JavaScript code Th...

TypeScript excitement πŸ˜‰

Thanks to @searyanc.dev for landing the new --erasableSyntaxOnly tsconfig flag. Heading for TS 5.8 Beta next week πŸŽ‰

πŸ”· Guides users away from TS-only runtime features such as enum & namespace

πŸ”· Pairs nicely with Node's recent TypeScript support

github.com/microsoft/Ty...
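Once on a TypeScript version that ships the flag, opting in is a single tsconfig option (a sketch based on the flag named above; check the TS 5.8 release notes for details):

```json
{
  "compilerOptions": {
    "erasableSyntaxOnly": true
  }
}
```

With this set, TS-only runtime constructs like enum and namespace become compile errors, so the remaining code is safe for type-stripping runtimes.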

24.01.2025 10:35 πŸ‘ 118 πŸ” 22 πŸ’¬ 6 πŸ“Œ 5
Post image

Hugging Face adds GRPO to TRL - the training algorithm behind DeepSeek R1

πŸ”‹Eliminates the value function from PPO to save boatloads of compute
πŸ’° Samples N completions per prompt to compute average rewards across a group

To use it, run:

pip install git+https://github.com/huggingface/trl.git
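The two bullets above can be sketched in a few lines: GRPO scores a group of N completions per prompt and uses each reward's deviation from the group mean as the advantage, so no learned value function is needed. A minimal illustrative sketch of the idea, not TRL's actual implementation:

```python
# Illustrative sketch of GRPO's group-relative advantage (not TRL's code).
# For each prompt we score a group of N sampled completions and normalize
# each reward against its own group, removing the need for a value function.

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scores for the N completions of ONE prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 completions of the same prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Completions above the group mean get a positive advantage, below get negative.
```

The group normalization is what replaces PPO's critic: the "baseline" is simply the average reward of sibling completions.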

23.01.2025 03:20 πŸ‘ 8 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Post image

Prime Intellect releases:

- INTELLECT-MATH, a frontier 7B parameter model for math reasoning that shows that the quality of your SFT initialization strongly impacts reinforcement learning.

Blog: www.primeintellect.ai/blog/intelle... Models: huggingface.co/PrimeIntelle...

22.01.2025 03:20 πŸ‘ 9 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December.

Today we’re sharing an experimental update w/improved performance on math, science, and multimodal reasoning benchmarks πŸ“ˆ:
β€’ AIME: 73.3%
β€’ GPQA: 74.2%
β€’ MMMU: 75.4%

22.01.2025 00:31 πŸ‘ 158 πŸ” 30 πŸ’¬ 8 πŸ“Œ 6
Post image

SambaNova's EvaByte

An open-weight, tokenizer-free language model. Their 6.5B byte-level LM, EvaByte, matches modern tokenizer-based LMs with 5x less data & 2x faster decoding!
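"Tokenizer-free" here means the model consumes raw UTF-8 bytes rather than BPE tokens; a generic illustration of the difference (not EvaByte's code):

```python
# A byte-level LM's "vocabulary" is just the 256 possible byte values,
# so any string (emoji, rare words, typos) encodes with no OOV tokens.
def byte_ids(text: str) -> list[int]:
    return list(text.encode("utf-8"))

ids = byte_ids("héllo")  # 'é' encodes to two bytes in UTF-8
# len(ids) == 6: five characters, one of which takes two bytes
```

The trade-off is longer sequences per character, which is why faster decoding at this scale is notable.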

22.01.2025 02:45 πŸ‘ 14 πŸ” 2 πŸ’¬ 2 πŸ“Œ 0
Post image

ByteDance's UI-TARS, which can operate on your local personal device.

Project: github.com/bytedance/UI...
Desktop: github.com/bytedance/UI...
Browser: github.com/web-infra-de...
Models: huggingface.co/bytedance-re...
Paper: arxiv.org/abs/2501.12326

22.01.2025 06:55 πŸ‘ 32 πŸ” 3 πŸ’¬ 1 πŸ“Œ 3
Video thumbnail

Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by πŸ€— Transformers.js. WebGPU support coming soon!

πŸ‘‰ npm i kokoro-js πŸ‘ˆ

Link to demo (+ sample code) in 🧡

16.01.2025 15:05 πŸ‘ 19 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0
Post image

DeepSeek-R1 is coming soon.

DeepSeek-R1 (Preview) results: the model performs in the vicinity of o1-Medium, providing SOTA reasoning performance on LiveCodeBench.

17.01.2025 19:31 πŸ‘ 21 πŸ” 1 πŸ’¬ 0 πŸ“Œ 1

A new step in our journey towards easy-to-use, fully-open models.

16.01.2025 10:44 πŸ‘ 15 πŸ” 7 πŸ’¬ 0 πŸ“Œ 0

πŸ“’ Paper + code release πŸ“ƒπŸ’»

After 2 years of work, I'm excited to announce our newest paper, MatterGen, has been published in Nature!
www.nature.com/articles/s41...

We are also releasing all the training data, model weights, model code, and evaluation code on GitHub!
github.com/microsoft/ma...

16.01.2025 10:15 πŸ‘ 79 πŸ” 21 πŸ’¬ 2 πŸ“Œ 1
Video thumbnail

TinyBVH has been updated to 1.2.5 on main. New: TLAS/BLAS construction and traversal for single- and double-precision BVHs, plus a brand new GPU demo. See the attached real-time footage, captured at 1280x720 on an NVIDIA 2070 laptop GPU.
#RTXoff
github.com/jbikker/tiny...

16.01.2025 13:26 πŸ‘ 49 πŸ” 8 πŸ’¬ 3 πŸ“Œ 0
Post image

InternLM v3

- Performance surpasses models like Llama3.1-8B and Qwen2.5-7B
- Capable of deep reasoning with system prompts
- Trained only on 4T high-quality tokens

huggingface.co/collections/...

15.01.2025 08:24 πŸ‘ 18 πŸ” 7 πŸ’¬ 2 πŸ“Œ 0
Post image

Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors, @alibehrouz.bsky.social

13.01.2025 19:53 πŸ‘ 70 πŸ” 18 πŸ’¬ 4 πŸ“Œ 5
Video thumbnail

ViTPose -- the best open-source pose estimation model just landed in @hf.co transformers 🕺🏻💃🏻

πŸ”– Model collection: huggingface.co/collections/...

πŸ”– Notebook on how to use: colab.research.google.com/drive/1e8fcb...

πŸ”– Try it here: huggingface.co/spaces/hysts...

09.01.2025 14:27 πŸ‘ 67 πŸ” 8 πŸ’¬ 1 πŸ“Œ 0
Preview
Goodbye WinterCG, welcome WinterTC: WinterCG, the Web Interoperable Runtimes Community Group, is moving to Ecma as TC55 to be able to publish standards.

Deno is committed to web standards - that's why we co-founded WinterCG two years ago. Today marks the next step in that journey: WinterCG moves to Ecma International as technical committee 55 (TC55).

Goodbye WinterCG, welcome WinterTC!

deno.com/blog/wintertc

10.01.2025 14:06 πŸ‘ 160 πŸ” 41 πŸ’¬ 1 πŸ“Œ 4
Screenshot of the dataset on the Hugging Face Hub


πŸ” Massive human feedback dataset for text-to-image models from RapidData
- 1.5M human responses from 152K participants
- Evaluates image coherence, style & prompt alignment
- Includes detailed error heatmaps
- Covers DALL-E, Midjourney, Imagen outputs
Available on @hf.co

09.01.2025 14:00 πŸ‘ 12 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0
Post image

ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license πŸ’—

The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) both for images and videos ⏯️

09.01.2025 12:00 πŸ‘ 59 πŸ” 8 πŸ’¬ 3 πŸ“Œ 2
Post image

microsoft/phi-4

phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets.

huggingface.co/microsoft/ph...

08.01.2025 16:33 πŸ‘ 28 πŸ” 6 πŸ’¬ 2 πŸ“Œ 2
Post image

Thrilled to share the latest work from our team at @Apple where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥

πŸ“„ arxiv.org/abs/2410.23054
πŸ› οΈ github.com/apple/ml-act

0/9 🧡

10.12.2024 13:09 πŸ‘ 47 πŸ” 15 πŸ’¬ 3 πŸ“Œ 5