You can play directly with the model via this HF space:
huggingface.co/spaces/Qwen/...
Qwen released QvQ 72B, an OpenAI o1-style reasoning model with vision capabilities, on Hugging Face - beating GPT-4o and Claude 3.5 Sonnet 🔥
Chat with it live for free here:
huggingface.co/chat/models/...
3.1 70B vs 3.3 70B:
Code Generation
> HumanEval: 80.5% → 88.4% (+7.9%)
> MBPP EvalPlus: 86.0% → 87.6% (+1.6%)
Steerability
> IFEval: 87.5% → 92.1% (+4.6%)
Reasoning & Math
> GPQA Diamond (CoT): 48.0% → 50.5% (+2.5%)
> MATH (CoT): 68.0% → 77.0% (+9.0%)
Llama 3.3 70B vs 405B:
> GPQA Diamond (CoT): 50.5% vs 49.0%
> Math (CoT): 77.0% vs 73.8%
> Steerability (IFEval): 92.1% vs 88.6%
huggingface.co/meta-llama/L...
BOOOOM! Meta released Llama 3.3 70B - 128K context, multilingual, enhanced tool calling, outperforms Llama 3.1 70B and comparable to Llama 405B 🔥
Comparable performance to 405B with 6x FEWER parameters ⚡
And.. here's a space to try out the model too:
huggingface.co/spaces/ai4bh...
Check out the model checkpoints here:
huggingface.co/ai4bharat/in...
Introducing Indic-Parler TTS - trained on 10K hours of data, 938M params, supports 20 Indic languages, emotional synthesis, Apache 2.0 licensed! 🔥
w/ fully customisable speech and voice personas!
Try it out directly below or use the model weights as you want!
🇮🇳/acc
try it out today on hf.co/datasets - just click on `SQL Console` followed by `AI Query` 💯
you can just do things - ask AI to create your SQL queries and execute them right in your browser! 🔥
let your creativity guide you - powered by qwen 2.5 coder 32b ⚡
available on all 254,746 public datasets on the hub!
go check it out today! 🤗
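The SQL Console runs DuckDB right in your browser; purely to illustrate the flavour of query the AI assistant writes for you, here's a minimal stdlib sketch using `sqlite3` and a made-up toy table (the table and its columns are hypothetical, not a real hub dataset):

```python
import sqlite3

# Toy stand-in for a hub dataset - the rows and columns here are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (stars INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(5, "great"), (1, "bad"), (4, "good"), (5, "amazing")],
)

# The kind of aggregate query you might ask the AI to write for you:
query = "SELECT stars, COUNT(*) AS n FROM reviews GROUP BY stars ORDER BY stars"
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (4, 1), (5, 2)]
```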
This demo of structured data extraction running on an LLM that executes entirely in the browser (Chrome only for the moment since it uses WebGPU) is amazing
My notes here: simonwillison.net/2024/Nov/29/...
Here's the GitHub repo in case you fancy it:
github.com/Vaibhavs10/g...
To showcase how much you can do with just a 1.7B LLM: you pass free text, define a schema for parsing the text into a GitHub issue (title, description, categories, tags, etc.), and let MLC & XGrammar do the rest!
That's it, the code is super readable, try it out today! 🤗
huggingface.co/spaces/reach...
Fuck it! Structured Generation w/ SmolLM2 running in browser & WebGPU 🔥
Powered by MLC Web-LLM & XGrammar ⚡
Define a JSON schema, Input free text, get structured data right in your browser - profit!!
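Conceptually, the grammar-constrained decoding guarantees the model's output parses against your schema. Here's a stdlib-only sketch of the validation side - the GitHub-issue-style field names are illustrative, and the checker below is mine, not the XGrammar API:

```python
import json

# Hypothetical GitHub-issue schema, mirroring the demo's fields.
SCHEMA = {"title": str, "description": str, "tags": list}

def validate(raw: str) -> dict:
    """Parse model output and check it has the expected fields and types."""
    data = json.loads(raw)
    for field, typ in SCHEMA.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return data

out = validate('{"title": "Crash on load", "description": "App dies at startup", "tags": ["bug"]}')
print(out["title"])  # Crash on load
```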
FYI, here's the entire code to create a dataset of every single bsky message in real time:
```
from atproto import FirehoseSubscribeReposClient, parse_subscribe_repos_message

def on_message(message):
    # message.header carries the frame type; the parser decodes the commit body
    print(message.header, parse_subscribe_repos_message(message))

FirehoseSubscribeReposClient().start(on_message)
```
I have converted a portion of my NLP Online Masters course to blog form. This is the progression I present, taking one from recurrent neural networks to seq2seq with attention to Transformers. mark-riedl.medium.com/transformers...
I'm disheartened by how toxic and violent some responses were here.
There was a mistake, a quick follow-up to mitigate it, and an apology. I worked with Daniel for years; he is one of the people most concerned with the ethical implications of AI. Some replies are Reddit-level toxic. We need empathy.
> uses 90% sliding window and 10% global attention for efficiency
> 2-stage pre-training and 3-phase post-training, including a trapezoid learning rate schedule
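To make the sliding-window vs. global split concrete, here's a small stdlib sketch of the two mask patterns (the window size is an arbitrary choice for illustration, not the model's actual config):

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if token i may attend to token j (local window only)."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]

def global_mask(seq_len: int) -> list[list[bool]]:
    """Every token attends to every other token."""
    return [[True for _ in range(seq_len)] for _ in range(seq_len)]

local = sliding_window_mask(8, window=2)
full = global_mask(8)
# A middle token sees 5 neighbours locally (itself +/- 2) vs all 8 globally.
print(sum(local[4]), sum(full[4]))  # 5 8
```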
try it out on hugging face today! 🤗
huggingface.co/collections/...
yo! nvidia finally released the weights for Hymba-1.5B - outperforms Qwen and SmolLM2 w/ 6-12x less training
trained ONLY on 1.5T tokens
> massive reductions in KV cache size and improved throughput
> combines Mamba and Attention in a hybrid parallel architecture with a 5:1 ratio and meta-tokens
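Why the hybrid cuts KV cache: only the attention layers need a cache, and it grows linearly with sequence length, while the Mamba state is constant-size. A back-of-the-envelope sketch (all dimensions below are made-up example numbers, not Hymba's actual config):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Keys + values: 2 tensors per attention layer, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 32-layer full-attention model vs. a hybrid where most layers
# are SSM blocks (constant-size state, so effectively no KV cache).
full = kv_cache_bytes(layers=32, kv_heads=8, head_dim=64, seq_len=8192)
hybrid = kv_cache_bytes(layers=6, kv_heads=8, head_dim=64, seq_len=8192)
print(full // 2**20, hybrid // 2**20)  # cache size in MiB: 512 96
```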
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
SmolVLM can be fine-tuned on a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
You can run inference via llama.cpp too:
huggingface.co/OuteAI/OuteT...
Model weights on the hub, you can even run this on a Raspberry Pi! Go run, inference now! 🚀
huggingface.co/OuteAI/OuteT...
Smol TTS keeps getting better! Introducing OuteTTS v0.2 - 500M parameters, multilingual with voice cloning! 🔥
> Multilingual - English, Chinese, Korean & Japanese
> Cross platform inference w/ llama.cpp
> Trained on 5 Billion audio tokens
> Qwen 2.5 0.5B LLM backbone
> Trained via HF GPU grants
💯
It depends on what you define as long context; I'm fairly confident up to 64K and moderately so up to 128K - beyond that, I've personally never tested.
Most of my observations are based on chat use-cases.
Yeah! @loubnabnl.hf.co & @eliebak.bsky.social are 🐐