Definitely worth checking out, the authors have done very well here! I'm especially interested in seeing more 'context' models, they're very novel.
The models are all MIT Licensed, i.e. commercially viable, and supported with Sentence Transformers, Text Embedding Inference, Transformers.js, etc.
🧵
The models have been evaluated on various benchmarks like MMTEB, MTEB(Code), MIRACL, BERGEN, ToolRet and ConTEB (for the context model), where they perform very well for their sizes.
🧵
They then turned this strategy into 4 models:
- 2 sizes: 0.6B and 4B parameters
- 2 types:
pplx-embed-v1 for dense embeddings,
pplx-embed-context-v1 for contextual dense embeddings that are computed over entire documents at once: each chunk embedding contains global document information!
🧵
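The general idea behind contextual chunk embeddings can be sketched in a few lines. This is a hypothetical stand-in (random vectors instead of a real encoder, simple mean pooling), not Perplexity's actual implementation: the whole document is encoded in one pass, then each chunk embedding is pooled from its own token span, so every chunk vector is informed by full-document context.

```python
import numpy as np

rng = np.random.default_rng(0)

num_tokens, dim = 12, 8
# Stand-in for the encoder output over the *entire* document.
# With a bidirectional encoder, every token vector already "sees"
# the whole document before pooling.
token_vectors = rng.normal(size=(num_tokens, dim))

# Chunk boundaries as (start, end) token offsets.
chunks = [(0, 4), (4, 9), (9, 12)]

# Pool each chunk from its own token span.
chunk_embeddings = np.stack(
    [token_vectors[start:end].mean(axis=0) for start, end in chunks]
)
print(chunk_embeddings.shape)  # (3, 8)
```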
They first performed diffusion-style pretraining on Qwen3 to turn it into a bidirectional model. This allows every token to attend to every other token, even 'future' tokens further in the same text. Causal models (like most decoders) can only look at previous tokens.
🧵
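The difference between the two attention patterns is easy to picture as masks. A minimal illustration (conceptual only; turning a causal model bidirectional requires the retraining described above, not just swapping the mask):

```python
import numpy as np

seq_len = 4

# Causal mask: token i may only attend to tokens j <= i
# (lower-triangular), so 'future' tokens are hidden.
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Bidirectional mask: every token attends to every other token,
# including tokens later in the text.
bidirectional = np.ones((seq_len, seq_len), dtype=bool)

print(causal.astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```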
The models & paper: huggingface.co/collections/...
🧵
🤗 Perplexity has released 4 open-weight state-of-the-art multilingual embedding models designed for retrieval tasks!
pplx-embed-v1 and pplx-embed-context-v1
Specifically trained for int8 and binary embeddings, they'll be viable for massive search problems.
Details in 🧵
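For intuition, here's a simplified numpy sketch of what int8 and binary quantization of embeddings look like. This is illustrative only: real int8 calibration typically uses per-dimension ranges learned from a calibration set (Sentence Transformers ships this as `quantize_embeddings`), and this naive min/max version is just the shape of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 8)).astype(np.float32)

# Binary: keep only the sign of each dimension, packed 8 dims per byte.
# 8 float32 dims (32 bytes) become 1 byte: a 32x size reduction.
binary = np.packbits((emb > 0).astype(np.uint8), axis=1)

# int8: linearly rescale values into [-127, 127] (naive global min/max).
lo, hi = emb.min(), emb.max()
int8 = np.round((emb - lo) / (hi - lo) * 254 - 127).astype(np.int8)

print(binary.shape, int8.dtype)  # (2, 1) int8
```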
I've collaborated quite closely with the PyLate authors over the last months, as PyLate relies heavily on Sentence Transformers. This is very strong work, definitely worth checking out!
Kudos to @nohtow.bsky.social, Luca Arnaboldi, @amelietabatta.bsky.social and @krzakalaf.bsky.social.
All models, including intermediate checkpoints for every training phase and configuration, are released under Apache 2.0. The flagship, lightonai/ColBERT-Zero, is the new state-of-the-art late interaction model.
🧵
Luckily, skipping the expensive unsupervised phase and simply adding a supervised contrastive step before distillation reaches 55.12 nDCG@10, which is 99.4% of ColBERT-Zero's performance at roughly 10x lower compute cost (40 vs 408 GH200-hours).
🧵
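A supervised contrastive step of this kind is usually some variant of InfoNCE with in-batch negatives. A minimal numpy sketch of that standard objective (the general shape of such a step, not LightOn's exact loss or hyperparameters):

```python
import numpy as np

def info_nce(queries, docs, temperature=0.05):
    """InfoNCE with in-batch negatives: row i of `queries` is a
    positive pair with row i of `docs`; every other doc in the
    batch acts as a negative."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temperature  # (batch, batch) cosine similarities
    # Log-softmax over docs; positives sit on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
q, d = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = info_nce(q, d)
```

Identical query/doc pairs drive the loss toward zero, while random pairs leave it high, which is exactly the gradient signal pulling positives together and pushing in-batch negatives apart.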
By running all contrastive pre-training phases directly in the multi-vector setting, via PyLate, LightOn could outperform the standard approach.
🧵
The key insight behind ColBERT-Zero is that the standard recipe for training ColBERT models, taking a strong dense model and bolting on a small knowledge distillation step, leaves a lot of performance on the table.
🧵
Check out the models and paper here: huggingface.co/collections/...
🧵
Give the detailed blogpost a read: huggingface.co/blog/lighton...
🧵
LightOn is back with a SOTA late-interaction model for search: ColBERT-Zero!
By performing contrastive pre-training directly in the multi-vector setting, it outperforms GTE-ModernColBERT and other late-interaction models on BEIR, using only public data and reaching 55.43 nDCG@10.
Details in 🧵
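For anyone new to late interaction: ColBERT-style models keep one vector per token and score with MaxSim — each query token is matched against its best document token, and the per-token maxima are summed. A minimal numpy sketch of that scoring function:

```python
import numpy as np

def maxsim(query_vecs, doc_vecs):
    """MaxSim late-interaction score.
    query_vecs: (num_query_tokens, dim), doc_vecs: (num_doc_tokens, dim)."""
    sims = query_vecs @ doc_vecs.T  # all query-token x doc-token similarities
    # For each query token, take its best-matching doc token, then sum.
    return sims.max(axis=1).sum()

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))   # 3 query token vectors
d = rng.normal(size=(10, 8))  # 10 document token vectors
score = maxsim(q, d)
```

This token-level matching is what the multi-vector contrastive pre-training optimizes directly, instead of first training a single-vector dense model.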
ggml / llama.cpp are joining @hf.co, ensuring they'll stay open, maintained, and up to date for a long, long time!
huggingface.co/blog/ggml-jo...
Great work by the Jina team. The paper is also extremely interesting, using a lot of different losses and providing valuable ablations. If you're into training embedding models, definitely give it a read.
huggingface.co/papers/2602....
The only downside is that the models are licensed under cc-by-nc-4.0, so you'll have to contact Jina if you'd like to use them commercially.
🧵
The models each run with Sentence Transformers, Transformers, Jina's API, Text Embedding Inference, vLLM, Llama.cpp, and MLX. Super useful!
🧵
Beyond the two models with multiple task adapters, you can also directly load the model with one of the adapters applied, e.g. 'jinaai/jina-embeddings-v5-text-small-retrieval'.
This is especially nice if you want to avoid 'trust_remote_code'.
🧵
The models are also competitive on English only, performing very well for their sizes. You love to see it.
🧵
Multilingual Retrieval performance:
jina-v5-text-small outperforms Qwen3-Embedding-0.6B for effectively the same model size, and reaches much higher scores than any other model at <1B parameters.
jina-v5-text-nano also outperforms everything up to twice its parameter size.
🧵
Both models were trained and evaluated on numerous languages, and so they're strong new multilingual options.
They're also trained using a clever adapter-switching system. You can select either retrieval, text-matching, classification, or clustering, depending on your task.
🧵
jina-embeddings-v5-text-nano:
- 239M parameters, 8k sequence length, 768 dimensionality
- The embeddings can be truncated to 32, 64, 128, 256, 512, 768 via its Matryoshka support
- Base model is EuroBERT/EuroBERT-210m
🧵
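Matryoshka truncation itself is simple: keep only the first k dimensions and re-normalize. A quick numpy sketch of the idea (in practice Sentence Transformers exposes this via the `truncate_dim` argument, so you wouldn't do it by hand):

```python
import numpy as np

def truncate(emb, k):
    """Matryoshka-style truncation: slice to the first k dims,
    then re-normalize so cosine similarity still behaves."""
    t = emb[:, :k]
    return t / np.linalg.norm(t, axis=1, keepdims=True)

rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 768))   # full-size embeddings, e.g. the nano model
small = truncate(emb, 128)        # 6x smaller vectors
print(small.shape)  # (2, 128)
```

The Matryoshka training objective is what makes the leading dimensions carry most of the signal, so the truncated vectors stay useful for retrieval at a fraction of the storage cost.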
jina-embeddings-v5-text-small:
- 677M parameters, 32k sequence length, 1024 dimensionality
- The embeddings can be truncated to 32, 64, 128, 256, 512, 768, 1024 via its Matryoshka support
- Base model is Qwen/Qwen3-0.6B-Base
🧵
Check out the models here: huggingface.co/collections/...
🧵
Jina AI is back with new state-of-the-art multilingual embedding models for retrieval & more:
jina-embeddings-v5-text!
In 2 efficient sizes (239M & 677M), they outperform Qwen3-Embedding, EmbeddingGemma-300m, multilingual-e5-large, etc.
Details in 🧵
More embedding models and an even more reliable inference engine are what you get with @hf.co Text Embeddings Inference v1.9.0 🔥
More in the thread 🧵
More details in the release notes: github.com/huggingface/...
Transformers v5.2 updated some behind-the-scenes methods for its Trainer that Sentence Transformers relies on for logging metrics.
So, if you update to Transformers v5.2 with an older Sentence Transformers version, you'll encounter crashes when a metric is logged.
🧵