Recipe: huggingface.co/learn/cookbo...
Original blog by Edward Beeching, @lewtun.bsky.social and @srushnlp.bsky.social from @hf.co: huggingface.co/spaces/Huggi...
Thanks @stevhliu.hf.co and @lewtun.bsky.social for the feedback!
[Diagram: scaling test-time compute with open models]
Following Hugging Face's blog on scaling test-time compute with open models (letting models "think longer," inspired by OpenAI & DeepMind), I created a recipe to extend inference time for Instruct LLMs, tackling harder tasks like complex math problems.
Links below
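One common way to "think longer" at inference time is best-of-N sampling: draw several candidate answers and keep the one a scorer likes most. A minimal sketch of that idea; `best_of_n`, `toy_generate`, and `toy_score` are illustrative stand-ins I made up, not the recipe's actual code (the recipe uses a real Instruct LLM and reward model):

```python
import random

def best_of_n(prompt, generate, score, n=8, seed=0):
    """Sample n candidate answers and keep the highest-scoring one.

    `generate` and `score` are placeholders for an Instruct LLM and a
    reward model; here they are toy functions for illustration.
    """
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: the "model" guesses numbers, the "reward" prefers
# answers closer to the true solution (42).
def toy_generate(prompt, rng):
    return rng.randint(0, 100)

def toy_score(answer):
    return -abs(answer - 42)

best = best_of_n("What is 6 * 7?", toy_generate, toy_score, n=16)
```

Spending more samples (larger `n`) trades extra compute for a better chance that at least one candidate scores well, which is the core test-time-scaling trade-off.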
Here's what's included:
- SmolVLM (VLM) by @hf.co
- SFT & DPO fine-tuning methods
- Runs on consumer GPUs
SFT project: huggingface.co/learn/cookbo...
DPO project: huggingface.co/learn/cookbo...
Thanks @stevhliu.hf.co & @merve.bsky.social & @benburtenshaw.bsky.social
I'm a big fan of smol models: compact, efficient, and perfect for inference/training on limited resources. Even better when they're multimodal!
I explored fine-tuning SmolVLM, a multimodal smol model, using TRL with SFT and DPO, creating 2 hands-on projects!
Links below
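Of the two methods, DPO is the less familiar one, and its per-example loss is easy to sketch numerically. This is a stand-alone illustration I wrote, not code from the projects; the log-probabilities are placeholders for real model outputs:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin), where the
    margin measures how much more the trained policy prefers the chosen
    answer over the rejected one than the frozen reference model does."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# If the policy already prefers the chosen answer more than the
# reference does, the loss drops below log(2), the no-preference value.
loss_good = dpo_loss(-1.0, -5.0, -2.0, -4.0)   # margin = +2
loss_flat = dpo_loss(-2.0, -4.0, -2.0, -4.0)   # margin = 0
```

SFT just maximizes the likelihood of the chosen responses; DPO adds this preference margin against a reference model, which is why the two are often run back to back.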
I've been exploring how to go smol with multimodal RAG.
I've built a project using SmolVLM and ColSmolVLM: a multimodal RAG system that runs on Colab's free tier.
Featuring:
- SmolVLM (VLM)
- ColQwen2 (Doc Retrieval)
- Runs on Colab's free-tier GPU
Link below
Recipe in @hf.co: huggingface.co/learn/cookbo...
Thanks @stevhliu.hf.co & @merve.bsky.social
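Retrievers in the ColBERT family (which ColSmolVLM/ColQwen2 belong to) score a query against each document page with late interaction, often called MaxSim. A toy sketch with hand-made 2-d embeddings; the function names and vectors are mine, purely for illustration:

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, take its best
    dot product against all document (page-patch) vectors, then sum."""
    return sum(max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
               for qv in query_vecs)

def retrieve(query_vecs, corpus, k=2):
    """Rank document pages by MaxSim and return the top-k page ids."""
    scored = sorted(corpus.items(),
                    key=lambda item: maxsim_score(query_vecs, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 2-d embeddings: page "p1" aligns with the query, "p2" does not.
query = [(1.0, 0.0), (0.0, 1.0)]
corpus = {"p1": [(1.0, 0.0), (0.0, 1.0)],
          "p2": [(-1.0, 0.0), (0.0, -1.0)]}
top = retrieve(query, corpus, k=1)  # ["p1"]
```

Because each query token matches its best page patch independently, this scales well enough to run page retrieval on a free-tier GPU before handing the top pages to the VLM.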
New Multimodal RAG Recipe with Re-Ranking
I explored how to enhance a multimodal RAG pipeline by integrating a re-ranker!
Featuring:
- Qwen2-VL-7B (VLM)
- ColQwen2 (Doc Retrieval)
- MonoQwen2 (Re-ranking)
- Optimized for consumer GPUs with quantized VLMs
Link below:
[Screenshot of the notebook]
Learn how to build a complete multimodal RAG pipeline, with ColQwen2 as retriever, MonoQwen2-VL as reranker, and Qwen2-VL as the VLM, in this notebook that runs on a GPU as small as an L4: huggingface.co/learn/cookbo...
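The retrieve-then-rerank structure behind this pipeline can be sketched in a few lines. This is a toy illustration of the control flow only; the scorers and page names are made up, while in the notebook the retriever is ColQwen2 and the reranker is MonoQwen2-VL:

```python
def retrieve_then_rerank(query, pages, retr_score, rerank_score, k=4):
    """Two-stage pipeline: a fast retriever narrows the corpus to k
    candidate pages, then a more expensive pointwise reranker reorders
    just those k before they reach the VLM."""
    candidates = sorted(pages, key=lambda p: retr_score(query, p),
                        reverse=True)[:k]
    return sorted(candidates, key=lambda p: rerank_score(query, p),
                  reverse=True)

# Toy scorers: the retriever slightly misranks; the reranker fixes it.
pages = ["intro", "table_of_results", "references"]
retr = {"intro": 0.9, "table_of_results": 0.8, "references": 0.1}
rer = {"intro": 0.2, "table_of_results": 0.95, "references": 0.1}
ranked = retrieve_then_rerank("final accuracy?", pages,
                              lambda q, p: retr[p],
                              lambda q, p: rer[p], k=2)
# ranked == ["table_of_results", "intro"]
```

The point of the second stage is exactly this correction: the reranker is too slow to score the whole corpus, but accurate enough to fix the retriever's ordering on a short candidate list.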
Gave a talk on autonomous driving today to undergrad students! We covered everything from definitions to real-world examples, plus cutting-edge concepts like Generative World Models and Vision-Language Models (VLMs). Exciting future ahead!
This is such a cool project, and it was a truly exciting experience to contribute to it!
We took those TRL notebooks from last week and made a page from them. So if you're upskilling on fine-tuning or aligning LLMs, and want examples from the community (like Maxime Labonne, Philipp Schmid, and Sergio Paniego Blanco), check it out!
bsky.app/profile/benb...
>> huggingface.co/docs/trl/mai...
Thanks to @arig23498.bsky.social, @pcuenq.hf.co, and @reach-vb.hf.co for the collaboration. It's a pleasure working with such talented individuals!
1. Tool calling: github.com/huggingface/...
2. TGI: github.com/huggingface/...
I've been exploring the latest Llama 3.2 releases and working on a couple of projects you may find interesting:
1. Understanding tool calling with Llama 3.2
2. Using Text Generation Inference (TGI) with Llama models
(links in the next post)
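At its core, tool calling means the model emits a structured call against a tool schema and your code executes it. A minimal sketch of that loop; the `get_weather` tool, its schema, and the `dispatch` helper are hypothetical examples I wrote, not code from the Llama 3.2 project:

```python
import json

# A tool definition in the JSON-schema style used for chat tool calling.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city):
    # Placeholder implementation; a real tool would call a weather API.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json):
    """Execute a model-emitted tool call such as
    {"name": "get_weather", "arguments": {"city": "Paris"}}."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
# result == "Sunny in Paris"
```

In a full chat loop, the tool's return value is appended to the conversation as a tool message so the model can compose its final answer from it.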
Link to the blog post: weaviate.io/blog/what-is... (by Erika Cardenas, @iamleonie.bsky.social)
Link to the recipe: huggingface.co/learn/cookbo...
Huge thanks to Aymeric Roucher and @stevhliu.hf.co for their support and insights!
In this notebook, I use Qwen2.5-72B-Instruct as the LLM to build a system with:
1. A manager agent
2. Three specialized agents: retriever, web search, and image generation
The result is this new Hugging Face Cookbook recipe, where I demonstrate how to create a Multi-Agent RAG system leveraging the agent support from the transformers module.
A few days ago, I came across a fascinating post about Agentic RAG by Erika Cardenas and Leonie Monigatti, and it inspired me to dive into the concept and bring it to life in code!
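The manager-plus-specialists setup from this thread can be sketched as a simple dispatcher. Everything here is a toy stand-in I wrote: in the recipe an LLM-driven manager agent decides which specialist to call, whereas this sketch uses a hard-coded keyword policy and string-returning agents:

```python
def route(task):
    """Toy manager policy: pick a specialist from the task wording.
    A real manager agent lets the LLM make this decision."""
    if "image" in task or "draw" in task:
        return "image_generation"
    if "http" in task or "latest" in task:
        return "web_search"
    return "retriever"

# Each specialized agent is reduced to a function for illustration.
AGENTS = {
    "retriever": lambda t: f"[retriever] docs for: {t}",
    "web_search": lambda t: f"[web_search] results for: {t}",
    "image_generation": lambda t: f"[image_generation] picture of: {t}",
}

def manager(task):
    """The manager delegates the task to one specialized agent and
    returns that agent's answer."""
    return AGENTS[route(task)](task)

answer = manager("draw an image of a cat")
```

The design win is the same as in the recipe: each specialist stays simple and focused, and the manager is the only component that reasons about which one to use.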
4/6 More vision skills for complex visual tasks. This tutorial shows how to fine-tune the Qwen2-VL-7B model for visual question answering using the ChartQA dataset.
huggingface.co/learn/cookbo...
by @sergiopaniego.bsky.social
TRL is a cornerstone of LLM post-training, and imo it's the default to learn.
There are great alternatives like Unsloth, Axolotl, and AutoTrain. But if you want a daily driver that takes you from experimentation to production, it's TRL.
These community notebooks guide you through TRL's core:
hello, hi