#VLMs

Latest posts tagged with #VLMs on Bluesky
MediaSage - AI-Powered Creative Platform 🚀 Transform your ideas into stunning visuals! Generate professional marketing campaigns, talking avatar videos, AI images, and more. Try Sage Campaigns, AI Shorts, Wizard editing - all in one powerful...

Many people want to try visual models for study, hobbies, or personal projects.

We've added a matching subscription tier for you!
We just launched the 'Plus Tier' in MediaSage with huge discounts.

We're also ready to grant the Plus tier to active contributors in our forum. 😊

media.nurie.ai

#MediaSage #VLMs #AiImage #AiVideo


It's not easy to describe what we see, so you can also try Picture-to-Prompt.

My colleague drew this on the wall for Christmas, and I wonder if I can reuse it somehow.
Check out my result.

You can also try it here - media.nurie.ai

#Image2Prompt #Picture2Prompt #VLMs #MediaSage #NURIEAI #NURIE


🎄 Merry Christmas from @vlmrun.bsky.social!

Grateful to our customers and partners for trusting us with the most demanding visual workloads: documents, images, and video at scale.

Here’s to a bigger year turning pixels into production systems.

#genai #multimodal #vlms #infrastructure

R²D²: Improving Robot Manipulation with Simulation and Language Models | NVIDIA Technical Blog Robot manipulation systems struggle with changing objects, lighting, and contact dynamics when they move into dynamic real-world environments. On top of this, gaps between simulation and reality…

#NVIDIA Robotics Research and Development Digest (#R²D²) explores novel approaches to improving #robot manipulation skills via research efforts that use reasoning #LLMs, sim-and-real co-training & #VLMs for designing tools.
developer.nvidia.com/blog/r2d2-im...


🚗 Call for Papers — #COMMTR Special Issue
"Foundation Models for Intelligent Control in Autonomous Driving Traffic Systems"

COMMTR welcomes submissions exploring how #LLMs, #VLMs, and #multimodalfoundationmodels are advancing autonomous driving and intelligent traffic systems. 🤖


* @imperialcollegeldn.bsky.social @ic-cep.bsky.social @imperialsci.bsky.social @ucl.ac.uk

#IDSSD #AI #NLP #LLMs #VLMs #AIforSustainability #AIforNature #AIforClimate #SDG13 #SDGs #GlobalGoals #2030Agenda #AIforGood #AI4SDGs #Science4Policy #ResponsibleAI #SustainableAI #DigitalPublicInfrastructure

Into the Rabbit Hull – Part I - Kempner Institute This blog post offers an interpretability deep dive, examining the most important concepts emerging in one of today’s central vision foundation models, DINOv2. This blogpost is the first of a […]

🐇Into the Rabbit Hull — Part 1: A Deep Dive into DINOv2🧠
Our latest Deeper Learning blog post is an #interpretability deep dive into one of today’s leading vision foundation models: DINOv2.
📖Read now: bit.ly/4nNfq8D
Stay tuned — Part 2 coming soon.
#AI #VLMs #DINOv2


SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks

Kim-Celine Kahl, Selen Erkan, Jeremias Traub et al.

Action editor: Weijian Deng

https://openreview.net/forum?id=qjNdGpgpV8

#vlm #vlms #visual


New #OpenAccess research in #RESSystematicEnt

Descriptron: Artificial intelligence for automating taxonomic species descriptions with a user-friendly software package
doi.org/10.1111/syen.70005

#AnalyticalAI #AI #VLMs #ViTs #LLMs #Taxonomy
@gkergoat.bsky.social @wiley.com


I2C-UHU-PEGASUS at FungiCLEF 2025: Multimodal Pipeline for Rare Fungal Species Classification Using Fine-Tuned VLMs and Ecological Context

#FungiCLEF
#FungalSpecies
#VLMs
#Few-shotlearning
#Fungalclassification
#Rarespecies
#MultimodalAI
#DeepLearning
#TransferLearning

ceur-ws.org/Vol-4038/pap...


🚀 Big news from #CLiCit2025!
Our PhD student Davide Testa presented MAIA 🎞️🧠🇮🇹 — the first Italian benchmark to test Vision-Language Models on multimodal reasoning & robustness.
📄 You can read the paper in the CLiC-it pre-proceedings: clic2025.unica.it/wp-content/u...

#AI #NLProc #VLMs

FineVision: The GIANT open dataset powering a new era for VLMs! YouTube video by En la mente de la máquina, Inteligencia Artificial

Introducing FineVision, the largest dataset for training VLMs. An incredible resource with 24M samples that democratizes access to high-quality data!

youtu.be/VxXd48jOdJs

#IA #LLMs #FineVision #HuggingFace #MachineLearning #VLMs

PET2Rep: Towards Vision-Language Model-Driven Automated Radiology Report Generation for Positron Emission Tomography Positron emission tomography (PET) is a cornerstone of modern oncologic and neurologic imaging, distinguished by its unique ability to illuminate dynamic metabolic processes that transcend the…

"We conduct a head-to-head comparison of 30 cutting-edge general-purpose and medical-specialized VLMs. The results show that the current state-of-the-art #VLMs perform poorly on PET report generation task, falling considerably short of fulfilling practical needs."

arxiv.org/abs/2508.040...


We're thrilled to have Ahmet Iscen of Google DeepMind with us tomorrow to talk about his work on #VLMs. Join us online! 'VLM-driven data & context curation for visual understanding'
🗓️Tuesday July 8th: 11am CEST
Registration & more info: tinyurl.com/yz3rvz3z


🙌 Huge thanks to the team:
Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan

Follow for updates!
#ICCV2025 #VLMs #AI4EO #RemoteSensing #GeospatialAI #MachineLearning #Benchmarking


It was an honor to give a keynote lecture about #VLMs for Bio-Image #DataScience at the @helmholtzimaging.bsky.social #HIconference2025.
You can find my slides #openaccess 🔬🖥️🚀
doi.org/10.5281/zeno...


The conversation linked Text-to-LoRA to similar adaptation efforts in other domains, like Vision Language Models (VLMs). This highlights a broader trend in making large models more flexible. #VLMs 5/6

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering Vision-Language Models (VLMs) have shown promise in various 2D visual tasks, yet their readiness for 3D clinical diagnosis remains unclear due to stringent demands for recognition precision,…

🤔 "Overall, while modern #VLMs demonstrate promise in basic and recognition-heavy tasks, their applicability to real-world diagnostics is currently limited by weak visual signal, unreliable numeracy, and shallow reasoning chains."

www.arxiv.org/abs/2505.18915

Interpreting the Linear Structure of Vision-Language Model Embedding Spaces - Kempner Institute Using sparse autoencoders, the authors show that vision-language embeddings boil down to a small, stable dictionary of single-modality concepts that snap together into cross-modal bridges. This resear...

New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text.

bit.ly/KempnerVLM

by @isabelpapad.bsky.social, Chloe Huangyuan Su, @thomasfel.bsky.social, Stephanie Gil, and @shamkakade.bsky.social

#AI #ML #VLMs #SAEs

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs As large vision language models (VLMs) are increasingly used as automated evaluators, understanding their ability to effectively compare data pairs as instructed in the prompt becomes essential. To ad...

🧵 7/7

📢 Shoutout to my amazing co-authors and to ServiceNow Research and Mila for making this happen! 🚀

📄 Read the full paper: arxiv.org/abs/2502.15210

#PairBench #LLMs #VLMs #GenAI #AutoEval

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training obje...

SigLIP 2 just came out. 👀 #vlms #clip arxiv.org/abs/2502.14786


So happy to share this: secret instructions hidden in images can drastically alter the output of VLMs to create misdiagnoses. Models differ in susceptibility, though, with Claude 3.5 apparently much better aligned to ethical outputs than GPT-4o.

#aisafety
#LLMs
#VLMs



Dr. Stewart Worrall (The University of Sydney)
Dr. Ignacio Alvarez (Intel Labs)
Maria Lyssenko (BOSCH)
Andra Petrovai (TU Cluj-Napoca)

#IV2025 #IEEE #ITS #FoundationModels #AutonomousDriving
#Perception #IntelligentVehicles
#LLMs #VLMs
#IntelligentTransportationSystems #3DVision

Awakari App

Llama 3.2 Vision — A Deep Dive Vision-Language Models (VLMs) allow LLMs to “see”, but how d...

graphcore-research.github.io/posts/llama-vision/

#posts #transformers #LLMs #VLMs
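The linked post asks how VLMs let LLMs "see". One common recipe: a vision encoder turns the image into patch embeddings, a learned projection maps them into the LLM's token space, and the result is fed to the LLM alongside text tokens. A minimal numpy sketch of that projection step (dimensions are illustrative, not Llama 3.2's real sizes; this is not Llama's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, much smaller than real models).
n_patches, d_vision, d_model, n_text = 16, 64, 128, 8

# 1. A vision encoder (e.g. a ViT) turns the image into patch embeddings.
patch_embeddings = rng.standard_normal((n_patches, d_vision))

# 2. A learned projection maps vision features into the LLM's embedding space.
W_proj = rng.standard_normal((d_vision, d_model)) / np.sqrt(d_vision)
image_tokens = patch_embeddings @ W_proj            # (n_patches, d_model)

# 3. The projected image tokens are concatenated with the text embeddings,
#    so the LLM attends over image and text as one sequence.
text_tokens = rng.standard_normal((n_text, d_model))
sequence = np.concatenate([image_tokens, text_tokens], axis=0)

print(sequence.shape)  # (24, 128)
```

Other designs, including Llama 3.2 Vision as the blog describes, inject image features through cross-attention layers instead of prepending projected tokens; the projection idea above is just the simplest variant.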


Anyone is aware of the latest 🍿 on patch/pixel-level image-text alignment? Ideally something that does not use/need segmentation masks. #VLMs #multimodal


SDXL was much more open to non-standard inpainting edits than Flux.

I need a more scientific approach to this one to be sure.

#buildinpublic #vlms #stablediffusion #flux #sdxl

GitHub - balrog-ai/BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Benchmarking Agentic LLM and VLM Reasoning On Games - balrog-ai/BALROG

🎮 BALROG: Benchmarking AI with Games

📊 BALROG tests #LLMs & #VLMs on 6 games:
🧭 BabyAI (navigation)
🛠️ Crafter (crafting/survival)
📜 TextWorld (puzzles)
🎲 Baba Is #AI (rule manipulation)

🔍 #Llama3 70B outperforms #GPT4 in tasks like Baba Is AI. Open models excel in text over visuals.

#Gaming


Discovered GeoDE at today's Sundai Club—a Princeton dataset showing how 'stove' or 'house' are perceived differently worldwide. Exciting potential for culturally-aware ML projects with foundation models ( #CLIP, #VLMs). Check it out: geodiverse-data-collection.cs.princeton.edu #MachineLearning #AI


Most people who have played with CLIP know how bad it can be outside of academic benchmarks 😅. That’s also why a lot of text-to-image evaluation papers for generative models are trying to replace CLIPScore. See this very nice work on the subject: arxiv.org/abs/2404.01291. #VLMs #GenAI
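For context, the CLIPScore being critiqued is just a rescaled, clipped cosine similarity between CLIP's image and text embeddings, so it inherits every blind spot of those embeddings. A toy numpy sketch of the formula (the embeddings here are made up, not real CLIP features):

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """CLIPScore: 100 * max(0, cosine similarity) of CLIP image/text embeddings."""
    cos = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return 100.0 * max(0.0, float(cos))

# Toy vectors standing in for real CLIP embeddings.
img = np.array([1.0, 0.0, 1.0])
txt = np.array([1.0, 0.0, 0.0])
print(round(clip_score(img, txt), 2))  # cos = 1/sqrt(2), so ≈ 70.71
```

Because the score collapses an image-text pair to a single cosine, it can't distinguish *why* a generation matches a prompt poorly, which is one motivation for the VQA-based alternatives in the linked paper.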
