#VLMs

Latest posts tagged with #VLMs on Bluesky
MediaSage - AI-Powered Creative Platform 🚀 Transform your ideas into stunning visuals! Generate professional marketing campaigns, talking avatar videos, AI images, and more. Try Sage Campaigns, AI Shorts, Wizard editing - all in one powerful...

Many people want to try visual models for study, hobbies, or personal projects.

We've added a matching subscription tier for you!
We just launched the 'Plus Tier' in MediaSage with huge discounts.

We're also ready to grant the Plus tier to active contributors in our forum. 😊

media.nurie.ai

#MediaSage #VLMs #AiImage #AiVideo


It's not easy to describe what we see, so you can also try Picture-to-Prompt.

My colleague drew this on the wall for Christmas, and I wonder if I can reuse it somehow.
Check out my result.

You can also try it here - media.nurie.ai

#Image2Prompt #Picture2Prompt #VLMs #MediaSage #NURIEAI #NURIE


🎄 Merry Christmas from @vlmrun.bsky.social!

Grateful to our customers and partners for trusting us with the most demanding visual workloads: documents, images, and video at scale.

Here’s to a bigger year turning pixels into production systems.

#genai #multimodal #vlms #infrastructure

R²D²: Improving Robot Manipulation with Simulation and Language Models | NVIDIA Technical Blog Robot manipulation systems struggle with changing objects, lighting, and contact dynamics when they move into dynamic real-world environments. On top of this, gaps between simulation and reality…

#NVIDIA Robotics Research and Development Digest (#R²D²) explores novel approaches to improving #robot manipulation skills via research efforts that use reasoning #LLMs, sim-and-real co-training & #VLMs for designing tools.
developer.nvidia.com/blog/r2d2-im...


🚗 Call for Papers — #COMMTR Special Issue
"Foundation Models for Intelligent Control in Autonomous Driving Traffic Systems"

COMMTR welcomes submissions exploring how #LLMs, #VLMs, and #multimodalfoundationmodels are advancing autonomous driving and intelligent traffic systems. 🤖


* @imperialcollegeldn.bsky.social @ic-cep.bsky.social @imperialsci.bsky.social @ucl.ac.uk

#IDSSD #AI #NLP #LLMs #VLMs #AIforSustainability #AIforNature #AIforClimate #SDG13 #SDGs #GlobalGoals #2030Agenda #AIforGood #AI4SDGs #Science4Policy #ResponsibleAI #SustainableAI #DigitalPublicInfrastructure

Into the Rabbit Hull – Part I - Kempner Institute This blog post offers an interpretability deep dive, examining the most important concepts emerging in one of today’s central vision foundation models, DINOv2. This blogpost is the first of a […]

🐇Into the Rabbit Hull — Part 1: A Deep Dive into DINOv2🧠
Our latest Deeper Learning blog post is an #interpretability deep dive into one of today’s leading vision foundation models: DINOv2.
📖Read now: bit.ly/4nNfq8D
Stay tuned — Part 2 coming soon.
#AI #VLMs #DINOv2


SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks

Kim-Celine Kahl, Selen Erkan, Jeremias Traub et al.

Action editor: Weijian Deng

https://openreview.net/forum?id=qjNdGpgpV8

#vlm #vlms #visual


New #OpenAccess research in #RESSystematicEnt

Descriptron: Artificial intelligence for automating taxonomic species descriptions with a user-friendly software package
doi.org/10.1111/syen.70005

#AnalyticalAI #AI #VLMs #ViTs #LLMs #Taxonomy
@gkergoat.bsky.social @wiley.com


I2C-UHU-PEGASUS at FungiCLEF 2025: Multimodal Pipeline for Rare Fungal Species Classification Using Fine-Tuned VLMs and Ecological Context

#FungiCLEF
#FungalSpecies
#VLMs
#Few-shotlearning
#Fungalclassification
#Rarespecies
#MultimodalAI
#DeepLearning
#TransferLearning

ceur-ws.org/Vol-4038/pap...


🚀 Big news from #CLiCit2025!
Our PhD student Davide Testa presented MAIA 🎞️🧠🇮🇹 — the first Italian benchmark to test Vision-Language Models on multimodal reasoning & robustness.
📄 You can read the paper in the CLiC-it pre-proceedings: clic2025.unica.it/wp-content/u...

#AI #NLProc #VLMs

FineVision: The GIANT open dataset powering a new era for VLMs! YouTube video by En la mente de la máquina, Inteligencia Artificial

Introducing FineVision, the largest dataset for training VLMs. An incredible resource with 24M samples that democratizes access to high-quality data!

youtu.be/VxXd48jOdJs

#IA #LLMs #FineVision #HuggingFace #MachineLearning #VLMs

PET2Rep: Towards Vision-Language Model-Driven Automated Radiology Report Generation for Positron Emission Tomography Positron emission tomography (PET) is a cornerstone of modern oncologic and neurologic imaging, distinguished by its unique ability to illuminate dynamic metabolic processes that transcend the…

"We conduct a head-to-head comparison of 30 cutting-edge general-purpose and medical-specialized VLMs. The results show that the current state-of-the-art #VLMs perform poorly on PET report generation task, falling considerably short of fulfilling practical needs."

arxiv.org/abs/2508.040...


We're thrilled to have Ahmet Iscen of Google DeepMind with us tomorrow to talk about his work on #VLMs. Join us online! 'VLM-driven data & context curation for visual understanding'
🗓️Tuesday July 8th: 11am CEST
Registration & more info: tinyurl.com/yz3rvz3z


🙌 Huge thanks to the team:
Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan

Follow for updates!
#ICCV2025 #VLMs #AI4EO #RemoteSensing #GeospatialAI #MachineLearning #Benchmarking


It was an honor to give a keynote lecture about #VLMs for Bio-Image #DataScience at the @helmholtzimaging.bsky.social #HIconference2025.
You can find my slides #openaccess 🔬🖥️🚀
doi.org/10.5281/zeno...


The conversation linked Text-to-LoRA to similar adaptation efforts in other domains, like Vision Language Models (VLMs). This highlights a broader trend in making large models more flexible. #VLMs 5/6

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering Vision-Language Models (VLMs) have shown promise in various 2D visual tasks, yet their readiness for 3D clinical diagnosis remains unclear due to stringent demands for recognition precision,…

🤔 "Overall, while modern #VLMs demonstrate promise in basic and recognition-heavy tasks, their applicability to real-world diagnostics is currently limited by weak visual signal, unreliable numeracy, and shallow reasoning chains."

www.arxiv.org/abs/2505.18915

Interpreting the Linear Structure of Vision-Language Model Embedding Spaces - Kempner Institute Using sparse autoencoders, the authors show that vision-language embeddings boil down to a small, stable dictionary of single-modality concepts that snap together into cross-modal bridges. This resear...

New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text.

bit.ly/KempnerVLM

by @isabelpapad.bsky.social, Chloe Huangyuan Su, @thomasfel.bsky.social, Stephanie Gil, and @shamkakade.bsky.social

#AI #ML #VLMs #SAEs

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs As large vision language models (VLMs) are increasingly used as automated evaluators, understanding their ability to effectively compare data pairs as instructed in the prompt becomes essential. To ad...

🧵 7/7

📢 Shoutout to my amazing co-authors and to ServiceNow Research and Mila for making this happen! 🚀

📄 Read the full paper: arxiv.org/abs/2502.15210

#PairBench #LLMs #VLMs #GenAI #AutoEval

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features We introduce SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP. In this second iteration, we extend the original image-text training obje...

SigLIP 2 just came out. 👀 #vlms #clip arxiv.org/abs/2502.14786


So happy to share this: secret instructions hidden in images can drastically alter the output of VLMs to create misdiagnoses. Models differ in susceptibility, though, with Claude 3.5 apparently much better aligned to ethical outputs than GPT-4o.

#aisafety
#LLMs
#VLMs



Dr. Stewart Worrall (The University of Sydney)
Dr. Ignacio Alvarez (Intel Labs)
Maria Lyssenko (BOSCH)
Andra Petrovai (TU Cluj-Napoca)

#IV2025 #IEEE #ITS #FoundationModels #AutonomousDriving
#Perception #IntelligentVehicles
#LLMs #VLMs
#IntelligentTransportationSystems #3DVision

Awakari App

Llama 3.2 Vision — A Deep Dive Vision-Language Models (VLMs) allow LLMs to “see”, but how d...

graphcore-research.github.io/posts/llama-vision/

#posts #transformers #LLMs #VLMs
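The linked post asks how VLMs let LLMs "see". One common recipe: a vision encoder turns the image into patch embeddings, a learned projection maps them into the LLM's token space, and the result is fed to the LLM alongside text tokens. A minimal numpy sketch of that projection step (dimensions are illustrative, not Llama 3.2's real sizes; this is not Llama's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, much smaller than real models).
n_patches, d_vision, d_model, n_text = 16, 64, 128, 8

# 1. A vision encoder (e.g. a ViT) turns the image into patch embeddings.
patch_embeddings = rng.standard_normal((n_patches, d_vision))

# 2. A learned projection maps vision features into the LLM's embedding space.
W_proj = rng.standard_normal((d_vision, d_model)) / np.sqrt(d_vision)
image_tokens = patch_embeddings @ W_proj            # (n_patches, d_model)

# 3. The projected image tokens are concatenated with the text embeddings,
#    so the LLM attends over image and text as one sequence.
text_tokens = rng.standard_normal((n_text, d_model))
sequence = np.concatenate([image_tokens, text_tokens], axis=0)

print(sequence.shape)  # (24, 128)
```

Other designs, including Llama 3.2 Vision as the blog describes, inject image features through cross-attention layers instead of prepending projected tokens; the projection idea above is just the simplest variant.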


Anyone is aware of the latest 🍿 on patch/pixel-level image-text alignment? Ideally something that does not use/need segmentation masks. #VLMs #multimodal


SDXL was much more open to non-standard inpainting edits than Flux.

I need a more scientific approach to this one to be sure.

#buildinpublic #vlms #stablediffusion #flux #sdxl

GitHub - balrog-ai/BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Benchmarking Agentic LLM and VLM Reasoning On Games - balrog-ai/BALROG

🎮 BALROG: Benchmarking AI with Games

📊 BALROG tests #LLMs & #VLMs on 6 games:
🧭 BabyAI (navigation)
🛠️ Crafter (crafting/survival)
📜 TextWorld (puzzles)
🎲 Baba Is #AI (rule manipulation)

🔍 #Llama3 70B outperforms #GPT4 in tasks like Baba Is AI. Open models excel in text over visuals.

#Gaming


Discovered GeoDE at today's Sundai Club—a Princeton dataset showing how 'stove' or 'house' are perceived differently worldwide. Exciting potential for culturally-aware ML projects with foundation models ( #CLIP, #VLMs). Check it out: geodiverse-data-collection.cs.princeton.edu #MachineLearning #AI


Most people who have played with CLIP know how bad it can be outside of academic benchmarks 😅. That’s also why a lot of text-to-image evaluation papers for generative models are trying to replace CLIPScore. See this very nice work on the subject: arxiv.org/abs/2404.01291. #VLMs #GenAI
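For context, the CLIPScore being critiqued is just a rescaled, clipped cosine similarity between CLIP's image and text embeddings, so it inherits every blind spot of those embeddings. A toy numpy sketch of the formula (the embeddings here are made up, not real CLIP features):

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """CLIPScore: 100 * max(0, cosine similarity) of CLIP image/text embeddings."""
    cos = image_emb @ text_emb / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return 100.0 * max(0.0, float(cos))

# Toy vectors standing in for real CLIP embeddings.
img = np.array([1.0, 0.0, 1.0])
txt = np.array([1.0, 0.0, 0.0])
print(round(clip_score(img, txt), 2))  # cos = 1/sqrt(2), so ≈ 70.71
```

Because the score collapses an image-text pair to a single cosine, it can't distinguish *why* a generation matches a prompt poorly, which is one motivation for the VQA-based alternatives in the linked paper.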
