#MultimodalLLM

Latest posts tagged with #MultimodalLLM on Bluesky

Z.AI just dropped GLM‑4.7, an open‑source LLM that cranks up coding, reasoning, and vision‑text performance with massive context windows and a sleek API. Looks like a serious Claude challenger. Dive in for the details! #GLM47 #OpenSourceAI #MultimodalLLM

🔗 aidailypost.com/news/zai-rel...

Multimodal LLMs Learn to Ask Clarifying Questions for Household Robots

Researchers fine‑tuned a multimodal LLM so household robots can ask clarification questions, boosting task success by 10.4‑16.5% over baselines. Posted 1 Apr 2025. Read more: getnews.me/multimodal-llms-learn-to... #multimodalllm #householdrobots

Fine‑tuning Multimodal LLMs for Embodied Agents that Ask Questions

RL‑fine‑tuned multimodal LLMs improve Ask-to-Act benchmark success by 10‑16% over baselines, learning to ask minimal clarification questions without human‑provided rewards. Read more: getnews.me/fine-tuning-multimodal-l... #multimodalllm #embodiedai

GHOST: Images that Trigger Hallucinations in Multimodal LLMs

GHOST generates images that cause multimodal LLMs to hallucinate missing objects, achieving a success rate over 28% versus ~1% for prior methods. The images also fooled GPT‑4o 66.5% of the time. Read more: getnews.me/ghost-images-that-trigge... #multimodalllm #ghost

Identifying Vision Function Layers in Multimodal Large Language Models

Vision Function Layers (VFLs) in multimodal LLMs concentrate visual tasks into a few decoder layers; VFL‑select keeps 98% performance using only ~20% of the data. Read more: getnews.me/identifying-vision-funct... #vfl #multimodalllm

Survey Highlights Advances in Multimodal Large Language Models for Emotion Recognition

A 35‑page survey shows multimodal LLMs beat text‑only baselines on emotion classification; instruction‑tuned models show strong generalization. Read more: getnews.me/survey-highlights-advanc... #multimodalllm #emotionrecognition

Multimodal LLMs Enable Preference‑Based Long‑Horizon Robotic Stacking

A new study fine‑tuned a multimodal LLM on a preference dataset covering weight, stability, size, and footprint, letting a humanoid robot plan stacking tasks and beat a baseline model in simulation. getnews.me/multimodal-llms-enable-p... #multimodalllm #robotics

Multimodal LLMs Reveal Redundancy in Multiple Vision Encoders

Removing certain vision encoders can boost accuracy by up to 3.6%, while using just one or two encoders retains over 90% of baseline performance on most non‑OCR tasks. Read more: getnews.me/multimodal-llms-reveal-r... #multimodalllm #visionencoder #ai

Multimodal LLMs Empower Household Robots to Ask Clarifying Questions

A study shows household robots can ask clarification questions using multimodal LLMs fine‑tuned with reinforcement learning, boosting performance by 10.4%–16.5%. Read more: getnews.me/multimodal-llms-empower-... #multimodalllm #householdrobots

Benchmark Tests MLLM Web Understanding: Reasoning, Robustness, Safety

WebRSSBench, a benchmark for multimodal LLMs, defines eight web tasks and evaluates twelve models, exposing gaps in compositional reasoning and reduced robustness to layout changes. getnews.me/benchmark-tests-mllm-web... #webrssbench #multimodalllm

Systematic Study Finds Text and Image Leakage in Multimodal LLMs

MM‑Detect reveals text and image leakage in multimodal LLMs, detecting contamination in 12 models across five benchmarks; paper updates run through 20 Sep 2025. Read more: getnews.me/systematic-study-finds-t... #mmdetect #multimodalllm

Interpretable Audio Editing Evaluation with Chain‑of‑Thought LLMs

A new multimodal LLM framework uses Chain‑of‑Thought prompting to evaluate edited audio, giving text explanations that align with human MOS ratings. Code is on GitHub. getnews.me/interpretable-audio-edit... #audioediting #multimodalllm

Control-Theoretic Framework Improves Multimodal LLM Efficiency

The MCP framework boosts multimodal LLM accuracy by 15‑30% and cuts compute time by ~40%, while its Presenter layer reaches 90% of human‑rated interpretability. Read more: getnews.me/control-theoretic-framew... #multimodalllm #controltheory #efficiency

Language‑Instructed Reasoning Improves Group Activity Detection

LIR‑GAD adds <ACT> and <GROUP> tokens to a multimodal LLM, boosting accuracy and interpretability for group activity detection on standard benchmarks. Read more: getnews.me/language-instructed-reas... #groupactivitydetection #multimodalllm

Examining How Humans and Multimodal LLMs Judge Generated Images

The 16 September 2025 study finds multimodal LLMs detect artifacts and style but often miss anatomical accuracy, unlike humans, who reliably judge all six quality attributes. Read more: getnews.me/examining-how-humans-and... #multimodalllm #imageevaluation

ANTS Method Uses Multimodal LLMs to Boost OOD Detection

ANTS, a training‑free, zero‑shot method using multimodal LLMs, cut the false‑positive rate by 4.2% at 95% recall on ImageNet. The paper was released on 11 September 2025. Read more: getnews.me/ants-method-uses-multimo... #ants #ooddetection #multimodalllm

Alibaba AI Team Unveils Ovis 2.5 Multimodal LLMs: A Breakthrough in Open Source AI
https://softtechhub.us/2025/08/19/alibaba-ai-team-unveils-ovis-2-5/

#AlibabaAI #Ovis25 #MultimodalLLM #OpenSourceAI #TechBreakthrough #ArtificialIntelligence #MachineLearning #AIInnovation #NaturalLanguageProcessing #AIResearch

NVIDIA AI Introduces Describe Anything 3B: A new multimodal LLM focused on detailed image and video descriptions
https://softtechhub.us/2025/04/26/nvidia-ai-describe-anything-3b/

#NVIDIAAI #DescribeAnything #MultimodalLLM #AI #ImageDescription #VideoAnalysis #TechInnovation #MachineLearning #ArtificialIntelligence #DeepLearning
