#MultimodalLLM

Latest posts tagged with #MultimodalLLM on Bluesky

Z.AI just dropped GLM‑4.7, an open‑source LLM that cranks up coding, reasoning, and vision‑text performance with massive context windows and a sleek API. Looks like a serious Claude challenger. Dive in for the details! #GLM47 #OpenSourceAI #MultimodalLLM

🔗 aidailypost.com/news/zai-rel...

Multimodal LLMs Learn to Ask Clarifying Questions for Household Robots

Researchers fine‑tuned a multimodal LLM so household robots can ask clarification questions, boosting task success by 10.4‑16.5% over baselines. Posted 1 Apr 2025. Read more: getnews.me/multimodal-llms-learn-to... #multimodalllm #householdrobots

Fine‑tuning Multimodal LLMs for Embodied Agents that Ask Questions

RL‑fine‑tuned multimodal LLMs improve Ask-to-Act benchmark success by 10‑16% over baselines, learning to ask minimal clarification questions without human‑provided rewards. Read more: getnews.me/fine-tuning-multimodal-l... #multimodalllm #embodiedai

GHOST: Images that Trigger Hallucinations in Multimodal LLMs

GHOST generates images that cause multimodal LLMs to hallucinate missing objects, achieving a success rate over 28% versus ~1% for prior methods. The images also fooled GPT‑4o 66.5% of the time. Read more: getnews.me/ghost-images-that-trigge... #multimodalllm #ghost

Identifying Vision Function Layers in Multimodal Large Language Models

Vision Function Layers (VFLs) in multimodal LLMs concentrate visual tasks into a few decoder layers; VFL‑select keeps 98% performance using only ~20% of the data. Read more: getnews.me/identifying-vision-funct... #vfl #multimodalllm

Survey Highlights Advances in Multimodal Large Language Models for Emotion Recognition

A 35‑page survey shows multimodal LLMs beat text‑only baselines on emotion classification; instruction‑tuned models show strong generalization. Read more: getnews.me/survey-highlights-advanc... #multimodalllm #emotionrecognition

Multimodal LLMs Enable Preference‑Based Long‑Horizon Robotic Stacking

A new study fine‑tuned a multimodal LLM on a preference dataset covering weight, stability, size, and footprint, letting a humanoid robot plan stacking tasks and beat a baseline model in simulation. getnews.me/multimodal-llms-enable-p... #multimodalllm #robotics

Multimodal LLMs Reveal Redundancy in Multiple Vision Encoders

Removing certain vision encoders can boost accuracy by up to 3.6%, while using just one or two encoders retains over 90% of baseline performance on most non‑OCR tasks. Read more: getnews.me/multimodal-llms-reveal-r... #multimodalllm #visionencoder #ai

Multimodal LLMs Empower Household Robots to Ask Clarifying Questions

A study shows household robots can ask clarification questions using multimodal LLMs fine‑tuned with reinforcement learning, boosting performance by 10.4%–16.5%. Read more: getnews.me/multimodal-llms-empower-... #multimodalllm #householdrobots

Benchmark Tests MLLM Web Understanding: Reasoning, Robustness, Safety

WebRSSBench, a benchmark for multimodal LLMs, defines eight web tasks and evaluates twelve models, exposing gaps in compositional reasoning and reduced robustness to layout changes. getnews.me/benchmark-tests-mllm-web... #webrssbench #multimodalllm

Systematic Study Finds Text and Image Leakage in Multimodal LLMs

MM‑Detect reveals text and image leakage in multimodal LLMs, detecting contamination in 12 models across five benchmarks; paper updates run through 20 Sep 2025. Read more: getnews.me/systematic-study-finds-t... #mmdetect #multimodalllm

Interpretable Audio Editing Evaluation with Chain‑of‑Thought LLMs

A new multimodal LLM framework uses Chain‑of‑Thought prompting to evaluate edited audio, giving text explanations that align with human MOS ratings. Code is on GitHub. getnews.me/interpretable-audio-edit... #audioediting #multimodalllm

Control-Theoretic Framework Improves Multimodal LLM Efficiency

The MCP framework boosts multimodal LLM accuracy by 15‑30% and cuts compute time by ~40%, while its Presenter layer reaches 90% of human‑rated interpretability. Read more: getnews.me/control-theoretic-framew... #multimodalllm #controltheory #efficiency

Language‑Instructed Reasoning Improves Group Activity Detection

LIR‑GAD adds <ACT> and <GROUP> tokens to a multimodal LLM, boosting accuracy and interpretability for group activity detection on standard benchmarks. Read more: getnews.me/language-instructed-reas... #groupactivitydetection #multimodalllm

Examining How Humans and Multimodal LLMs Judge Generated Images

The 16 September 2025 study finds multimodal LLMs detect artifacts and style but often miss anatomical accuracy, unlike humans, who reliably judge all six quality attributes. Read more: getnews.me/examining-how-humans-and... #multimodalllm #imageevaluation

ANTS Method Uses Multimodal LLMs to Boost OOD Detection

ANTS, a training‑free, zero‑shot method using multimodal LLMs, cut the false‑positive rate by 4.2% at 95% recall on ImageNet. The paper was released on 11 September 2025. Read more: getnews.me/ants-method-uses-multimo... #ants #ooddetection #multimodalllm

Alibaba AI Team Unveils Ovis 2.5 Multimodal LLMs: A Breakthrough in Open Source AI
https://softtechhub.us/2025/08/19/alibaba-ai-team-unveils-ovis-2-5/

#AlibabaAI #Ovis25 #MultimodalLLM #OpenSourceAI #TechBreakthrough #ArtificialIntelligence #MachineLearning #AIInnovation #NaturalLanguageProcessing #AIResearch

NVIDIA AI Introduces Describe Anything 3B: A new multimodal LLM focused on detailed image and video descriptions
https://softtechhub.us/2025/04/26/nvidia-ai-describe-anything-3b/

#NVIDIAAI #DescribeAnything #MultimodalLLM #AI #ImageDescription #VideoAnalysis #TechInnovation #MachineLearning #ArtificialIntelligence #DeepLearning
