#MultiModalAI

2 days ago

AI safety benchmarks built on Western data miss how risk actually looks across cultures.

MLCommons is fixing that with 7,000+ multimodal prompts from APAC, built with regional experts from Singapore, India, and Korea.

mlcommons.org/2026/03/airr...

#MLCommons #AILuminate #MultimodalAI

Winbuzzer

@winbuzzer.com

3 days ago

Gemini Embedding 2 Unifies Text, Images, Video in One Model

Google has launched Gemini Embedding 2, its first natively multimodal embedding model supporting text, images, video, audio, and documents for enterprise use.

winbuzzer.com/2026/03/12/g...

#AI #Google #BigTech #GoogleGemini #EnterpriseAI #MultimodalAI #AISearch #AIAudio #AIVideo #AIImages #GoogleAI #GoogleDeepMind #GeminiEmbedding2

Bill Psomas

@billpsomas.bsky.social

3 days ago

#GreeksInAI #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NLP #ComputerVision #Robotics #MultimodalAI #TrustworthyAI #AIResearch #Innovation #Greece #Athens

MSK Library

@msklibrary.bsky.social

4 days ago

Synapse: Your Connection to our MSK Authors
Meet: Sophia Meixuan Zhang
Research Focus: SKI-Pediatrics; Research Tech

Prompt-based multimodal representation learning for drug repurposing
synapse.mskcc.org/synapse/work...

#DrugRepurposing #AIinMedicine
#MultimodalAI #MachineLearning
#DeepLearning

@softtechhub.bsky.social

6 days ago

Microsoft’s Phi-4-Reasoning-Vision-15B: The AI Model That Knows When to Think and When Not To

softtechhub.us/2026/03/09/p...

#MicrosoftAI #Phi4 #Phi4Reasoning #AIModels #ReasoningAI #VisionAI #GenerativeAI #MachineLearning #MultimodalAI #AIInnovation #TechNews #DeepLearning #NextGenAI #FutureOfAI

Harald Klinke

@hxxxkxxx.det.social.ap.brid.gy

1 week ago

The image displays a flowchart illustrating an editing process for images. It includes categories for editing types, a dataset composition pie chart, and three examples of image modifications, each with a status indicator showing success or failure. Elements include icons, visual data,

The "Pico-Banana-400K" dataset shows an important trend in AI research: the focus is shifting from image generation to instruction-based image editing.
Models are learning not only to generate images but also to modify them in targeted ways, a step […]

[Original post on det.social]

The Science Matters

@tscimat.bsky.social

1 week ago

The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation From Recognition to Reasoning

This survey paper chronicles the evolution of evaluation in multimodal artificial intelligence (AI), framing it as a progression of increasingly sophisticated “cognitive examinations.” We argue that…

Research: doi.org/10.1109/ACCE... (IEEE Access, @ieeeaccess.bsky.social)

#ArtificialIntelligence #AIResearch #MachineLearning #AIEvaluation #MultimodalAI #TechEthics #IEEEAccess #ScienceCommunications

Awesome Agents

@awesomeagents.bsky.social

1 week ago

Luma Launches Agents for End-to-End Creative Work

Luma AI's new Agents platform, powered by the Uni-1 Unified Intelligence model, lets creative teams go from a written brief to finished video, images, and audio in one workflow.

awesomeagents.ai/news/luma-agents-unified...

#LumaAi #AiAgents #MultimodalAi

Timelines

@hulio-ai.bsky.social

1 week ago

🤖 Multimodal AI: New models handle text, image, and video together.
🔬 Science: AI speeds up drug discovery and protein folding.
⚡ Efficiency: Smaller models are now as strong as big ones.
#AI2024 #MultimodalAI #ScienceAI #EfficientAI

AI Daily Post

@aidailypost.com

1 week ago

Black Forest Labs just dropped Self‑Flow, a new trick that makes multimodal AI training 2.8× faster than REPA. Faster feature alignment means cheaper compute and quicker breakthroughs. Curious? Dive in! #SelfFlow #MultimodalAI #ComputationalEfficiency

🔗 aidailypost.com/news/black-f...

AI Daily Post

@aidailypost.com

1 week ago

Microsoft just dropped Phi‑4, a 15B reasoning‑vision model that’s tiny, fast, and ready for low‑latency AI. Perfect for edge inference and multimodal tricks. Curious how compact can be powerful? Dive in! #Phi4 #LowLatencyAI #MultimodalAI

🔗 aidailypost.com/news/microso...

Matteo Negri

@matteo-negri.bsky.social

1 week ago

Subtitling track
Home of the IWSLT conference and SIGSLT.

🚀 Call for Participation: @iwslt Subtitling 2026

Turn speech into ready-to-watch subtitles 🎬 across TV, News & YouTube!

📅 Evaluation: Apr 1–15
iwslt.org/2026/subtitl...

#IWSLT2026 #SpeechAI #MultimodalAI

Timelines

@hulio-ai.bsky.social

2 weeks ago

🌐 Multimodal AI: Unified models handle text, images, audio, code.
🤖 Autonomous Agents: AI plans & executes tasks independently.
⚡ Edge AI: Low-power models enable fast, private processing.
#AI2026 #MultimodalAI #AutonomousAI #EdgeAI

AI Daily Post

@aidailypost.com

2 weeks ago

New AI tools let scientists mash up RNA seq, imaging & more to map cellular states in one go. Imagine decoding biology faster than ever. Dive into how multimodal AI is reshaping cell biology research! #MultimodalAI #CellBiology #DataIntegration

🔗 aidailypost.com/news/ai-enab...

AI Daily Post

@aidailypost.com

3 weeks ago

Gemini just got a creative upgrade—now it can spin music while cranking out images and video. Dive into how DeepMind’s Lyria 3 is pushing multimodal AI into new artistic territory. 🎶🤖 #GoogleGemini #MusicGeneration #MultimodalAI

🔗 aidailypost.com/news/gemini-...

@kogents.bsky.social

1 month ago

Context breaks when channels change. One AI brain fixes that.

Voice + chat + email... unified, intelligent, continuous.

→ kogents.ai

#EnterpriseAI #MultimodalAI #KogentsAI #CallAutomation #CES #AAAI #AgenticAI

AI Daily Post

@aidailypost.com

1 month ago

ByteDance just dropped Seedance 2.0, a multimodal AI that turns text, images, audio and video into ready‑to‑watch clips. Think OpenAI’s Sora meets Google Veo—next‑gen video creation is here. Dive in to see what this could mean for creators. #Seedance2 #MultimodalAI #VideoAI

AI Daily Post

@aidailypost.com

1 month ago

Big shake‑ups at xAI keep rolling while Lambda teases a 2025 pivot to bigger context windows and multimodal reasoning. Wonder how this reshapes open‑source inference? Dive in for the details. #AIProduction #MultimodalAI #xAI

🔗 aidailypost.com/news/xai-co-...

AI Daily Post

@aidailypost.com

1 month ago

ByteDance just dropped Seedance 2.0 - a multi-modal AI that can watch a clip and remix it into fresh video. Think reference-guided text-to-video on steroids. Curious? Dive into the details. #Seedance2 #MultiModalAI #TextToVideo

🔗 aidailypost.com/news/bytedan...

Timelines

@hulio-ai.bsky.social

1 month ago

📈AI Market CAGR: Overall AI to grow 26.6%-41.95% CAGR, USD 375B-434B (2026)→USD 2.5T (2031-34)
💡Key Sectors: Multimodal AI: 36%, Quantum AI: 35.1%, AI in Transport: up to 22.7%, SLM: 15.1%
#AIgrowth #AIMarket #MultimodalAI #QuantumAI #TransportAI #SLM

Arif Solmaz

@arifsolmaz.bsky.social

1 month ago

Run Gemini 2.5 Flash-level multimodal AI on your phone: 9B parameter model handles vision, speech, and full-duplex streaming conversations locally

🔗 https://github.com/OpenBMB/MiniCPM-o

#MultimodalAI #EdgeML #VoiceAI

ITmatters - Tech News Platform

@itmatterss.bsky.social

1 month ago

Sarvam Vision in a League of Its Own for Indic OCR Across 22 Languages, Outpacing Gemini, GPT

Sarvam AI launches Sarvam Vision, a 3B-parameter vision-language model designed for multilingual OCR and visual reasoning.

Sarvam AI launches Sarvam Vision, a 3B vision-language model focused on Indic OCR across 22 languages. In company benchmarks, the model performs ahead of Gemini and GPT. Read more about it here:

itmatterss.in/industry/ai/...

#SarvamAI #AIIndia #OCR #MultimodalAI

Arif Solmaz

@arifsolmaz.bsky.social

1 month ago

ByteDance's open-source multimodal AI agent that controls your desktop, browser, and terminal through vision - like having an AI assistant that can actually see and click

🔗 https://github.com/bytedance/UI-TARS-desktop

#MultimodalAI #GUIAgent #DesktopAutomation

Proceedings Series MDPI

@proceedingsmdpi.bsky.social

1 month ago

Audiovisual Fusion Technique for Detecting Sensitive Content in Videos
www.mdpi.com/2673-4591/12...

By Daniel Povedano Álvarez et al.
From the First Summer School on Artificial Intelligence in Cybersecurity

#ContentModeration #MultimodalAI #DeepLearning

HackerNoon

@hackernoon.com

1 month ago

AI in 2026: Function Calling, Reasoning Models, and a New Runtime Era

Function calling turned LLMs from chatbots into action systems—reshaping AI runtimes, security, reasoning models, and specialization. #multimodalai

AI & ML News

@ai-news.at.thenote.app

1 month ago

Youtu-VL Shows How Treating Vision as a Target Unlocks Better Multimodal AI

Apache Spark 4.1 marks a shift from hand-crafted data pipelines to declarative design, reducing operational complexity through automated optimization, incremental views, built-in CDC, and native data quality checks.

Telegram AI Digest
#ai #multimodalai #news

AI и ML Новости

@ai-ru.at.thenote.app

1 month ago

Youtu-VL Shows How Treating Vision as a Target Unlocks Better Multimodal AI

Apache Spark 4.1 marks a shift from manual data pipelines to declarative design, reducing operational complexity through automatic optimization,…

Telegram AI Digest
#ai #multimodalai #news

HackerNoon

@hackernoon.com

1 month ago

Youtu-VL Shows How Treating Vision as a Target Unlocks Better Multimodal AI

Apache Spark 4.1 introduces declarative pipelines, materialized views, and built-in data quality—reshaping how modern data systems are designed. #multimodalai

AI & ML News

@ai-news.at.thenote.app

1 month ago

HERMES Rewrites the Rules of Streaming Video for Multimodal AI

HERMES shows that real-time video AI fails not from lack of memory, but from storing everything equally. By hierarchically compressing older context while preserving recent detail, it enables faster, more accurate streaming video understanding.

Telegram AI Digest
#ai #multimodalai #news

AI и ML Новости

@ai-ru.at.thenote.app

1 month ago

HERMES Rewrites the Rules of Streaming Video for Multimodal AI

HERMES shows that real-time video AI fails not from lack of memory, but from storing everything equally. By hierarchically compressing older context while preserving …

Telegram AI Digest
#ai #multimodalai #news

