Audio's Role in Modern Video-LLMs Evaluated on New Benchmarks
Adding audio to LLaVA‑OneVision with Whisper and a Mamba token compressor yields only marginal gains on standard video benchmarks but boosts accuracy on the AVQA‑Hard and Music‑AVQA‑Hard datasets. Read more: getnews.me/audios-role-in-modern-vi... #avqa #llava