Narrative and Grounded Supervision Boosts VideoQA Accuracy
A new training framework using narrative‑level and grounded supervision lifts VideoQA performance, with a 3B model reaching 72.5% accuracy on STAR and a 7B model hitting 80.8% on NExT‑QA. getnews.me/narrative-and-grounded-s... #videoqa #multimodalai #nextqa