🧠⚽️🏀🏐 Preprint Alert!!
We built the Spot The Ball benchmark to test visual social inference – the ability to infer missing information from others’ behavior – in Vision Language Models.
Try the task yourself here: nehabalamurugan.com/spot-the-bal...
I led this work with the support of Sarah Wu, Adam Chun, Gabe Gaw, Cristóbal Eyzaguirre, and Professor Tobias Gerstenberg.
🧩 Website: nehabalamurugan.com/spot-the-bal...
📊 Dataset: huggingface.co/datasets/neh...
📄 Preprint: arxiv.org/abs/2511.00261
Our goal with this work is to motivate progress in social inference for AI. We hope this benchmark spurs architectural innovations that help models understand social information as robustly as, if not better than, humans do, enabling safe deployment in human-AI contexts.
We then examined the reasoning text produced by humans and models. Models reference pose far more often than gaze, except under chain-of-thought prompting, which pushes them toward more balanced, human-like reasoning patterns.
We found that models often rely on simple cues such as guessing near a player or near the image center to solve the task.
Humans outperform all models (Gemini, GPT, LLaMA, Qwen) across all prompting strategies. Human accuracy is 2–3× higher, and Wasserstein distances show that models’ guess distributions differ substantially from human distributions.
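As a rough sketch of the distributional comparison (assuming guesses are reduced to 1D histograms over cell indices; the paper’s exact metric may operate on 2D coordinates), a discrete Wasserstein distance can be computed like this:

```python
# Minimal 1D Wasserstein (earth mover's) distance between two discrete
# distributions over grid-cell indices. Illustrative sketch only; the
# benchmark may compute the distance over 2D pixel coordinates instead.

def wasserstein_1d(p, q):
    """W1 distance between two histograms p, q over the same support."""
    total_p, total_q = sum(p), sum(q)
    cum_diff, dist = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum_diff += pi / total_p - qi / total_q
        dist += abs(cum_diff)
    return dist

human = [0.0, 0.2, 0.5, 0.3]   # hypothetical human guess histogram
model = [0.4, 0.4, 0.1, 0.1]   # hypothetical model guess histogram
print(wasserstein_1d(human, model))
```

A larger value means more probability mass has to be moved farther to turn one guess distribution into the other.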
Contributions of the work:
1️⃣ Spot The Ball task with human baselines
2️⃣ Large dataset including soccer, basketball, and volleyball images
3️⃣ Scalable image-generation pipeline for any sport with a ball
This task evaluates a model’s ability to localize a hidden object by reasoning over social and physical contextual cues, such as players’ gaze, body orientation, and spatial positioning, together with sport-specific knowledge, rather than relying on direct visual evidence of the object itself.
In Spot the Ball, the task is to infer the location of a removed ball from a sports frame. Models and humans select the grid cell they believe contains the ball and give a reason for their choice.
Come chat! 🎤
I'll be presenting this work at #CogSci2025:
📍 Poster Number P1-B-8
🗓️ Poster Session: Poster Session 1
🧠 Poster title: “Spot the Ball: Evaluating Visual Causal Inference in VLMs under Occlusion”
We also built:
✅ An inpainting-based image generation pipeline
✅ A public demo where you can test your visual inference skills
✅ A dataset of 3000+ labeled soccer images for future work
Results:
Humans outperform all models—even with chain-of-thought scaffolding.
GPT-4o gets closer with explicit pose/gaze cues, but still falls short in many cases.
Three prompt types, increasing in reasoning complexity:
🔹 Basic: “Which grid cell contains the ball?”
🔹 Implicit: Encourages attention to pose/gaze
🔹 Chain-of-thought: Step-by-step inference
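The three prompt levels might look like the following sketch (wording is illustrative, not the exact prompts from the paper):

```python
# Illustrative prompt templates for the three conditions; the exact
# wording used in the benchmark may differ.
PROMPTS = {
    "basic": "Which grid cell contains the hidden ball?",
    "implicit": (
        "Look at the players' gaze directions and body orientations. "
        "Which grid cell contains the hidden ball?"
    ),
    "cot": (
        "Reason step by step: (1) describe where players are looking, "
        "(2) describe their body orientation and motion, (3) infer the "
        "most likely ball location. Then answer with a grid cell."
    ),
}

for name, prompt in PROMPTS.items():
    print(f"{name}: {prompt}")
```

Each level adds structure: the implicit prompt names the cues, and the chain-of-thought prompt forces the model to articulate them before answering.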
The task is mapped to a 6×10 grid → a 60-class classification problem.
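Concretely (with a hypothetical image size and row-major cell ordering, both assumptions for illustration), a pixel guess maps to one of the 60 classes like this:

```python
# Map a pixel-coordinate guess onto a 6x10 grid (rows x cols), turning
# ball localization into a 60-class classification problem. The image
# dimensions and row-major ordering here are illustrative assumptions.

def guess_to_cell(x, y, width=1280, height=768, rows=6, cols=10):
    """Return a cell index in [0, rows*cols) for a pixel guess (x, y)."""
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row * cols + col

print(guess_to_cell(0, 0))        # top-left cell -> 0
print(guess_to_cell(1279, 767))   # bottom-right cell -> 59
```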
We benchmark humans and models (GPT-4o, Gemini, LLaMA, Qwen) on soccer, basketball, and volleyball.
In high-stakes, real-world scenes, humans infer what's missing, a crucial skill in driving, robotics, and sports.
We isolate this in a simple but rich task: spot the masked ball from a single frame.
The Spot the Ball game has been around for decades.
🗓️ It began in the UK in the 1970s as a popular newspaper contest
👥 At its peak, over 3 million people played weekly
Players had to guess where the ball had been removed from a photo, exactly what our benchmark asks today.
🧠⚽ Spot the ball! New benchmark for visual scene understanding!
We ask: Can people and models locate a hidden ball in sports images using only visual context and reasoning?
🕹️ Try the task: v0-new-project-9b5vt6k9ugb.vercel.app
#CogSci2025