- Four attention check questions: The classic "please select 'strongly agree' to show you're reading this" type questions.
- Two consistency checks: Paired questions that humans should answer in a predictable pattern. For example, if the answer to one question was true, the answer to the other had to be false (and vice versa) for the responses to count as consistent.
- Two reverse shibboleth items: Questions that your intended sample (i.e., humans) should not know the answer to, but an AI agent likely would.
- Three cognitive traps: These are visual illusions designed to exploit fundamental limitations in vision-enabled AI systems. We used a modified Müller-Lyer illusion, a modified cafe wall illusion, and a ‘moving robot task’, all of which have been shown to be highly discriminative between AI agents and human participants.
- Comprehensive mouse tracking: Recording every cursor movement and click pattern throughout the survey.
- Qualtrics' reCAPTCHA score: The platform's built-in risk scoring system.
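As a rough illustration of how the attention and consistency checks above could be scored for a single respondent, here is a minimal Python sketch. The `SurveyResponse` structure, its field names, and the `EXPECTED_ATTENTION_ANSWER` constant are hypothetical and chosen for illustration only; they are not taken from the study's actual pipeline.

```python
from dataclasses import dataclass

# Assumption: every attention check instructs the respondent to pick this option.
EXPECTED_ATTENTION_ANSWER = "strongly agree"


@dataclass
class SurveyResponse:
    """Hypothetical container for one respondent's check answers."""
    attention_answers: list[str]                 # one answer per attention check
    consistency_pairs: list[tuple[bool, bool]]   # (first, second) answer per pair


def passes_attention_checks(resp: SurveyResponse) -> bool:
    # Pass only if every attention check has the instructed answer.
    return all(
        answer.strip().lower() == EXPECTED_ATTENTION_ANSWER
        for answer in resp.attention_answers
    )


def passes_consistency_checks(resp: SurveyResponse) -> bool:
    # Each pair is consistent only when the two answers are opposites:
    # true/false or false/true.
    return all(first != second for first, second in resp.consistency_pairs)
```

A respondent who answers "Strongly agree" to all four attention checks and gives opposite answers within each pair passes both functions; a single deviating answer fails the corresponding check.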
We tested five different AI agents, each completing the survey 25 times to match our human sample size:
- GPT Agent: OpenAI's GPT configured to complete surveys
- Claude: Anthropic's Claude accessed via Cursor
- Perplexity: Perplexity AI's search-enhanced agent
- Gemini: Google's Gemini via Project Mariner
- Custom Agent: Our in-house ‘white hat’ agent, designed by Prolific’s AI research engineers to take surveys and avoid detection
The custom agent was crucial to include because it’s specifically designed to take surveys undetected. As an adversarial, customisable agent, it’s closer to Westwood’s bot (2025) than to the more readily detectable commercial AI agents.
We gave all agents an identical prompt that asked them to complete the survey as a human would and to exhibit human-like behaviour.
- Prolific’s bot authenticity check (100% accurate)
- Mouse tracking (95.0% accurate)
- Qualtrics’ reCAPTCHA score (94.2% accurate)
- Cognitive traps (85.2% accurate)
- Consistency checks (62.7% accurate)
- Classic attention checks (59.7% accurate)
- Reverse shibboleth questions (58.0% accurate)
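To make the accuracy figures above concrete, here is a minimal Python sketch of one common way to compute such a number. It assumes "accuracy" means the share of all participants (humans and agents together) that a check classifies correctly; the source does not spell out its exact formula. The function name and the toy data are hypothetical and are not results from the study.

```python
def detection_accuracy(is_agent: list[bool], flagged: list[bool]) -> float:
    """Share of participants a check classified correctly.

    is_agent: ground truth, True if the participant was an AI agent.
    flagged:  the check's verdict, True if it flagged the participant as an agent.
    """
    if len(is_agent) != len(flagged):
        raise ValueError("ground truth and verdicts must have the same length")
    if not is_agent:
        raise ValueError("at least one participant is required")
    # A verdict is correct when it matches the ground truth, for both
    # humans (not flagged) and agents (flagged).
    correct = sum(truth == verdict for truth, verdict in zip(is_agent, flagged))
    return correct / len(is_agent)


# Toy data only: four humans followed by four agents, with one agent missed.
truth = [False, False, False, False, True, True, True, True]
verdicts = [False, False, False, False, True, True, True, False]
print(detection_accuracy(truth, verdicts))  # 7 of 8 correct -> 0.875
```

Under this definition a check is penalised equally for flagging a human and for missing an agent, so a check that flags nobody would still score around 50% on a balanced sample.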
Can researchers detect #AI bots taking paid surveys?
#Prolific tested humans and #LLM agents with various #dataQuality checks.
- The company says they caught 100% of the non-humans.
- My take-away: #reCAPTCHA and #mouseTracking caught 95%
www.prolific.com/res...
#surveyMethods