Meet me at the Benchmarking workshop (sites.google.com/view/benchma...) at EurIPS on Saturday: we'll present two works on errors in LLM-as-a-Judge and their impact on benchmarking and test-time scaling:
At #NeurIPS in San Diego this week? Interested in XAI, causality, or performative prediction? Come visit our poster!
💬 Performative Validity of Recourse Explanations
📆 Wednesday, 4.30 pm, Poster Session 2
w/ Hidde Fokkema, Timo Freiesleben, Celestine Mendler-Dünner, Ulrike von Luxburg
Attending #Neurips2025? Get your personalized Scholar Inbox conference program now to easily navigate the poster sessions and find what you are looking for:
www.scholar-inbox.com/conference/n...
I'll be at @neuripsconf.bsky.social presenting Strategic Hypothesis Testing (spotlight!)
tldr: Many high-stakes decisions (e.g., drug approval) rely on p-values, but people submitting evidence respond strategically even w/o p-hacking. Can we characterize this behavior & how policy shapes it?
1/n
The empirical landscape sits between the two extremes.
- Model similarity is high, yet disagreements let individuals find recourse by switching models.
- Systemic exclusion is rare, yet more likely than under strong multiplicity.
- Even in a single model, prompt variations induce multiplicity.
We evaluate 50 LLMs (various sizes & providers) across 6 tasks to assess how well each narrative fits the current LLM landscape, assuming that decision makers will increasingly rely on these models for consequential predictions.
There are two narratives about model ecosystems that grew out of the algorithmic fairness debate:
1. Monoculture: models converge toward homogeneity.
2. Multiplicity: many models solve tasks similarly but disagree on individual predictions, creating outcome variation.
Excited to be at #Neurips2025 this week to present our paper "Monoculture or Multiplicity: Which is it?", joint work with Moritz Hardt.
📄 Paper #1000: openreview.net/pdf?id=DO5Lt...
📍 Wed, Dec 3, 2025 • 4:30 PM – 7:30 PM
Feel free to come by and reach out!
A short 🧵.