HUJI NLP (@nlphuji)

That’s a wrap on our first Huji NLP Hackathon!
Congrats to the winning team!
@noy-sternlicht.bsky.social @nirmazor.bsky.social

They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...

24.04.2025 12:34

Care about LLM evaluation? 🤖 🤔

We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
across different prompts, domains, tokens, models...

Join our community effort to expand it with YOUR model predictions & become a co-author!

17.03.2025 14:37

Can RAG performance get *worse* with more relevant documents? 📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3

11.03.2025 14:32

There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major regulatory efforts and find that they rely on benchmarking, which we know to be problematic. How did this happen, and what can we do about it?
arxiv.org/pdf/2501.15693

03.02.2025 08:04

- “I heard there’s a new paper about Theory of Mind in LLMs!”
- “I know! There’s like hundreds of them!”
…
Could someone be driving in the wrong direction?

Check out our new opinion paper, w/ @nitalon.bsky.social, @joebarnby.bsky.social and Omri Abend.

19.12.2024 13:05

New preprint! ✨
Interested in LLM-as-a-Judge?
Want to find the best judge for ranking your systems?
Our new work is just for you:
"JuStRank: Benchmarking LLM Judges for System Ranking"
🕺💃
arxiv.org/abs/2412.09569

13.12.2024 10:16

1/n First time in the sky ✈️

I had a great time presenting my work at the Workshop on Narrative Understanding at @emnlpmeeting.bsky.social and reconnecting with friends and colleagues in Miami! 🌴

How do religious trajectories evolve in Holocaust testimony narratives?

21.11.2024 15:13
