That’s a wrap on our first HUJI NLP Hackathon!
Congrats to the winning team!
@noy-sternlicht.bsky.social @nirmazor.bsky.social
They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...
24.04.2025 12:34
Care about LLM evaluation? 🤖 🤔
We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
on different prompts, domains, tokens, models...
Join our community effort to expand it with YOUR model predictions & become a co-author!
17.03.2025 14:37
Can RAG performance get *worse* with more relevant documents? 📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3
11.03.2025 14:32
There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
03.02.2025 08:04
- “I heard there’s a new paper about Theory of Mind in LLMs!”
- “I know! There’s like hundreds of them!”
…
Could someone be driving in the wrong direction?
Check out our new opinion paper, w/ @nitalon.bsky.social, @joebarnby.bsky.social and Omri Abend.
19.12.2024 13:05
1/n First time in the sky ✈️
I had a great time presenting my work at @emnlpmeeting.bsky.social’s Workshop on Narrative Understanding and reconnecting with friends and colleagues in Miami! 🌴
How do religious trajectories evolve in Holocaust testimony narratives?
21.11.2024 15:13