LLMs didn't move language modeling research from linguists to AI people, they just moved it from computer scientists who thought language was interesting to computer scientists who thought language was boring
AI is already at work in American newsrooms.
We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.
Here's what we learned about how AI is influencing local and national journalism:
Do you have strong programming skills but need research experience doing meaningful & exciting CSS projects before heading off to a top graduate school for a computational social science PhD? Apply now to predoc with me,
@dggoldst.bsky.social @jakehofman.bsky.social www.microsoft.com/en-us/resear...
Screenshot of first page of paper. It is here: https://arxiv.org/pdf/2507.00828 Abstract: Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations.
Evaluating topic models (and document clustering methods) is hard. In fact, since our paper critiquing standard evaluation practices four years ago, there hasn't been a good replacement metric.
That ends today (we hope)! Our new ACL paper introduces an LLM-based evaluation protocol 🧵
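A minimal sketch of the protocol the abstract describes: a reviewer (human or LLM proxy) reads the documents grouped under one topic, infers a shared category, then applies that category to held-out documents. The function names, prompts, and the injectable `annotate` callable are my own illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, Sequence, Tuple

def infer_category(annotate: Callable[[str], str], topic_docs: Sequence[str]) -> str:
    """Step 1: the annotator reads documents assigned to one topic
    and names the category they share."""
    prompt = "Name the common category of these texts:\n" + "\n".join(topic_docs)
    return annotate(prompt)

def apply_category(annotate: Callable[[str], str], category: str, doc: str) -> bool:
    """Step 2: decide whether another document fits the inferred category."""
    prompt = f"Does this text belong to the category '{category}'? Answer yes/no:\n{doc}"
    return annotate(prompt).strip().lower().startswith("yes")

def topic_score(
    annotate: Callable[[str], str],
    topic_docs: Sequence[str],
    heldout: Sequence[Tuple[str, bool]],
) -> float:
    """Score one topic: fraction of held-out (doc, is_member) pairs where the
    annotator's category assignment matches gold membership. Higher means the
    topic is coherent enough to be named and reapplied consistently."""
    category = infer_category(annotate, topic_docs)
    hits = sum(
        apply_category(annotate, category, doc) == is_member
        for doc, is_member in heldout
    )
    return hits / len(heldout)
```

Because `annotate` is injected, the same harness can wrap a crowdworker interface or an LLM API call, which is how a human protocol and its automated proxy can share one evaluation loop.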
Honored by Williams Magazine's feature on my research, grant, and GPU cluster. today.williams.edu/magazine/a-c...
A screenshot of a paper showing the title - "Pairscale: Analyzing Attitude Change in Online Communities" by Rupak Sarkar, Patrick Wu, Kristina Miler, Alexander Hoyle and Philip Resnik.
Are you tired of using traditional stance detection to measure the polarity of text? Our #NAACL25 paper proposes an approach that uses pairwise comparisons to order texts on a continuous scale, capturing both implicit and explicit evidence in language.
Today in Hall 3 from 4-5:30pm
Come say hi!
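The core idea of ordering texts from pairwise comparisons can be sketched with a Bradley-Terry fit: each comparison says one text is more of the target attitude than another, and the fitted strengths place all texts on a continuous scale. This is my own generic sketch, not the paper's PairScale implementation.

```python
from collections import defaultdict
from typing import List, Tuple

def bradley_terry(comparisons: List[Tuple[int, int]], n_items: int, iters: int = 200) -> List[float]:
    """Fit Bradley-Terry strengths from pairwise outcomes.
    comparisons: (winner, loser) index pairs, e.g. from an LLM judging
    which of two texts expresses a stance more strongly.
    Returns one score per item; sorting by score yields the continuous scale."""
    wins = defaultdict(int)          # W_i: total wins of item i
    pair_counts = defaultdict(int)   # n_ij: times items i and j were compared
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # Standard minorization-maximization update for Bradley-Terry.
            denom = sum(
                pair_counts[frozenset((i, j))] / (p[i] + p[j])
                for j in range(n_items) if j != i
            )
            new_p.append(wins[i] / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x * n_items / total for x in new_p]  # normalize for stability
    return p
```

Unlike a fixed-label stance classifier, the pairwise setup never forces a hard polarity decision on any single text; the ordering emerges from many relative judgments.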
Yes! At Session 1 at Hall 3 tomorrow 4-5.30 PM (CSS track)
Hi Maria! At NAACL too, let's catch up!
Check out Neha's Outstanding Paper Award-winning research on atomic hypothesis decomposition in Session C at 2 pm today!!
#NAACL2025
I'll be presenting this work with @rachelrudinger at #NAACL2025 tomorrow (Wednesday 4/30) in Albuquerque during Session C (Oral/Poster 2) at 2pm!
Decomposing hypotheses in traditional NLI and defeasible NLI helps us measure various forms of consistency of LLMs. Come join us!
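One consistency check that decomposition enables: if a model says a premise entails the full hypothesis, it should also entail every atomic part of that hypothesis. A minimal sketch with an injectable `entails` predictor (my own illustration of the general idea, not the paper's method):

```python
from typing import Callable, Sequence

def decomposition_consistent(
    entails: Callable[[str, str], bool],
    premise: str,
    hypothesis: str,
    atoms: Sequence[str],
) -> bool:
    """Check one logical-consistency property of an NLI predictor.
    If the premise entails the composite hypothesis, it must entail
    each atomic sub-hypothesis; otherwise the model is inconsistent.
    `entails(premise, hypothesis) -> bool` can be any NLI model."""
    if not entails(premise, hypothesis):
        return True  # the check only constrains positive entailments
    return all(entails(premise, atom) for atom in atoms)
```

Aggregating this check over a dataset gives a consistency rate, which is a different signal from plain label accuracy.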
🚨 New Paper 🚨
1/ We often assume that well-written text is easier to translate.
But can #LLMs automatically rewrite inputs to improve machine translation?
Here's what we found 🧵
I have so many questions. Why not write a Python script that does the same thing instead of building a React app? Why not just answer with 3?!
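The script the reply has in mind would be a one-liner (this is just the obvious version, not anything Claude produced):

```python
word = "strawberry"
# str.count tallies non-overlapping occurrences of the substring.
print(word.lower().count("r"))  # → 3
```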
So you're saying the next iteration of the model might generate a video of a person counting the number of R's in strawberry to tell me the answer? :P
React app made by Claude that highlights the r's in strawberry
I asked Claude 3.7 to count the number of r's in Strawberry ("count the number of r's in strawberry for me") and it wrote a React app that displays a strawberry, and you click the strawberry to enumerate the number of r's.
Wild. Wondering what kind of alignment policies led to this.