
Rupak

@rupak-s

4th year PhD student in UMD CS advised by Philip Resnik. I have also been a research intern at MSR (2024) and Adobe Research (2022).

101
Followers
73
Following
7
Posts
06.12.2024
Joined

Latest posts by Rupak @rupak-s

LLMs didn’t move language modeling research from linguists to AI people, they just moved it from computer scientists who thought language was interesting to computer scientists who thought language was boring

12.12.2025 19:38 πŸ‘ 83 πŸ” 13 πŸ’¬ 4 πŸ“Œ 1

AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:

22.10.2025 15:24 πŸ‘ 55 πŸ” 29 πŸ’¬ 5 πŸ“Œ 2
Predoctoral Research Assistant (Contract) – Computational Social Science - Microsoft Research Are you a recent college graduate wishing to gain research experience prior to pursuing a Ph.D. in fields related to computational social science (CSS)? Do you have a deep love of β€œplaying with data”—...

Do you have strong programming skills but need research experience on meaningful & exciting CSS projects before heading off to a top graduate school for a computational social science PhD? Apply now to predoc with me,
@dggoldst.bsky.social @jakehofman.bsky.social www.microsoft.com/en-us/resear...

10.07.2025 15:48 πŸ‘ 10 πŸ” 9 πŸ’¬ 2 πŸ“Œ 1
Screenshot of first page of paper. It is here: https://arxiv.org/pdf/2507.00828

Abstract: Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations.
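The protocol in the abstract (annotators or an LLM proxy read the documents assigned to a cluster, infer a category, then apply that category to held-out documents) can be sketched in a few lines. This is a toy illustration only: the `infer_category` and `assign_category` stubs below use simple word frequency where the paper uses human annotators or an LLM, and all function names here are mine, not the paper's API.

```python
from collections import Counter

def infer_category(cluster_docs):
    # Stand-in for a human annotator or LLM proxy: pick the most
    # frequent word in the cluster's documents as its "category".
    words = [w for doc in cluster_docs for w in doc.split()]
    return Counter(words).most_common(1)[0][0]

def assign_category(doc, category):
    # Stand-in judgment: does this held-out document match the
    # inferred category?
    return category in doc.split()

def evaluate(cluster_docs, heldout):
    """heldout: list of (doc, belongs_to_cluster) pairs.
    Returns the fraction of held-out docs where the annotator's
    judgment agrees with the gold membership label."""
    category = infer_category(cluster_docs)
    return sum(assign_category(d, category) == gold
               for d, gold in heldout) / len(heldout)
```

Swapping the two stubs for crowdworker judgments or LLM calls recovers the human protocol and its automated approximation, respectively.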


Evaluating topic models (and document clustering methods) is hard. In fact, since our paper critiquing standard evaluation practices four years ago, there hasn't been a good replacement metric.

That ends today (we hope)! Our new ACL paper introduces an LLM-based evaluation protocol 🧡

08.07.2025 12:40 πŸ‘ 52 πŸ” 10 πŸ’¬ 3 πŸ“Œ 2
A Co-op for Computing: Faculty are diving into the exciting, data-crunching, AI world of GPMoo.

Honored that Williams Magazine featured my research, grant, and GPU cluster. today.williams.edu/magazine/a-c...

28.05.2025 01:41 πŸ‘ 9 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
A screenshot of a paper showing the title - "Pairscale: Analyzing Attitude Change in Online Communities" by Rupak Sarkar, Patrick Wu, Kristina Miler, Alexander Hoyle and Philip Resnik.


Are you tired of using traditional stance detection to measure the polarity of text? Our #NAACL25 paper proposes an approach that uses pairwise comparisons to order texts on a continuous scale, capturing both implicit and explicit evidence in language.
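The post doesn't say how the pairwise comparisons are fitted into a continuous scale, but a standard choice for that step is the Bradley-Terry model. Below is a minimal sketch under that assumption (plain MM-style updates); it is not the paper's implementation, and `bradley_terry` is a name I made up for illustration.

```python
def bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry scores from pairwise comparisons.
    comparisons: list of (winner, loser) index pairs, e.g. the text
    judged more positive in attitude. Returns one score per item;
    higher means more of the measured attribute."""
    scores = [1.0] * n_items
    for _ in range(iters):
        wins = [0] * n_items
        denom = [0.0] * n_items
        for w, l in comparisons:
            wins[w] += 1
            # Both participants accumulate 1 / (score_w + score_l),
            # the MM-algorithm denominator for the BT likelihood.
            p = scores[w] + scores[l]
            denom[w] += 1 / p
            denom[l] += 1 / p
        scores = [max(wins[i], 1e-9) / max(denom[i], 1e-9)
                  for i in range(n_items)]
        total = sum(scores)
        scores = [s * n_items / total for s in scores]  # normalize
    return scores
```

With comparisons that consistently rank item 0 over 1 and 1 over 2, the fitted scores recover that ordering on a continuous scale.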

πŸ“Today in Hall 3 from 4-5:30pm

Come say hi!

01.05.2025 15:08 πŸ‘ 6 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

Yes! At Session 1 in Hall 3 tomorrow, 4-5:30 PM (CSS track)

30.04.2025 16:10 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Hi Maria! At NAACL too, let’s catch up!

30.04.2025 15:44 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Check out Neha’s Outstanding Paper Award πŸ† winning research on atomic hypothesis decomposition in Session C at 2 pm today!!

#NAACL2025

30.04.2025 14:11 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I'll be presenting this work with @rachelrudinger at #NAACL2025 tomorrow (Wednesday 4/30) in Albuquerque during Session C (Oral/Poster 2) at 2pm! πŸ”¬

Decomposing hypotheses in traditional NLI and defeasible NLI helps us measure various forms of consistency of LLMs. Come join us!

29.04.2025 20:40 πŸ‘ 8 πŸ” 3 πŸ’¬ 5 πŸ“Œ 1

🚨 New Paper 🚨

1/ We often assume that well-written text is easier to translate ✏️

But can #LLMs automatically rewrite inputs to improve machine translation? 🌍

Here’s what we found 🧡

17.04.2025 01:32 πŸ‘ 8 πŸ” 4 πŸ’¬ 1 πŸ“Œ 0

I have so many questions. Why not write a Python script that does the same thing instead of building a React app? Why not just answer with 3?!
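For what it's worth, the Python script in question is a one-liner, plus an optional line that mimics the app's highlighting:

```python
word = "strawberry"
print(word.count("r"))  # -> 3
# Highlight the r's, as the React app does, by uppercasing them:
print("".join(c.upper() if c == "r" else c for c in word))  # -> stRawbeRRy
```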

26.02.2025 06:05 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

So you're saying the next iteration of the model might generate a video of a person counting the number of R's in strawberry to tell me the answer? :P

26.02.2025 01:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
React app made by Claude that highlights the r's in strawberry


I asked Claude 3.7 to count the number of r's in "strawberry" ("count the number of r's in strawberry for me") and it wrote a React app that displays a strawberry; you click the strawberry to enumerate the r's.

Wild. Wondering what kind of alignment policies led to this.

26.02.2025 01:01 πŸ‘ 5 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0