
@chenhaotan

Associate professor at the University of Chicago. Working on human-centered AI, NLP, CSS. https://chenhaot.com, https://substack.com/@cichicago

4,086
Followers
310
Following
313
Posts
08.11.2023
Joined

Latest posts by @chenhaotan

The blog actually discussed different types of reviews. If AI reviewing helps authors produce better science, I do not see why one needs to be so hostile toward AI. It actually helps authors slow down and produce better-quality articles.

10.03.2026 03:13 👍 0 🔁 0 💬 1 📌 0

It is open, and it can and will be improved! Feedback like this is highly appreciated.

The issue is here: github.com/ChicagoHAI/O...

09.03.2026 20:13 👍 3 🔁 0 💬 0 📌 0
OpenAIReview - AI-Powered Academic Paper Reviewer

Try it, read the blog, or contribute:
🌐 openaireview.github.io
📝 openaireview.github.io/blog.html
💻 github.com/ChicagoHAI/O...

09.03.2026 18:49 👍 1 🔁 0 💬 0 📌 0

We also don't have good evaluations for AI-generated reviews yet. We're working on it and welcome collaborators. Feedback welcome, especially from conference organizers and journal editors who want to think seriously about the future of peer review.

09.03.2026 18:49 👍 1 🔁 0 💬 1 📌 0

There are two types of reviewing. Reviewing for quality (improving the work), which is what Refine and OpenAIReview do, is very different from gatekeeping (accept/reject), which is what Stanford Agentic Reviewer targets. We think automating gatekeeping requires much more care.

09.03.2026 18:49 👍 1 🔁 0 💬 1 📌 0

Our progressive approach finds issues at 87% of locations flagged by Refine, for the price of a coffee per paper. @joehsu.bsky.social added a Claude skill making it essentially free for Claude subscribers.

09.03.2026 18:49 👍 1 🔁 0 💬 1 📌 0

3/7 The only intervention that can stabilize the system is improving review precision, the ability to distinguish good papers from weak ones. AI production tools lower submission costs; only AI review tools can raise the signal. That asymmetry is why we built OpenAIReview.

09.03.2026 18:49 👍 0 🔁 0 💬 1 📌 0

2/7 The review death spiral: more submissions → overloaded reviewers → noisier reviews → more random acceptance → even more submissions. Bergstrom & Gross already warned about this. AI production tools make it worse by lowering submission costs and pushing the system toward collapse faster.

09.03.2026 18:48 👍 1 🔁 0 💬 2 📌 0
AI-assisted Reviewing is Necessary and Should be Open
Peer review is facing a death spiral. AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open.

Peer review is facing a death spiral, and AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open. We built OpenAIReview: open AI reviewing for everyone, for the cost of a coffee.

openaireview.github.io/blog.html 🧵

09.03.2026 18:48 👍 19 🔁 7 💬 1 📌 4

Local ballot measures are now on CivicChats! Local elections happen year-round, and 10+ states have measures coming up in the next few months. Check your ballot and think through what you'll be voting on → civicchats.org

25.02.2026 18:35 👍 2 🔁 1 💬 0 📌 0
CivicChats - Building AI to support voting behavior
CivicChats is a platform for exploring, debating, and thinking through upcoming ballot measures.

We have been developing automatic evaluation based on checklists. We are also planning to run a study at the same time. Learn more at the end of this blog: cichicago.substack.com/p/civicchats...

20.02.2026 00:33 👍 0 🔁 0 💬 0 📌 0

Check out our effort in thinking about how AI can help with democratic processes!

19.02.2026 21:48 👍 1 🔁 1 💬 1 📌 0

Can anyone help review an ACL submission on parameter-efficient fine-tuning today?

Sorry that the timeline is very tight.

16.02.2026 19:27 👍 0 🔁 1 💬 0 📌 0

📖 ≠ 🧪 The Story is Not the Science.
Code is submitted but rarely executed during peer review, an issue likely to worsen with research agents. 🧑‍🔬
We introduce MechEvalAgent, an execution-grounded evaluation of narrative + execution. Verify the science, not just the story.
1/n

10.02.2026 19:44 👍 8 🔁 4 💬 2 📌 0

Mark Yatskar will be speaking this Friday!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

09.02.2026 21:09 👍 0 🔁 0 💬 0 📌 0

Hannes Stark will be speaking this Friday on BoltzGen!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

02.02.2026 22:47 👍 0 🔁 0 💬 0 📌 1

Happening in three hours!

30.01.2026 14:03 👍 0 🔁 0 💬 0 📌 0

Microsoft Research NYC is hiring a researcher in the space of AI and society!

29.01.2026 23:27 👍 62 🔁 40 💬 2 📌 2

@profbuehlermit.bsky.social from MIT will be speaking this Friday!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

26.01.2026 20:42 👍 5 🔁 1 💬 0 📌 1

Happening in two hours!

23.01.2026 15:03 👍 1 🔁 0 💬 0 📌 0

Peter Clark from @ai2.bsky.social will be speaking on Friday!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

20.01.2026 19:17 👍 3 🔁 0 💬 0 📌 1
We study how radiologists use AI to diagnose pulmonary embolism (PE), tracking over 100,000
scans interpreted by nearly 400 radiologists during the staggered rollout of an FDA-approved
diagnostic platform. When AI flags PE, radiologists agree 84% of the time; when AI predicts no PE,
they agree 97%. Disagreement evolves substantially: radiologists initially reject AI-positive PEs in
30% of cases, dropping to 12% by year two. Despite a 16% increase in scan volume, diagnostic speed
remains stable while per-radiologist monthly volumes nearly double, with no change in patient
mortality, suggesting AI improves workflow without compromising outcomes. We document
significant heterogeneity in AI collaboration: some radiologists reject AI-flagged PEs half the time
while others accept nearly always; female radiologists are 6 percentage points less likely to override AI
than male radiologists. Moderate AI engagement is associated with the highest agreement, whereas
both low and high engagement show more disagreement. Follow-up imaging reveals that when
radiologists override AI to diagnose PE, 54% of subsequent scans show both agreeing on no PE
within 30 days.

Posted a very early stage draft with rock star collaborators.

Key question: when we actually roll out AI tools, how do people use them? Do they just defer completely? Does it improve productivity and ability?

We look at this in the medical setting of pulmonary embolisms:
paulgp.com/papers/Radio...

19.01.2026 20:16 👍 89 🔁 18 💬 4 📌 2

I've often joked that as faculty I program in a high-level language called "graduate student". Having tried out Claude Code this morning, I (i) feel extremely at home, (ii) am realizing that research-by-graduate-student is perhaps the original vibe-coding. 1/2

08.01.2026 12:24 👍 87 🔁 11 💬 7 📌 3

I've seen this message and similar echoes for other writing, and I want to strongly push back on this narrative. It's not that you shouldn't use ChatGPT but that you shouldn't *use ChatGPT to write it for you*. ChatGPT, and AI in general, is not a monolith. How you use it matters.

18.01.2026 16:55 👍 8 🔁 1 💬 1 📌 0

Very much enjoyed this talk by @yisongyue.bsky.social! The measurement challenge deserves a lot more attention from the AI community!

16.01.2026 18:54 👍 2 🔁 0 💬 0 📌 0

Happening in two hours!

16.01.2026 14:43 👍 2 🔁 0 💬 0 📌 1
Title + abstract of the preprint

Excited to present a new preprint with @nkgarg.bsky.social: presenting usage statistics and observational findings from Paper Skygest in the first six months of deployment! 🎉📜

arxiv.org/abs/2601.04253

14.01.2026 19:48 👍 147 🔁 45 💬 4 📌 4

Emergent misalignment made it into @nature.com! The key insight is that models fine-tuned on writing insecure code present a wide range of insecure behavior in other contexts.

15.01.2026 15:37 👍 5 🔁 0 💬 0 📌 0

I think it would be useful to attract researchers in industry to the platform as well.

13.01.2026 01:28 👍 3 🔁 0 💬 0 📌 0