The blog actually discussed different types of reviews. If AI reviewing helps authors produce better science, I do not see why one needs to be so hostile toward AI. It actually helps authors slow down to produce better-quality articles.
It is open, and it can and will be improved! Feedback like this is highly appreciated.
The issue is here: github.com/ChicagoHAI/O...
Try it, read the blog, or contribute:
- openaireview.github.io
- openaireview.github.io/blog.html
- github.com/ChicagoHAI/O...
We also don't have good evaluations for AI-generated reviews yet. We're working on it and welcome collaborators. Feedback welcome, especially from conference organizers and journal editors who want to think seriously about the future of peer review.
There are two types of reviewing. Reviewing for quality (improving the work), which is what Refine and OpenAIReview do, is very different from gatekeeping (accept/reject), which is what Stanford Agentic Reviewer targets. We think automating gatekeeping requires much more care.
Our progressive approach finds issues at 87% of locations flagged by Refine, for the price of a coffee per paper. @joehsu.bsky.social added a Claude skill making it essentially free for Claude subscribers.
3/7 The only intervention that can stabilize the system is improving review precision, the ability to distinguish good papers from weak ones. AI production tools lower submission costs; only AI review tools can raise the signal. That asymmetry is why we built OpenAIReview.
2/7 The review death spiral: more submissions → overloaded reviewers → noisier reviews → more random acceptance → even more submissions. Bergstrom & Gross already warned about this. AI production tools make it worse by lowering submission costs and pushing the system toward collapse faster.
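The feedback loop above can be sketched as a toy simulation. The model and every parameter in it (reviewer capacity, the growth rate, how noise scales with overload) are illustrative assumptions I'm making up for the sketch, not estimates from any data:

```python
def simulate(rounds, precision, capacity=100, subs0=150, growth=0.3):
    """Toy death-spiral model: overload makes reviews noisier, and noisy
    (more random) acceptance invites yet more submissions. `precision` in
    [0, 1] is how well review separates good papers from weak ones."""
    subs = float(subs0)
    history = [subs]
    for _ in range(rounds):
        overload = max(0.0, subs / capacity - 1.0)  # load beyond reviewer capacity
        noise = overload * (1.0 - precision)        # high precision damps the noise
        subs *= 1.0 + growth * noise                # noisier review -> more submissions
        history.append(round(subs, 1))
    return history

# With perfect precision the overloaded system holds steady;
# with no precision, submissions compound every round.
stable = simulate(10, precision=1.0)
spiral = simulate(10, precision=0.0)
```

In this sketch, raising `precision` is the only knob that arrests the growth, which is the asymmetry the thread argues for.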
Peer review is facing a death spiral, and AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open. We built OpenAIReview: open AI reviewing for everyone, for the cost of a coffee.
openaireview.github.io/blog.html 🧵
Local ballot measures are now on CivicChats! Local elections happen year-round, and 10+ states have measures coming up in the next few months. Check your ballot and think through what you'll be voting on: civicchats.org
We have been developing automatic evaluation based on checklists. We are also planning to run a study at the same time. Learn more at the end of this blog: cichicago.substack.com/p/civicchats...
Check out our effort to think about how AI can help with democratic processes!
Can anyone help review an ACL submission on parameter-efficient fine-tuning today?
Sorry the timeline is very tight.
📖 ≠ 🧪 The Story is Not the Science.
Code is submitted but rarely executed during peer review, an issue likely to worsen with research agents. 🧑‍🔬
We introduce an execution-grounded evaluation of narrative + execution. Verify the science, not just the story.
1/n
Mark Yatskar will be speaking this Friday!
You can tune in either on
Zoom: uchicago.zoom.us/j/9897879984...
YouTube: www.youtube.com/@AIScientifi...
Hannes Stark will be speaking this Friday on BoltzGen!
You can tune in either on
Zoom: uchicago.zoom.us/j/9897879984...
YouTube: www.youtube.com/@AIScientifi...
Happening in three hours!
Microsoft Research NYC is hiring a researcher in the space of AI and society!
@profbuehlermit.bsky.social from MIT will be speaking this Friday!
You can tune in either on
Zoom: uchicago.zoom.us/j/9897879984...
YouTube: www.youtube.com/@AIScientifi...
Happening in two hours!
Peter Clark from @ai2.bsky.social will be speaking on Friday!
You can tune in either on
Zoom: uchicago.zoom.us/j/9897879984...
YouTube: www.youtube.com/@AIScientifi...
We study how radiologists use AI to diagnose pulmonary embolism (PE), tracking over 100,000 scans interpreted by nearly 400 radiologists during the staggered rollout of an FDA-approved diagnostic platform. When AI flags PE, radiologists agree 84% of the time; when AI predicts no PE, they agree 97%. Disagreement evolves substantially: radiologists initially reject AI-positive PEs in 30% of cases, dropping to 12% by year two. Despite a 16% increase in scan volume, diagnostic speed remains stable while per-radiologist monthly volumes nearly double, with no change in patient mortality, suggesting AI improves workflow without compromising outcomes. We document significant heterogeneity in AI collaboration: some radiologists reject AI-flagged PEs half the time while others accept nearly always; female radiologists are 6 percentage points less likely to override AI than male radiologists. Moderate AI engagement is associated with the highest agreement, whereas both low and high engagement show more disagreement. Follow-up imaging reveals that when radiologists override AI to diagnose PE, 54% of subsequent scans show both agreeing on no PE within 30 days.
Posted a very early-stage draft with rock-star collaborators.
Key question: when we actually roll out AI tools, how do people use them? Do they just defer completely? Does it improve productivity and ability?
We look at this in the medical setting of pulmonary embolism:
paulgp.com/papers/Radio...
I've often joked that as faculty I program in a high-level language called "graduate student". Having tried out Claude Code this morning, I (i) feel extremely at home, (ii) am realizing that research-by-graduate-student is perhaps the original vibe-coding. 1/2
I've seen this message and similar echoes for other writing, and I want to strongly push back on this narrative. It's not that you shouldn't use ChatGPT, but that you shouldn't *use ChatGPT to write it for you*. ChatGPT, and AI in general, is not a monolith. How you use it matters.
Very much enjoyed this talk by @yisongyue.bsky.social ! The measurement challenge deserves a lot more attention from the AI community!
Happening in two hours!
Title + abstract of the preprint
Excited to share a new preprint with @nkgarg.bsky.social presenting usage statistics and observational findings from Paper Skygest in its first six months of deployment!
arxiv.org/abs/2601.04253
Emergent misalignment made it into @nature.com! The key insight is that models fine-tuned to write insecure code exhibit a wide range of misaligned behavior in other contexts.
I think it would be useful to attract researchers in industry to the platform as well.