A few months ago I realized my rare and valuable skill, writing code as a historian, was still valuable but no longer rare. Now I'm thinking about what it means when the technical barriers to digital history drop away.
Lincoln asks, what if generative AI has reduced the technical barriers to doing digital history? Similarly I ask, with an example, what if vibing Just Works?
How are historians rethinking environmental and social history via improved OCR of imperial archives?
Join us this Wednesday 3pm UK time to hear from @jimclifford.bsky.social and @historyjacob.bsky.social - registration link below.
A bar chart showing the electricity use of several daily activities, with the subtitle "The 'typical query' is not a useful way to think about coding agents' energy use." The bar for a 'typical ChatGPT query' is not even visible. My median Claude Code session is somewhere between what the average US household uses in a minute and toasting bread for three minutes. My median day with Claude Code is something like running a dishwasher.
Whenever I read discourse on AI energy/water use that focuses on the "median query," I can't help but feel misled. Coding agents like Claude Code send hundreds of longer-than-median queries every session, and I run dozens of sessions a day.
On my blog: www.simonpcouch.com/blog/2026-01...
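For a rough sense of the scale mismatch, here is a back-of-envelope sketch of that comparison. Every figure below is an assumption drawn from commonly cited public estimates, not a measurement from the post, and the per-session number is a guessed midpoint of the range described above:

```python
# Back-of-envelope energy comparison. All values are rough assumptions,
# not measurements: the "median query" figure is a widely circulated
# public estimate, and the session figure is a guessed midpoint.
WH_PER_MEDIAN_QUERY = 0.3      # commonly cited estimate for one text query
WH_HOUSEHOLD_PER_MIN = 20.0    # ~10,500 kWh/yr averaged over the year
WH_TOASTER_3_MIN = 60.0        # ~1200 W toaster running for 3 minutes
WH_DISHWASHER_RUN = 1500.0     # one cycle, ~1.5 kWh

# If one coding-agent session lands between the household-minute and
# toaster anchors, call it ~40 Wh: on the order of a hundred "median
# queries" per session, which is why the median-query framing misleads.
session_wh = 40.0
print(session_wh / WH_PER_MEDIAN_QUERY)   # well over 100 median queries
print(WH_DISHWASHER_RUN / session_wh)     # ~dozens of sessions per "dishwasher day"
```

The point of the arithmetic is only that a single agent session is hundreds of median queries, so per-query figures say little about agentic workloads.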
Join the Lancaster-Manchester Environmental #DH Seminar on March 11 @ 3pm UK (online) for a talk by @jimclifford.bsky.social & @historyjacob.bsky.social:
"Solving OCR: Using olmOCR to Follow Commodities across the British World"
www.eventbrite.co.uk/e/solving-oc...
#dhist #ocr #envhist
In early 2024, researchers were already heavily using AI for work - Survey of 816 verified authors via Semantic Scholar - 81% of researchers reported using LLMs in their workflow - Top uses: information seeking & editing - Rare for data tasks: 69-73% never use LLMs for data cleaning or generation
The measurement problem: LLM content has risen sharply in both review and non-review papers. Review papers do have a higher prevalence rate, but non-review LLM papers outnumber review papers roughly 6x. CS.CY (Computers & Society) faces potential 50% cuts, while CS.CV (Computer Vision) would face only 3%
Interdisciplinary researchers, who move between cultures and write in the "borderlands," are experts at adapting their writing. LLMs currently are not.
Private information can appear in unlikely prompts
I gave a short talk at Cornell yesterday on my science-of-science work investigating how AI is being used by researchers and how we should go about crafting policies in response.
Blanket policies are hard, privacy is important, we need more measurement.
Slides: drive.google.com/file/d/1gNTK...
Title, author list, and two figures from the paper. Title: The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors Authors: Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, Kyle Lo Figure 1: On the left is a math problem, where students are asked to draw x < 5/2 on a number line. The right side shows two example student responses that differ in correctness. DrawEduMath pairs each math problem with one student response, and prompts VLMs to answer questions about the student response. Figure 2: VLMs consistently perform worse on answering DrawEduMath benchmark questions pertaining to erroneous student responses. Performance on non-erroneous student responses is labeled with specific VLMs' names; that same model's performance on erroneous student responses is directly below.
Models are now expert math solvers, and so AI for math education is receiving increasing attention.
Our new preprint evaluates 11 VLMs on our QA benchmark, DrawEduMath. We highlight a startling gap: models perform less well on inputs from K-12 students who need more help. 🧵
This whole thread is great. I've been focusing so much on how AI can short-circuit learning that I haven't been considering the flip side of spurring motivation by expanding what ppl (in my case, students) think they're capable of doing
I find this to be the most depressing example of agentic AI adoption out there (as someone who does see real value in agentic coding tools for DH), but I'm even more depressed by the "abandon Canvas" solutions I see folks posting in response. This impacts anything involving a computer, period. (4/5)
If you're enrolled in online courses with the singular goal of getting a degree as quickly and affordably and flexibly as possible so that you can get a better job… there are some pretty powerful incentives for you to use these things companion.ai/einstein
And there is a real difference between plagiarism and agentic LLMs in terms of detection. I'd love to think what keeps students from cheating are norms, but there's also a much higher chance of getting caught if you plagiarize than if you use some LLM to write your papers. +
Agentic models are an existential threat to online asynch classes, and these classes are also overwhelmingly what students want/can take. Which leaves us with the remedy of the kind of cultural norms you mention or super invasive surveillance +
Which makes total sense for your institutional context! I just have to remind myself that my own experience as someone who mostly teaches in person classes is not at all reflective of most of my colleagues and that they are the ones on the front lines of this. +
I appreciate this. Curious: did your group talk at all about online courses? It seems like most of the recs are geared towards in-person classes and residential campuses. Something like 50% of my department's sections are online asynchronous (and they're our highest enrolling sections)
Ran the same OCR models on 68 pages of historic newspaper. Every model hallucinated or looped.
DeepSeek-OCR-2, LightOnOCR-2, GLM-OCR: all melt down on dense newspaper columns.
You can try yourself using this @hf.co dataset: huggingface.co/datasets/dav...
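Since the dataset link above is truncated, here is instead a self-contained sketch of what "looped" means in practice: a small checker, my own construction rather than anything from the post or dataset, that flags the degenerate repetition loops OCR models fall into on dense newspaper columns:

```python
def has_repetition_loop(text: str, max_window: int = 6, min_repeats: int = 4) -> bool:
    """Flag degenerate OCR output: True if some consecutive window of
    1..max_window tokens repeats at least min_repeats times in a row."""
    tokens = text.split()
    for window in range(1, max_window + 1):
        for start in range(len(tokens) - window * min_repeats + 1):
            chunk = tokens[start:start + window]
            if all(
                tokens[start + i * window : start + (i + 1) * window] == chunk
                for i in range(min_repeats)
            ):
                return True
    return False

# Illustrative strings (invented, not from the newspaper dataset):
clean = "The mail steamer departed Southampton on the fourth of March"
looped = "per annum " * 20 + "and the rates of postage"
print(has_repetition_loop(clean))   # False
print(has_repetition_loop(looped))  # True
```

A check like this catches looping but not hallucination, which requires comparing against ground-truth transcriptions.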
After years of teaching Civil War & Reconstruction, I've finally given up. You win, students: it's succession.
A lesson about networks that I'm kind of proud of: at the beginning of the year, I gave my 200 students a dumb survey: fav books, foods, etc. I told them they could use pseudonyms and that I'd share their responses with the class. We've been using that data in various ways: to make points about +
I can't speak for them, but I will say the step change in agentic model capacities in the last few months makes the specter of getting replaced even more real.
Anyway, it's a great post that is well worth reading and has helped me clarify some scattered thoughts that have been rattling around my brain recently! resobscura.substack.com/p/what-is-ha...
Excerpt from Benjamin Breen, "What is happening to writing?": "When I think of AI-proof jobs, I think of people like electricians, plumbers, or the surf instructors of Santa Cruz. But I also think about history professors and anyone else whose output includes some combination of in-person engagement and travel-based or otherwise embodied work in a regulated industry. No less than a surf instructor, historians are performing physical services in the real world, although we don't tend to think of it in those terms. We are going into parish church basements to read through baptismal records, finding weird old non-digitized books in rare book shops, piecing together who called Margaret Mead on a certain day in 1954 by reading through her secretary's notes. These are not the everyday tasks of a historian's life, but they are the kind of things we might do, say, once a week. Couple that with twice a week in-person classroom time, and I simply flatly disagree with anyone who thinks this combo will be replaced by a Sonnet 4.6-type model, no matter how good it gets at creating Excel spreadsheets, translating Latin, or explaining linear algebra."
An aside: weekly archival research trips and 2x week in-person classes is what I WISH the job looked like, but the reality for most academic historians is: minimal travel funds, online asynch courses, and lots of "computer work" tasks. This can't be our defense of what makes our jobs AI-proof +
I think @resobscura.bsky.social's ability to use vibe coding to build such creative projects (historical simulators, concordance tools, roleplayers) is tied to their years of becoming a creative, thoughtful writer and reader. Can our students vibe code effectively without that foundation? +
I also keep coming back to how different our experiences are as academics (got a PhD, wrote a book) vs. our students going into jobs where virtually all the writing they do will be done with GenAI. @lauraknelson.bsky.social made this point quite compellingly: bsky.app/profile/laur... +
They compare their experience of the dopamine, slot-machine nature of vibe coding to build really useful and fascinating DH projects vs. the "obsessive flow you get from deep immersion in writing a book." Agreed! +
So much to chew on in @resobscura.bsky.social's latest on the effects of Gen AI on writing and how that connects to vibe coding: resobscura.substack.com/p/what-is-ha... +
@kmcdono.bsky.social @danielwilson.bsky.social and I have a new OA article out: eur01.safelinks.protection.outlook.com?url=https%3A... It's about the fragmented landscape of historical data, and what we can do about it to improve discoverability, sustainability and reuse.
This is @emollick.bsky.social's "jagged frontier": Claude Code spent 30 minutes completely failing to transcribe a 4 pg. historical source but spun up a fully interactive web map based on that source in under 10 minutes.
So much for soup to nuts. Next I gave it a transcribed version of the data I had typed into a CSV file years ago. 7 minutes later, I had a fully interactive web map. I hit my rate limit before I could make any further tweaks but even this first autonomous pass is pretty stunning: +
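For readers curious what that kind of first pass produces, here is a minimal hand-rolled sketch of the same idea: CSV of points in, standalone Leaflet map page out. The column names (name, lat, lon) and the Leaflet setup are my assumptions, not the code Claude Code actually generated:

```python
# Hypothetical reconstruction: turn a CSV of named points into a
# self-contained interactive web map using Leaflet from a CDN.
# Column names and map defaults are assumptions for illustration.
import csv, io, json

def csv_to_map_html(csv_text: str) -> str:
    """Render rows with name/lat/lon columns as a Leaflet map page."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    markers = "\n".join(
        f"L.marker([{float(r['lat'])}, {float(r['lon'])}])"
        f".addTo(map).bindPopup({json.dumps(r['name'])});"
        for r in rows
    )
    return f"""<!DOCTYPE html>
<html><head>
<link rel="stylesheet" href="https://unpkg.com/leaflet/dist/leaflet.css"/>
<script src="https://unpkg.com/leaflet/dist/leaflet.js"></script>
</head><body>
<div id="map" style="height:100vh"></div>
<script>
var map = L.map('map').setView([45, -30], 3);
L.tileLayer('https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png').addTo(map);
{markers}
</script></body></html>"""

# Toy data, not the 1882 mail-transit source:
html = csv_to_map_html("name,lat,lon\nLondon,51.5,-0.12\nHalifax,44.65,-63.57\n")
```

Writing `html` to a file and opening it in a browser gives a pannable map with popups; the agentic version presumably layers routes and styling on top of the same basic pattern.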
Poor OCR'd version of mail transit table from 1882.
Womp womp. Complete failure. It kept trying to use OCR packages to extract the data rather than vision + reasoning, which was never going to work with this kind of messy tabular source. I finally shut it down after 30 min. of spinning its wheels with gibberish OCR: +
I wanted to test out the full "soup to nuts" agentic capacity of CC with one-shot prompting, so I started by just giving it the four-page PDF file and telling it to build some kind of interactive web map based on the information within it. Drumroll... 🥁 +
I had always wanted to build some kind of interactive map or network viz using this source, but this kind of coding isn't my strong suit so I gave up. The file just gathered dust on my computer for the past 12 years. +