Ben Brumfield
benwbrum.bsky.social
Open Source #DigitalHumanities software engineer.
Founder of FromThePage.com, a platform for collaborative #manuscript #transcription to engage the public in #archives and create digital scholarly editions.
And these interactions are very costly. AI-authored PRs averaged three times as many interventions as PRs created by our developer Will. (Will is amazing!) And our current AI agents don't seem to be able to actually run our test suite, which we may be able to fix.
September 15, 2025 at 3:46 PM
The quantity of AI work also doesn't indicate quality. Fully a third of the PRs opened by Copilot Agent or Codex were so bad that we abandoned them rather than trying to fix them -- often after several interactions with the AI agent.
September 15, 2025 at 3:42 PM
The bad news is that while our issue backlog dropped, the backlog of pull requests needing review, testing, and approval skyrocketed. Sara and I started reviewing AI PRs instead of more important PRs.

Some of the issue backlog wasn't due to AI at all, but simple grooming (closing duplicates, etc.)
September 15, 2025 at 3:40 PM
At the end of July, we tried an experiment going through our old issue backlog. Low-stakes bug fixes and enhancements seemed ideal for turning over to an AI. The good news was that we made a lot of progress on our backlog.
September 15, 2025 at 3:37 PM
Yesterday, for a group of friends in software, we did a serious analysis of our experiments adding AI coding agents (Codex, GitHub Copilot Agent) to our development process at FromThePage, which I thought I'd share here as well. After a couple of months of experiments, the results are very mixed.
September 15, 2025 at 3:34 PM
How would you interpret this envelope for a 1603 Spanish document? It looks like two different hands to me.

Visible words include "Cartaxena" and "El governador don Gieronimo"
January 6, 2025 at 3:14 PM
I was just able to run HTR from Caracal and compare the results with the transcription done by CWRGM staff in FromThePage: gist.github.com/benwbrum/5c9...

Green indicates changes that needed to be made to correct the text to the CWRGM ground truth.
January 3, 2025 at 4:00 PM
This is very impressive! I'm comparing it to ground truth from a scholarly edition (fromthepage.com/cwrgm/dhag-s...) and trying to analyze the results.

(Image source is at cdm17313.contentdm.oclc.org/iiif/2/mdah:...; Mark-up at fromthepage.com/export/legal...)
January 3, 2025 at 2:45 PM
How common were dual-use water mills in 19th-century Virginia? I'm marveling over this account of "Cotton & Logs sent to Doct Miller's mill". #skystorians

fromthepage.com/benwbrum/jer...
December 17, 2024 at 2:36 PM
Compare to the 0.0046 CER in By The People transcriptions at Library of Congress. (N.B. I believe that these were the CERs for the consensus volunteer corrections, so this is apples-to-oranges)

Source: Van Hyning, @algeebraten.bsky.social, et al. openhumanitiesdata.metajnl.com/articles/10....
December 4, 2024 at 4:11 PM
Does other literature exist measuring accuracy of #crowdsourcing MSS #transcription? The Olivera and Kaplan research used humans hired via a paid service, and the results were very different from what I'd expect: CER of 0.10 to 0.13!
December 4, 2024 at 2:57 PM
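For anyone unfamiliar with the metric: CER (character error rate) is usually computed as the character-level edit distance between a transcription and its ground truth, divided by the length of the ground truth. Below is a minimal sketch under that assumption. The function names and the sample strings are made up for illustration, and the studies cited here may normalize whitespace, case, or punctuation differently before measuring.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis measured against reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example, not real HTR or volunteer output.
ground_truth = "Emancipate our Slaves"
transcription = "Emancipale our Staves"
print(f"CER: {cer(ground_truth, transcription):.4f}")
```

On this definition, a CER of 0.10 means roughly one character in ten needs correcting, while 0.0046 is fewer than one in two hundred.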
Testing a bug fix between meetings. This might be a good time to show off the slider feature we built into FromThePage's transcription screen. (It's nicer on a big monitor, but even on my laptop, adjusting the image view vs. transcription editor makes a difference.) #DayOfDH2024
December 2, 2024 at 3:31 PM
This #DayOfDH2024 is on Monday, when we have our weekly planning meeting for FromThePage. We gather stats for our scorecard, so I've filled in the product side. (Interestingly, the number of pages transcribed last week fell by a third -- was it the US holiday or did a big project finish?)
December 2, 2024 at 12:59 PM
Finished my overview of open pull requests for the team. Sadly, most of the open issues are blocked by me, and I'll be in meetings most of the #dayofdh2024
December 2, 2024 at 12:45 PM
Found it!
November 24, 2024 at 3:01 PM
I need help deciphering this passage in Jeremiah White Graves's 1852 financial accounts with "my Henry" -- Henry Washington, an enslaved adult laborer. For context, an earlier entry records "To cash lent you in Lynchg for Traces"

"Trace chair"? "Trace chane"? ???
November 24, 2024 at 2:09 PM
On Dec 12, join us for a webinar with Lucian Li, where he'll showcase a new method for identifying intellectual influences in historical texts using sentence embeddings.
The webinar is at 12PM EST, 11AM CST, and 9AM PST.
Sign up here: content.fromthepage.com/dec-2024-web...
November 19, 2024 at 2:22 PM
Trying the same document in #transkribus (via transkribus.ai) results in a less readable result that still won't match "Dunmore", "Slaves", or "Emancipate". But at least that #htr text appears tentative, with obvious errors, rather than the deceptively authoritative-looking text from the LLM.
May 15, 2024 at 11:11 AM
In particular, it was not able to read Lord Dunmore's name, nor read either of two references to "Emancipate our Slaves". If you've been following historical debates about the #1619project, you can imagine how problematic it would be to rely on this transcription for full-text search.
May 15, 2024 at 11:01 AM