Member of NULab & Turing Institute AI+Arts Group / Editor at The Programming Historian.
www.visioneuid.com/call-for-vis...
We'll start with this piece on the Google Books project: the hopes, dreams, disasters, and aftermath of building a public library on the internet.
1/n
But in poetry, whitespace matters!
Yet actually *preserving* that poetic whitespace is v tough. Its slipperiness points to bigger issues w/ text processing & LLMs.
New paper ⬜️ aclanthology.org/2025.emnlp-m...
So I’m thrilled to share this new home for DH proceedings, which will include CHR papers & more.
Thanks to @taylor-arnold.bsky.social for leading this effort!
bit.ly/ach-anthology
Paper: arxiv.org/abs/2507.00961
Public demo: digital-collections-explorer.com
doi.org/10.46430/phe...
We’re grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.
Thank you to @betovargas.github.io + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.
But are they actually better than traditional OCR engines, which output XML for historical docs?
I built OCR Time Machine to test it!
📄 Upload image + ALTO/PAGE XML
⚖️ Compare outputs side by side
🔗 huggingface.co/spaces/davan...