Giulia Taurino
banner
giuliataurino.bsky.social
Giulia Taurino
@giuliataurino.bsky.social
AI+Cultural Heritage (OCR, CV, LLMs), Digital Humanities, Archival Science, Media & Cultural Studies.

Member of NULab & Turing Institute AI+Arts Group / Editor at The Programming Historian.
Pinned
I am delighted to share that I am a member of the Scientific Committee for the research project ∀ISION_E. The project's call for abstracts on "extended intelligences" in the field of drawing and architecture is open until November 14.

www.visioneuid.com/call-for-vis...
Call for Visions
Context & Mission of the Call for Visions
www.visioneuid.com
Reposted by Giulia Taurino
I curated some readings for class on "data tensions" and the list felt worth sharing. Come on a tour of datasets, books, the web, and AI with me...

We'll start with this piece on the Google Books project: the hopes, dreams, disasters, and aftermath of building a public library on the internet.

1/n
Torching the Modern-Day Library of Alexandria
“Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”
www.theatlantic.com
November 14, 2025 at 4:39 PM
I am delighted to share that I am a member of the Scientific Committee for the research project ∀ISION_E. The project's call for abstracts on "extended intelligences" in the field of drawing and architecture is open until November 14.

www.visioneuid.com/call-for-vis...
Call for Visions
Context & Mission of the Call for Visions
www.visioneuid.com
November 8, 2025 at 8:34 AM
Reposted by Giulia Taurino
Transformer LMs get pretty far by acting like ngram models, so why do they learn syntax? A new paper by sunnytqin.bsky.social, me, and @dmelis.bsky.social illuminates grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation. #mlsky #nlp
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn...
arxiv.org
December 20, 2024 at 5:56 PM
Reposted by Giulia Taurino
Join us in Charleston on November 4 for our preconference, "How Can Libraries and Publishers Collaborate to Make Backlist Monographs Open Access?, which is free to attend through the support of the California Digital Library, the De Gruyter eBound Foundation, and University of Michigan Library.
How Can Libraries and Publishers Collaborate to Make Backlist Monographs Open Access?
Join us in Charleston this November for a Preconference on making backlist monographs open access! Tuesday, November 4, 2025, 1pm-4pm ET Cost: $0 Presenters: Dave Hansen, Executive Director, Author…
www.authorsalliance.org
October 9, 2025 at 2:01 PM
Reposted by Giulia Taurino
In principle, open access means that anyone, anywhere, can read and reuse scholarly work. In practice, many works labeled as “open” are constrained by restrictions that limit how they can be used. These constraints dilute the value of openness and conflict with its foundational definitions.
Open? When Site Restrictions and Clauses Undermine Open Access
Open access publishing has transformed the way research circulates. In principle, open access means that anyone, anywhere, can read and reuse scholarly work without financial, legal, or technical b…
www.authorsalliance.org
October 29, 2025 at 1:24 PM
Reposted by Giulia Taurino
Computationally, whitespace gets little attention—it’s usually standardized or stripped.

But in poetry, whitespace matters!

Yet actually *preserving* that poetic whitespace is v tough. Its slipperiness points to bigger issues w/ text processing & LLMs.

New paper ⬜️ aclanthology.org/2025.emnlp-m...
November 3, 2025 at 3:14 PM
Reposted by Giulia Taurino
As DH grows, it’s increasingly important to publish conference papers, but there hasn’t been a clear venue for that.

So I’m thrilled to share this new home for DH proceedings, which will include CHR papers & more.

Thanks to @taylor-arnold.bsky.social for leading this effort!

bit.ly/ach-anthology
October 29, 2025 at 3:39 PM
Reposted by Giulia Taurino
New issue of my newsletter: "The Index and the Vector" — Converting ambiguity into precision can help a broader audience discover and learn from collections newsletter.dancohen.org/archive/the-...
The Index and the Vector
Converting ambiguity into precision can help a broader audience discover and learn from collections
newsletter.dancohen.org
October 20, 2025 at 3:30 PM
Reposted by Giulia Taurino
New issue of my newsletter: “The Library’s New Entryway” — An interface that combines the advantages of the traditional index with the power of LLMs is the path forward newsletter.dancohen.org/archive/the-...
The Library’s New Entryway
An interface that combines the advantages of the traditional index with the power of LLMs is the path forward
newsletter.dancohen.org
October 10, 2025 at 7:32 PM
Reposted by Giulia Taurino
highly recommend!
October 6, 2025 at 6:29 PM
Reposted by Giulia Taurino
Bartz v. Anthropic has had a couple of major developments. Though the lawsuit was initially brought to address the legality of using copyrighted materials for training AI, the suit now focuses on Anthropic’s storage—without training use—of copies of books downloaded from LibGen and PiLiMi.
Bartz v. Anthropic: A Preliminary Look at What LibGen Books May Be Included in the Class Action
The LibGen Logo For this post, we relied heavily on the help of Charles Horn, self-described “metadata wrangler,” for data analysis.  As readers are likely aware, the Bartz v. Anthropic AI law…
www.authorsalliance.org
September 5, 2025 at 1:08 PM
Reposted by Giulia Taurino
Anthropic’s copyright settlement is historic, but it’s also not what many authors and publishers think. Check out our latest on what’s inside the proposed settlement:
The Anthropic Settlement – what it is and isn’t (and who could get paid)
www.anthropiccopyrightsettlement.com EDIT: On Sunday evening, Judge Alsup granted the motion for a hearing on Monday, September 8th, but expressed disappointment over lack of details, mostly on the…
www.authorsalliance.org
September 8, 2025 at 11:10 AM
Reposted by Giulia Taurino
I have updated my in-depth analysis of Bartz v Anthropic to reflect this important and overlooked aspect of the proposed settlement: “In what may be a rude surprise for authors, partial or full payments for many books may go to publishers rather than authors.” newsletter.dancohen.org/archive/land...
Will a Landmark AI Settlement Make Authors Feel Whole?
The remuneration from Bartz v. Anthropic may not provide what writers really want: respect, recognition, and readers
newsletter.dancohen.org
September 8, 2025 at 1:32 PM
Reposted by Giulia Taurino
With @yh-huang.bsky.social, I'm excited to share our Digital Collections Explorer, an open-source, multimodal viewer for digital collections! Users can search with both natural language inputs and reverse image search.

Paper: arxiv.org/abs/2507.00961
Public demo: digital-collections-explorer.com
Digital Collections Explorer: An Open-Source, Multimodal Viewer for Searching Digital Collections
We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital col...
arxiv.org
July 2, 2025 at 8:56 PM
Reposted by Giulia Taurino
New issue of my newsletter: “AI and Libraries, Archives, and Museums, Loosely Coupled"—A new framework provides a way for cultural heritage institutions to take advantage of the tech with fewer misgivings, and to serve students, scholars, and the public better newsletter.dancohen.org/archive/ai-a...
AI and Libraries, Archives, and Museums, Loosely Coupled
A new framework provides a way for cultural heritage institutions to take advantage of the technology with fewer misgivings, and to serve students, scholars, and the public better
newsletter.dancohen.org
August 18, 2025 at 9:06 PM
Reposted by Giulia Taurino
A new translation of @espejolento.bsky.social‬'s lesson!

doi.org/10.46430/phe...

We’re grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.

Thank you to @betovargas.github.io‬ + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.
July 9, 2025 at 2:21 PM
Reposted by Giulia Taurino
What does a "function" mean? What does it look like on a Wikimedia project? It might be something that checks leap years, tests for prime numbers, or decodes a cipher. These are small, clear examples that you can experiment with easily on Wikifunctions. 🧵⬇️ (1/3)
June 27, 2025 at 2:00 PM
Reposted by Giulia Taurino
Everyone’s dropping VLM-based OCR models lately…
But are they actually better than traditional OCR engines, which output XML for historical docs?

I built OCR Time Machine to test it!

📄 Upload image + ALTO/PAGE XML
⚖️ Compare outputs side by side
🔗 huggingface.co/spaces/davan...
June 24, 2025 at 5:35 PM