Lightnews — Scholar-powered news

Reposted by Giulia Taurino

Maria Antoniak

@mariaa.bsky.social

I curated some readings for class on "data tensions" and the list felt worth sharing. Come on a tour of datasets, books, the web, and AI with me...

We'll start with this piece on the Google Books project: the hopes, dreams, disasters, and aftermath of building a public library on the internet.

1/n

Torching the Modern-Day Library of Alexandria

“Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”

www.theatlantic.com

November 14, 2025 at 4:39 PM

Giulia Taurino

@giuliataurino.bsky.social

cesta.stanford.edu/events/whos-...

Who’s Afraid of Virginia Woolf?: Absent Presence and Digital Remediation in Woolf’s Bookselling Archive | Center for Spatial and Textual Analysis

In 1917, Virginia Woolf co-founded her own private Press, the Hogarth Press, on the dining table of her home, Hogarth House, in southwest London. While Virginia and her husband, Leonard, initially int...

cesta.stanford.edu

November 15, 2025 at 7:30 AM

Giulia Taurino

@giuliataurino.bsky.social

arxiv.org/abs/2503.15195

Benchmarking Large Language Models for Handwritten Text Recognition

Traditional machine learning models for Handwritten Text Recognition (HTR) rely on supervised training, requiring extensive manual annotations, and often produce errors due to the separation between l...

arxiv.org

November 13, 2025 at 8:15 PM

Giulia Taurino

@giuliataurino.bsky.social

I am delighted to share that I am a member of the Scientific Committee for the research project ∀ISION_E. The project's call for abstracts on "extended intelligences" in the field of drawing and architecture is open until November 14.

www.visioneuid.com/call-for-vis...

Call for Visions

Context & Mission of the Call for Visions

www.visioneuid.com

November 8, 2025 at 8:34 AM

Giulia Taurino

@giuliataurino.bsky.social

www.newyorker.com/magazine/202...

How Leonora Carrington Feminized Surrealism

Each time the work of the British-Mexican artist and writer is reborn, it seems more prescient.

www.newyorker.com

November 8, 2025 at 4:51 AM

Reposted by Giulia Taurino

Naomi Saphra

@nsaphra.bsky.social

Transformer LMs get pretty far by acting like n-gram models, so why do they learn syntax? A new paper by @sunnytqin.bsky.social, me, and @dmelis.bsky.social illuminates grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation. #mlsky #nlp

Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization

Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn...

arxiv.org

December 20, 2024 at 5:56 PM

Reposted by Giulia Taurino

Authors Alliance

@authorsalliance.bsky.social

Join us in Charleston on November 4 for our preconference, "How Can Libraries and Publishers Collaborate to Make Backlist Monographs Open Access?", which is free to attend through the support of the California Digital Library, the De Gruyter eBound Foundation, and the University of Michigan Library.

How Can Libraries and Publishers Collaborate to Make Backlist Monographs Open Access?

Join us in Charleston this November for a Preconference on making backlist monographs open access! Tuesday, November 4, 2025, 1pm-4pm ET Cost: $0 Presenters: Dave Hansen, Executive Director, Author…

www.authorsalliance.org

October 9, 2025 at 2:01 PM

Reposted by Giulia Taurino

Authors Alliance

@authorsalliance.bsky.social

In principle, open access means that anyone, anywhere, can read and reuse scholarly work. In practice, many works labeled as “open” are constrained by restrictions that limit how they can be used. These constraints dilute the value of openness and conflict with its foundational definitions.

Open? When Site Restrictions and Clauses Undermine Open Access

Open access publishing has transformed the way research circulates. In principle, open access means that anyone, anywhere, can read and reuse scholarly work without financial, legal, or technical b…

www.authorsalliance.org

October 29, 2025 at 1:24 PM

Reposted by Giulia Taurino

Melanie Walsh

@mellymeldubs.bsky.social

Computationally, whitespace gets little attention—it’s usually standardized or stripped.

But in poetry, whitespace matters!

Yet actually *preserving* that poetic whitespace is v tough. Its slipperiness points to bigger issues w/ text processing & LLMs.

New paper ⬜️ aclanthology.org/2025.emnlp-m...

November 3, 2025 at 3:14 PM

Reposted by Giulia Taurino

Melanie Walsh

@mellymeldubs.bsky.social

As DH grows, it’s increasingly important to publish conference papers, but there hasn’t been a clear venue for that.

So I’m thrilled to share this new home for DH proceedings, which will include CHR papers & more.

Thanks to @taylor-arnold.bsky.social for leading this effort!

bit.ly/ach-anthology

Screenshot that reads:

Introducing the Anthology for Computers and the Humanities

Taylor Arnold, Maria Antoniak, Miguel Escobar Varela, Marie Puren, Mila Oiva, Amanda Regan, Lauren Tilton, and Melanie Walsh

1 Data Science and Statistics, University of Richmond, U.S.A.
2 Computer Science, University of Colorado Boulder, U.S.A.
3 Faculty of Arts and Social Sciences, National University of Singapore
4 Laboratoire de Recherche de l'EPITA, Paris, France
5 History and Archaeology, University of Turku, Finland
6 History and Geography, Clemson University, U.S.A.
7 Rhetoric and Communication Studies, University of Richmond, U.S.A.
8 Information School, University of Washington, U.S.A.

Permanent Link: https://doi.org/10.63744/HHsQG7hNWyxG

Published: 25 September 2025

October 29, 2025 at 3:39 PM

Reposted by Giulia Taurino

Dan Cohen

@dancohen.org

New issue of my newsletter: "The Index and the Vector" — Converting ambiguity into precision can help a broader audience discover and learn from collections newsletter.dancohen.org/archive/the-...

The Index and the Vector

Converting ambiguity into precision can help a broader audience discover and learn from collections

newsletter.dancohen.org

October 20, 2025 at 3:30 PM

Reposted by Giulia Taurino

Dan Cohen

@dancohen.org

New issue of my newsletter: “The Library’s New Entryway” — An interface that combines the advantages of the traditional index with the power of LLMs is the path forward newsletter.dancohen.org/archive/the-...

The Library’s New Entryway

An interface that combines the advantages of the traditional index with the power of LLMs is the path forward

newsletter.dancohen.org

October 10, 2025 at 7:32 PM

Reposted by Giulia Taurino

Maria Antoniak

@mariaa.bsky.social

highly recommend!

Lori Emerson @loriemerson.net · Oct 6

if you'd like to hear my #othernetworks talk and/or gather virtually with likeminded people who are part of @metagov.bsky.social, please join us Wednesday Oct. 8th 10am MDT! luma.com/4kfjgd6a

Seminar: Other Networks with Dr. Lori Emerson · Zoom · Luma

For this seminar, we will welcome Dr. Lori Emerson (from the Media Studies Department at University of Colorado Boulder) to give a talk about some of her…

luma.com

October 6, 2025 at 6:29 PM

Giulia Taurino

@giuliataurino.bsky.social

dl.acm.org/doi/10.1145/...

Not Every AI Problem Is a Data Problem | Communications of the ACM

dl.acm.org

September 26, 2025 at 6:04 PM

Giulia Taurino

@giuliataurino.bsky.social

jeffreyschnapp.com/2025/09/26/c...

Calvino and the I in the Computer | Jeffrey Schnapp


jeffreyschnapp.com

September 26, 2025 at 3:48 PM

Giulia Taurino

@giuliataurino.bsky.social

cacm.acm.org/news/data-qu...

Data Quality May Be All You Need – Communications of the ACM

cacm.acm.org

September 15, 2025 at 5:00 PM

Giulia Taurino

@giuliataurino.bsky.social

ethz.ch/en/news-and-...

Apertus: a fully open, transparent, multilingual language model

EPFL, ETH Zurich and the Swiss National Supercomputing Centre (CSCS) released Apertus 2 September, Switzerland’s first large-scale, open, multilingual language model — a milestone in generative AI for...

ethz.ch

September 15, 2025 at 4:59 PM

Reposted by Giulia Taurino

Authors Alliance

@authorsalliance.bsky.social

Bartz v. Anthropic has had a couple of major developments. Though the lawsuit was initially brought to address the legality of using copyrighted materials for training AI, the suit now focuses on Anthropic’s storage—without training use—of copies of books downloaded from LibGen and PiLiMi.

Bartz v. Anthropic: A Preliminary Look at What LibGen Books May Be Included in the Class Action

For this post, we relied heavily on the help of Charles Horn, self-described “metadata wrangler,” for data analysis. As readers are likely aware, the Bartz v. Anthropic AI law…

www.authorsalliance.org

September 5, 2025 at 1:08 PM

Reposted by Giulia Taurino

Authors Alliance

@authorsalliance.bsky.social

Anthropic’s copyright settlement is historic, but it’s also not what many authors and publishers think. Check out our latest on what’s inside the proposed settlement:

The Anthropic Settlement – what it is and isn’t (and who could get paid)

www.anthropiccopyrightsettlement.com EDIT: On Sunday evening, Judge Alsup granted the motion for a hearing on Monday, September 8th, but expressed disappointment over lack of details, mostly on the…

www.authorsalliance.org

September 8, 2025 at 11:10 AM

Reposted by Giulia Taurino

Dan Cohen

@dancohen.org

I have updated my in-depth analysis of Bartz v Anthropic to reflect this important and overlooked aspect of the proposed settlement: “In what may be a rude surprise for authors, partial or full payments for many books may go to publishers rather than authors.” newsletter.dancohen.org/archive/land...

Will a Landmark AI Settlement Make Authors Feel Whole?

The remuneration from Bartz v. Anthropic may not provide what writers really want: respect, recognition, and readers

newsletter.dancohen.org

September 8, 2025 at 1:32 PM

Reposted by Giulia Taurino

Ben Lee

@bcgl.bsky.social

With @yh-huang.bsky.social, I'm excited to share our Digital Collections Explorer, an open-source, multimodal viewer for digital collections! Users can search with both natural language inputs and reverse image search.

Paper: arxiv.org/abs/2507.00961
Public demo: digital-collections-explorer.com

Digital Collections Explorer: An Open-Source, Multimodal Viewer for Searching Digital Collections

We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital col...

arxiv.org

July 2, 2025 at 8:56 PM

Reposted by Giulia Taurino

Dan Cohen

@dancohen.org

New issue of my newsletter: “AI and Libraries, Archives, and Museums, Loosely Coupled"—A new framework provides a way for cultural heritage institutions to take advantage of the tech with fewer misgivings, and to serve students, scholars, and the public better newsletter.dancohen.org/archive/ai-a...

AI and Libraries, Archives, and Museums, Loosely Coupled

A new framework provides a way for cultural heritage institutions to take advantage of the technology with fewer misgivings, and to serve students, scholars, and the public better

newsletter.dancohen.org

August 18, 2025 at 9:06 PM

Reposted by Giulia Taurino

Programming Historian

@proghist.bsky.social

A new translation of @espejolento.bsky.social's lesson!

doi.org/10.46430/phe...

We’re grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.

Thank you to @betovargas.github.io + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.

July 9, 2025 at 2:21 PM

Reposted by Giulia Taurino

Wikimedia Foundation

@wikimediafoundation.org

What does a "function" mean? What does it look like on a Wikimedia project? It might be something that checks leap years, tests for prime numbers, or decodes a cipher. These are small, clear examples that you can experiment with easily on Wikifunctions. 🧵⬇️ (1/3)

A grid of eight bold black icons, some mathematical and programming-related, arranged in a 3×3 layout on a white background, with the center-left icon – a bold X inside a red circle – standing out in color. Text says: Wikifunctions lets you explore programming logic, language, and math – without writing a line of code. These simple, introductory functions can get you started.

June 27, 2025 at 2:00 PM

Reposted by Giulia Taurino

Daniel van Strien

@danielvanstrien.bsky.social

Everyone’s dropping VLM-based OCR models lately…
But are they actually better than traditional OCR engines, which output XML for historical docs?

I built OCR Time Machine to test it!

📄 Upload image + ALTO/PAGE XML
⚖️ Compare outputs side by side
🔗 huggingface.co/spaces/davan...

Screenshot showing a document page image on the left with corresponding OCR output on the right of the page.

June 24, 2025 at 5:35 PM
