kenmartinus.bsky.social
@kenmartinus.bsky.social
IA *users*
December 27, 2025 at 11:02 AM
Reposted
The news that AI was being trained on stolen works?

The story first broke on May 11, 2021, with reporting on BookCorpus, and major news media discussed it in the months that followed. Romance authors talked about it. A lot.

The first archive they stole was largely romance.

arxiv.org/abs/2105.05241
Addressing "Documentation Debt" in Machine Learning Research: A Retrospective Datasheet for BookCorpus
Recent literature has underscored the importance of dataset documentation work for machine learning, and part of this work involves addressing "documentation debt" for datasets that have been used wid...
December 24, 2025 at 2:54 AM