Ryan Dubnicek
rdubnicek.bsky.social
he/him * Doing Digital Humanities/Cultural Analytics things, sometimes for the HathiTrust * @University of Illinois
Guessing that running BookNLP is part of the fun, but if you want to start w/ output files, your friends at HTRC have already run ~200k English-lang fiction vols through the pipeline and released all non-expressive data: htrc.atlassian.net/wiki/spaces/.... Unsure if any BSC vols are included though!
HTRC BookNLP Dataset for English-Language Fiction - Documentation - HathiTrust Research Center
htrc.atlassian.net
November 11, 2024 at 7:03 PM