Library Innovation Lab
@harvardlil.bsky.social
A crowd of coders, lawyers, librarians, designers, & tinkerers building tools like Perma.cc & Caselaw Access Project at the Harvard Law School Library. Where
@institutionaldatainitiative.org got started.

🌐 https://lil.law.harvard.edu
And join Jack live on @source.coop's Great Data Products podcast on 11/19 to hear how we're making cultural memory collections easier to access and harder to delete. greatdataproducts.com/housekeeping...
Upcoming Episode: Inside Harvard's data.gov Archive
How the Harvard Library Innovation Lab is preserving and making Data.gov datasets discoverable using BagIt and static search.
greatdataproducts.com
October 31, 2025 at 1:43 PM
This series is part of our work investigating not just the technical, but the human and societal routes to long-term digital preservation. You can also check out @maxy.bsky.social's Century-Scale Storage on our website.

lil.law.harvard.edu/century-scal...
Century-Scale Storage
If you had to store something for 100 years, how would you do it?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Frank Cifaldi, who is the Founder and Director of the Video Game History Foundation:

"I would have the best copyright lawyers in the country figuring out how we can actually make this work."

lil.law.harvard.edu/generational...
Frank Cifaldi | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Rebecca Frank, who is an Assistant Professor at the University of Michigan School of Information:

"The glib answer is the money itself would be the solution."

lil.law.harvard.edu/generational...
Rebecca Frank | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Amelia Acker, who is an Associate Professor in the School of Communication & Information at Rutgers:

"I probably wouldn't build a system. I'd build a bureaucracy."

lil.law.harvard.edu/generational...
Amelia Acker | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Che-Wei Wang and Taylor Levy, aka CW&T, who are designers, fabricators, and artists:

"Wait, only a hundred years?"

lil.law.harvard.edu/generational...
Che-Wei Wang & Taylor Levy | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
What are we all missing? Anything you can't get by clicking from link to link like EOT, or downloading datasets directly from data.gov. If there are things you care about preserving that fit that description, that's where to focus.
January 31, 2025 at 9:11 PM
Another limited but vital effort is @eotarchive.org. They collect a huge amount from .gov domains every four years, and make it discoverable through @archive.org
End of Term Web Archive
The End of Term Web Archive is a collaborative initiative that collects, preserves, and makes accessible United States Government websites at the end of presidential administrations.
eotarchive.org
January 31, 2025 at 9:09 PM
Our collection from data.gov is limited: if an entry points directly to the data, such as a CSV, we have the data. If it points to an HTML landing page, we just have the landing page. This means many, many datasets are not included. What we have from data.gov adds up to 15 or 20 TB.
Data.gov Home - Data.gov
data.gov
January 31, 2025 at 9:07 PM
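The distinction in the post above (direct data link versus landing page) can be sketched in a few lines. This is only an illustration, not the method LIL used: the function name `classify`, the set `LANDING_TYPES`, and the rule of trusting the Content-Type header first and the file extension second are all assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: these media types indicate a web page rather than a data file.
LANDING_TYPES = {"text/html", "application/xhtml+xml"}


def classify(url, content_type=None):
    """Guess whether a dataset URL points at data or at a landing page.

    If a Content-Type header value is supplied, it decides the answer.
    Otherwise fall back to the file extension in the URL path.
    """
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";")[0].strip().lower()
        return "landing_page" if media_type in LANDING_TYPES else "data"
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    # No extension, or an HTML extension, is treated as a landing page.
    return "landing_page" if suffix in {"", ".html", ".htm"} else "data"
```

A shallow crawl would keep whatever each URL returns, so entries classified as `landing_page` are the ones where the underlying dataset still has to be fetched some other way.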
Speaking of telling someone, here’s our update: we have copies of all metadata from data.gov, and all of the dataset URLs it points to (shallow crawl); all federal GitHub repositories with issues, comments, etc.; and articles from PubMed.
Data.gov Home - Data.gov
data.gov
January 31, 2025 at 9:06 PM
Third, tell someone. Archive.org is one good place to store public data for discovery, and we at LIL will consider storing and signing data in some cases as well. Just posting data somewhere search engines can find it is good too.
Internet Archive: Digital Library of Free & Borrowable Texts, Movies, Music & Wayback Machine
Archive.org
January 31, 2025 at 9:05 PM
FOIA requests are another great way to scale up — check out @muckrock.com to get started.
January 31, 2025 at 9:04 PM
Next, scale up. If you’re a programmer (or can team up with one), write a Python script to download a full collection: say, everything from the data portal of a given government website. Run it yourself, and share it so we libraries can use it too.
January 31, 2025 at 9:03 PM
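As a rough starting point for the kind of script described above, here is a minimal sketch using only the Python standard library. It assumes you already have a list of file URLs for the collection; the names `mirror`, `fetch_bytes`, `local_name`, and `manifest.json` are made up for illustration, and a real script would also want rate limiting, retries, and a check of the site's terms.

```python
import hashlib
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download one URL with the standard library (no retries or rate limiting)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def local_name(url, index):
    """Build a local filename from the URL path, prefixed to avoid collisions."""
    name = Path(urlparse(url).path).name or "index.html"
    return f"{index:05d}_{name}"


def mirror(urls, out_dir, fetch=fetch_bytes):
    """Save each URL under out_dir and write a manifest of SHA-256 checksums."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for index, url in enumerate(urls):
        try:
            body = fetch(url)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            manifest.append({"url": url, "error": str(error)})
            continue
        name = local_name(url, index)
        (out / name).write_bytes(body)
        manifest.append(
            {"url": url, "file": name, "sha256": hashlib.sha256(body).hexdigest()}
        )
    # The manifest records where each file came from and lets others verify copies.
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling `mirror(list_of_urls, "my_copy")` would fill `my_copy/` with the files plus a `manifest.json`. Recording the source URL and a checksum for every file is a design choice that makes the copy easier to share and verify later.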
If you’re a data scientist, good news — your work isn't just downloading data and publishing about it, but also keeping safe copies!
January 31, 2025 at 9:03 PM
To keep access to stuff you care about: first just make a copy. Use ArchiveWeb.page to click around and download all the parts of a website you’re interested in. We like the desktop version to avoid capturing login cookies or extensions, but the browser extension is good too.
ArchiveWeb.page
ArchiveWeb.page
January 31, 2025 at 9:02 PM