Library Innovation Lab
@harvardlil.bsky.social
A crowd of coders, lawyers, librarians, designers, & tinkerers building tools like Perma.cc & Caselaw Access Project at the Harvard Law School Library. Where @institutionaldatainitiative.org got started.

🌐 https://lil.law.harvard.edu
When might digital provenance matter? Could we imagine it being used to right past wrongs, to return objects to their rightful places, to restore justice?

Our Public Data Project's @mollyhardy.bsky.social reflects on copying gov data & principles of provenance

lil.law.harvard.edu/blog/2025/12...
Replication of Government Datasets and the Principles of Provenance | Library Innovation Lab
As part of our Public Data Project, LIL recently launched Data.gov Archive Search. In this post, we consider the importance of provenance for large, replicat...
lil.law.harvard.edu
December 15, 2025 at 12:07 PM
Reposted by Library Innovation Lab
If you'd like an informative and interesting conversation to enjoy, try "Inside Harvard’s Data.gov Archive – A Conversation with Jack Cushman"

@jed.co interviewing Jack Cushman @harvardlil.bsky.social about Data.gov archiving on @source.coop

Now live: www.youtube.com/watch?v=XYMb...
www.youtube.com
November 19, 2025 at 7:20 PM
Reposted by Library Innovation Lab
Our EOT2024 partner @harvardlil.bsky.social was interviewed by @jed.co on @source.coop about archiving government data.

Listen & learn how 300,000+ federal datasets are archived for posterity

Inside Harvard’s Data.gov Archive, A Conversation with Jack Cushman: www.youtube.com/watch?v=XYMb...
November 19, 2025 at 7:28 PM
Reposted by Library Innovation Lab
If you missed our conversation with Jack Cushman from @harvardlil.bsky.social, you can catch up now. We discussed the Data.gov Archive and the challenge of preventing federal data loss – it's about more than just web pages.

youtube.com/live/XYMbQru...
November 20, 2025 at 9:06 PM
Reposted by Library Innovation Lab
Workshop Report: "Resilience in Times of Crisis - Strengthening Open Science Against Geopolitical Pressures" (via Research Group Information Management at Humboldt-Universität zu Berlin) infomgnt.org/posts/2025-1... @datarescueproject.org @harvardlil.bsky.social #libraries #openscience
November 22, 2025 at 6:26 PM
And join Jack live on @source.coop's Great Data Products podcast on 11/19 to hear how we're making cultural memory collections easier to access and harder to delete. greatdataproducts.com/housekeeping...
Upcoming Episode: Inside Harvard's data.gov Archive
How the Harvard Library Innovation Lab is preserving and making Data.gov datasets discoverable using BagIt and static search.
greatdataproducts.com
October 31, 2025 at 1:43 PM
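The episode description mentions BagIt, the packaging format specified in RFC 8493: a bag is a directory holding a `bagit.txt` declaration, a `data/` payload folder, and a manifest listing a checksum for every payload file, so any copy can later be checked for damage. A minimal sketch of that structure using only the Python standard library follows. The helpers `make_bag` and `verify_bag` are hypothetical illustrations, not code from LIL's Data.gov archiver, and they skip optional parts of the spec such as `bag-info.txt` and tag manifests.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write a minimal BagIt 1.0 bag (RFC 8493) containing the given payload files.

    Hypothetical helper for illustration only; not the Data.gov archiver's code.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for rel_name, content in sorted(files.items()):
        payload_path = data_dir / rel_name
        payload_path.parent.mkdir(parents=True, exist_ok=True)
        payload_path.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Each manifest line is "<checksum> <path relative to the bag root>".
        manifest_lines.append(f"{digest}  data/{rel_name}")

    # Required bag declaration: BagIt version and tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every payload checksum and compare it against the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    # Usage: bag one small CSV in a temporary directory, then verify it.
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        make_bag(bag, {"dataset.csv": b"id,value\n1,42\n"})
        print("bag valid:", verify_bag(bag))
```

Because the checksums travel inside the bag, anyone who copies it can rerun the verification step independently, which is part of what makes a format like this suited to archives meant to be freely replicated.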
Podcast: Jack Cushman joins “Pioneers & Pathfinders” to discuss libraries shaping legal tech, digital preservation, and realities of legal AI. Listen: www.seyfarth.com/news-insight...
Pioneers and Pathfinders: Jack Cushman
Today, we’re joined by Jack Cushman, director of the Harvard Library Innovation Lab, where he and his team are reimagining how library principles can shape the future of legal technology. Jack is a…
www.seyfarth.com
October 31, 2025 at 1:40 PM
This series is part of our work investigating not just the technical, but the human and societal routes to long-term digital preservation. You can also check out @maxy.bsky.social's Century-Scale Storage on our website.

lil.law.harvard.edu/century-scal...
Century-Scale Storage
If you had to store something for 100 years, how would you do it?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Frank Cifaldi, who is the Founder and Director of the Video Game History Foundation:

"I would have the best copyright lawyers in the country figuring out how we can actually make this work."

lil.law.harvard.edu/generational...
Frank Cifaldi | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Rebecca Frank, who is an Assistant Professor at the University of Michigan School of Information:

"The glib answer is the money itself would be the solution."

lil.law.harvard.edu/generational...
Rebecca Frank | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Amelia Acker, who is an Associate Professor in the School of Communication & Information at Rutgers:

"I probably wouldn't build a system. I'd build a bureaucracy."

lil.law.harvard.edu/generational...
Amelia Acker | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
One answer from Che-Wei Wang and Taylor Levy, aka CW&T, who are designers, fabricators, and artists:

"Wait, only a hundred years?"

lil.law.harvard.edu/generational...
Che-Wei Wang & Taylor Levy | Library Innovation Lab
Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
LIL fellow @maxy.bsky.social asked 14 scholars, archivists, designers, business leaders & engineers: "If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?"

Their answers:
lil.law.harvard.edu/generational...
Generational Data Interviews | Library Innovation Lab
14 Designs for Digital Preservation in 2025
lil.law.harvard.edu
October 15, 2025 at 2:50 PM
Our Public Data Project Lead @mollyhardy.bsky.social writes about public data preservation and how it’s complicated by the artificial contemporary distinction between science and the humanities @scholarlykitchen.bsky.social
Guest Post - Rethinking Disciplinary Data Regimes - The Scholarly Kitchen
Between a political policy environment focused on defunding and deleting data collections – an environment in which little can be trusted – and an onslaught of new AI tools that feed indiscriminately ...
scholarlykitchen.sspnet.org
October 8, 2025 at 2:57 PM
Join our team! LIL is looking for a Product and Research Manager to help create, shape, and execute on our portfolio of open knowledge projects. PRMs work across every piece of the LIL ecosystem, from software experimentation to convening of events. Learn more at careers.harvard.edu/job/product-...
Product and Research Manager
careers.harvard.edu
July 3, 2025 at 5:26 PM
Thrilled to share that @maxy.bsky.social's "Century-Scale Storage" was nominated for a Webby Award!

You can vote in the "Best Individual Editorial Feature" category here: vote.webbyawards.com/PublicVoting...
Vote for the best of the internet
I just voted in The Webby People's Voice Awards and checked my voter registration.
vote.webbyawards.com
April 1, 2025 at 8:59 PM
Reposted by Library Innovation Lab
As the @institutionaldatainitiative.org expands its mission, we’re announcing a collaboration with @bpl.boston.gov to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at the Boston Public Library. institutionaldatainitiative.org/posts/using-...
Using AI to Accelerate Digitization at Boston Public Library
Today, as part of our mission expansion, we’re announcing a collaboration with BPL to develop AI-driven tools capable of accelerating new digitization of large collections at libraries across the worl...
institutionaldatainitiative.org
March 12, 2025 at 1:23 PM
Reposted by Library Innovation Lab
I'm pleased to announce we're expanding our mission at the @institutionaldatainitiative.org with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work. institutionaldatainitiative.org/posts/open-c...
Expanding Our Mission: An Open Call for Collaborators
Today, we’re pleased to announce an open call for institutional collaborators as new support expands the research capacity of the Institutional Data Initiative.
institutionaldatainitiative.org
March 5, 2025 at 3:36 PM
Ed Summers at Stanford wrote this great deep dive into how and why we designed our data.gov archiver the way we did. Thanks for digging in, Ed, this is excellent. inkdroid.org/2025/02/17/n...
Bagging data.gov
inkdroid.org
February 19, 2025 at 7:16 PM
We just launched a 16TB archive of every dataset that has been available on data.gov since November. This will be updated day by day as new datasets appear. It can be freely copied, and we're sharing the code behind it to help others make their own archives of data they depend on.
Announcing the Data.gov Archive | Library Innovation Lab
Today we released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complet...
lil.law.harvard.edu
February 6, 2025 at 9:23 PM
Reposted by Library Innovation Lab
Penn is getting a lot of questions about Data Refuge. That effort no longer exists, but several efforts are currently active. I've created a doc from what I & others have suggested. I'll update as I hear more. Feel free to share or suggest: docs.google.com/document/d/1...
Data Rescue Efforts
Data / Website Rescue Efforts End of Term Crawl - The main coordinated effort to archive websites, but datasets have been more of a challenge. EDGI - They have been focused on environmental data. A ...
docs.google.com
February 3, 2025 at 4:14 PM
What are we all missing? Anything you can't get by clicking from link to link like EOT, or by downloading datasets directly from data.gov. If there are things you care about preserving that fit that description, that's where to focus.
January 31, 2025 at 9:11 PM
Another limited but vital effort is @eotarchive.org. They collect a huge amount from .gov domains every four years, and make it discoverable through @archive.org
End of Term Web Archive
The End of Term Web Archive is a collaborative initiative that collects, preserves, and makes accessible United States Government websites at the end of presidential administrations.
eotarchive.org
January 31, 2025 at 9:09 PM
Our collection from data.gov is limited: if an entry points directly to the data, such as a csv, we have the data. If it points to an html landing page, we just have the landing page. This means many, many datasets are not included. What we have from data.gov adds up to 15 or 20TB.
Data.gov Home - Data.gov
data.gov
January 31, 2025 at 9:07 PM
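The post above describes a shallow crawl: each URL listed in the data.gov metadata is fetched once, and whatever comes back is what gets kept, with no links followed from there. A minimal sketch of that rule follows, assuming the distinction between "the data itself" and "an HTML landing page" is made from the response's Content-Type header; the post does not say how LIL actually drew that line. `shallow_capture`, `Capture`, `HTML_TYPES`, and the injected `fetch` function are hypothetical names, and `fetch` is passed in so the policy can be shown without real network access.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: media types treated as landing pages rather than data files.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


@dataclass
class Capture:
    url: str
    content_type: str
    body: bytes
    is_landing_page: bool


def media_type(content_type_header: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def shallow_capture(url: str, fetch: Callable[[str], tuple[str, bytes]]) -> Capture:
    """Fetch exactly one URL and keep whatever comes back, without following links.

    If the response is a data file (e.g. CSV), the data itself is captured.
    If it is an HTML page, only that landing page is captured; any dataset
    linked from it is NOT retrieved, which is why many datasets are missing.
    """
    content_type, body = fetch(url)
    kind = media_type(content_type)
    return Capture(url, kind, body, kind in HTML_TYPES)


if __name__ == "__main__":
    # Usage with canned responses standing in for real HTTP requests.
    fake_responses = {
        "https://example.gov/data.csv": ("text/csv", b"id,value\n1,42\n"),
        "https://example.gov/dataset-page": ("text/html; charset=utf-8", b"<html></html>"),
    }
    for url in fake_responses:
        cap = shallow_capture(url, fake_responses.__getitem__)
        print(url, "->", "landing page only" if cap.is_landing_page else "data captured")
```

Going one level deeper (parsing each landing page and fetching the files it links to) would recover more datasets, but every site lays out its download links differently, which helps explain why the archive stops at a single fetch per URL.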
Speaking of telling someone, here’s our update: we have copies of all metadata from data.gov, and all of the dataset URLs it points to (shallow crawl); all federal GitHub repositories with issues, comments, etc.; and articles from PubMed.
Data.gov Home - Data.gov
data.gov
January 31, 2025 at 9:06 PM