This is very welcome news in the face of Trumpist depredations of US public data.
https://lil.law.harvard.edu/blog/2025/01/30/preserving-public-u-s-federal-data/
"In recent months the Harvard Law School [@harvard_law] Library Innovation Lab [@harvardlil] has created a data vault to download, sign as authentic, and make available copies of public government data that is most valuable to researchers, scholars, civil society and the public at large across every field. To begin, we have collected major portions of the datasets tracked by data.gov, federal #Github repositories, and #PubMed...."
Related: Also see the End of Term Web Archive, which routinely scrapes and preserves US govt web pages before new presidents take office. Launched in 2008 and still going strong. To supplement its archive, it welcomes URL nominations of sites to save.
https://eotarchive.org/
Related: "As Data Goes Off-Line Under #Trump, Environmental Researchers Are Uploading Backups"
https://www.insidehighered.com/news/faculty-issues/research/2025/01/29/data-goes-line-under-trump-researchers-upload-backups
"In the first few days of Donald Trump’s second term as president, the White House Council on Environmental Quality’s #Climate and Economic Justice Screening Tool (#CEJST, for short) disappeared from government websites. It was an interactive map of U.S. Census tracts that are “marginalized by underinvestment and overburdened by pollution,” as the pre-Trump federal government put it—something researchers and the public could use to quickly locate and zoom in on specific communities and analyze the problems they face. The Internet Archive’s Wayback Machine stored a copy of the webpage, but even there, the map was gone. However, thanks to a team of researchers from multiple universities and other organizations, a new working version was posted online Friday."
Related: "The #Trump administration is scrubbing the #CDC’s website of documents on #ReproductiveRights issues, sexual health, intimate partner violence, and more. We’re saving them. 𝘈𝘣𝘰𝘳𝘵𝘪𝘰𝘯, 𝘌𝘷𝘦𝘳𝘺 𝘋𝘢𝘺 will publish and host these vital documents for as long as necessary. To share deleted documents with 𝘈𝘣𝘰𝘳𝘵𝘪𝘰𝘯, 𝘌𝘷𝘦𝘳𝘺 𝘋𝘢𝘺, email tips@abortioneveryday.com. "
https://jessica.substack.com/p/cdc-birth-control-guidelines-pdf
Related. Also see the Safeguarding Research project.
https://safeguarding-research.discourse.group/
"You know of any publicly available material that needs safeguarding? Please post about it here!"
Related. "Researchers rush to preserve federal health databases before they disappear from government websites"
https://journalistsresource.org/home/researchers-rush-to-preserve-federal-health-databases-before-they-disappear-from-government-websites/
"Tips for preserving websites:
* To find the missing websites, go to Wayback Machine and type in the website’s URL in the search bar.
* If you’re concerned that certain websites or web pages may be removed, you can suggest federal websites and content that end in .gov, .mil and .com to the End of Term Web Archive.
* You can suggest federal climate and environmental databases to Environmental Data and Governance Initiative.
* You can suggest databases to The Data Liberation Project, which is run by MuckRock and Big Local News.
* Tell science journalist Maggie Koerth what CDC data you've downloaded and whether you've made them publicly available...."
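For anyone who wants to script the first tip rather than use the search bar, here is a minimal sketch of a Wayback Machine lookup. It assumes the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and its `archived_snapshots` / `closest` response shape. The function names are my own, purely for illustration, and none of this comes from the article quoted above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's Wayback availability API.
API = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the archived copy's URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to the timestamp."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("cdc.gov", "20250101")` should return the URL of the capture nearest that date, or `None` if the Wayback Machine reports no copy. A `None` only means this one archive has nothing; the other projects in this thread may still hold the data.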
Related. "Scientists globally are racing to save vital health databases taken down amid Trump chaos."
https://www.nature.com/articles/d41586-025-00374-y
Update. "Inside the race to archive the US government’s websites"
https://www.technologyreview.com/2025/02/07/1111328/inside-the-race-to-archive-the-us-governments-websites/
Surveys a range of initiatives, with good clarity on the obstacles.
"There are questions about whether scraping the data will really be enough. Restoring websites and complex data sets is often not a simple process.…'The repairs and attempts to recover are sometimes insurmountable where we need continuous readings of data.' 'All of this data archiving work is a temporary Band-Aid,' says Gosnell. 'If data sets are removed and are no longer updated, our archived data will become increasingly stale and thus ineffective at informing decisions over time.' "
Update. "The Public Environmental Data Partners [#PEDP] are committed to preserving and providing public access to federal environmental data. We are a volunteer coalition of several environmental, justice, and policy organizations, researchers across several universities, archivists, and students who rely on federal datasets and tools to support critical research, advocacy, policy, and litigation work. To gather insights on what data to preserve, we reached out to our networks, which consist largely of environmental justice groups and networks, state and local government climate offices, and academic researchers. We compiled a large list of federal databases and tools, and prioritized them based on their relative impact, our confidence that we could archive them, and the relative effort it would take to obtain and archive them."
https://screening-tools.com/
Continuously updated.
Update. "This is Version 2 of the Climate and Economic Justice Screening Tool, released by the Council on Environmental Quality [#CEQ] in December 2024. Although the tool remains unchanged, public access through the White House was discontinued on January 22, 2025. We re-created Version 2 and made it publicly accessible."
https://screening-tools.com/climate-economic-justice-screening-tool
Update. "Today we [Harvard Law School @harvard_law Library Innovation Lab @harvardlil] released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov. This is the first release in our new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use."
https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/
Update. "Federal data is disappearing. On Thursday, meet the teams working to rescue it and learn how you can help. Join the Internet Archive [@internetarchive] and the Library Innovation Lab [@harvardlil] on Feb. 13, 3pm Eastern for a special event exploring the terabytes of data they have already saved and how to access it."
https://www.muckrock.com/news/archives/2025/feb/10/federal-data-is-disappearing-on-thursday-meet-the-teams-working-to-rescue-it-and-learn-how-you-can-help/
Update. If you're following this thread, you should also follow the Data Rescue Project by visiting its web site and subscribing to its email list. It aims "to serve as a clearinghouse for #data rescue-related efforts and data access points for public US governmental data that are currently at risk." And it's #crowdsourced, which gives it a fighting chance to be comprehensive and up to date.
https://www.datarescueproject.org/about-data-rescue-project/
If you're on #Bluesky, also follow its account there.
https://bsky.app/profile/datarescueproject.org
I'm very aware that a solo effort, like this Mastodon thread, doesn't scale to the size of this task, and I welcome the arrival of a crowdsourced effort. I will use it and refer people to it.
Update. "As the US government removes health websites and data, here’s a list of non-government data alternatives and archives"
https://journalistsresource.org/home/as-the-us-government-removes-health-websites-and-data-heres-a-list-of-non-government-data-alternatives/
"There’s no perfect alternative to the government databases, but some non-governmental organizations have their own datasets, which can be useful to journalists. Several #journalism associations have also been downloading government data and making them available to their members. To help journalists with their continued reporting, we have curated a list of non-government websites that have health data, although some use government data to create their reports. We’ll continue to update this list. If you have a suggestion for a database, please email us."
h/t @kdnyhan
Update. "Here’s why and how Public Environmental Data Partners [#PEDP] and others are making sure that the #climate science the public depends on is available forever."
https://theconversation.com/how-to-find-climate-data-and-science-the-trump-administration-doesnt-want-you-to-see-249321
Update. "While we at the #PEGI Project [Preservation of Electronic Government Information] have been aware of the potential for a crisis like this since the start of our project in 2017, both the pace and extent of the removals and changes have been astonishing to witness. What has also been astonishing (and heartening!) is the willingness of a broad community to join together in quick action to save content, particularly data that cannot be easily captured as part of the End of Term Archive. The Public Environmental Data Partners, a project launched by the Environmental Data Governance Initiative (EDGI), has been working on collecting and preserving hard-to-crawl environmental data for the past couple of months. In the past two weeks, a coalition has formed to launch the Data Rescue Project, which then debuted its Data Rescue Tracker. They also have a helpful (and well-vetted!) list of Resources that can guide individuals and organizations wanting to contribute to this work."
https://www.pegiproject.org/blog/2025/2/14/pegi-project-urges-preservation-of-public-federal-data
Update. "A Renewed Call for Preservation of At-Risk Government Data"
https://www.icpsr.umich.edu/web/about/cms/6103
"The directors of the University of Michigan's Institute for Social Research (#ISR) and the Inter-university Consortium for Political and Social Research (#ICPSR) are emphasizing the critical need for preserving government data that may be at risk due to recent policy shifts....Through #DataLumos, an ICPSR archive for valuable government data resources, ICPSR is helping the data community to preserve, document, and disseminate thousands of files from agencies such as the Centers for Disease Control [#CDC] and the Department of Education [#DOE]."
Update. The Data Rescue Project spotlights 13 college and university #libguides on projects to rescue scientific and govt data from #censorship or deletion.
https://www.datarescueproject.org/libraries-supporting-data-rescue/
Update. "The cost of losing government webpages and public data"
https://www.marketplace.org/shows/marketplace-tech/the-cost-of-losing-government-webpages-and-public-data/
"Jack Cushman, director of the Harvard Library Innovation Lab [@harvardlil], has been preserving sites and data that went dark after executive orders from President #Trump. He underlines the importance of keeping digital copies or risk parting with “our cultural memory.”
Update. "Federal data is disappearing. Meet the teams working to rescue it and learn how you can help."
https://www.youtube.com/watch?v=hiZuKA-o4V4
"Since the start of the new #Trump administration, hundreds of federal data sets and government websites have gone offline without warning, sometimes returning with major changes and sometimes not returning at all. On February 13th, #MuckRock hosted an event with organizations that are helping lead the efforts to preserve the public’s data."
This is a video of the event.
Update. "The fight to preserve federal government data."
https://www.muckrock.com/news/archives/2025/feb/04/the-fight-to-preserve-federal-government-data/
"Despite the Trump administration restricting access to these government sites and its underlying data, organizations and individuals have been working to preserve this data. Here are just a few of the efforts that part of an ongoing effort to preserve public information."
Update. "How the Wayback Machine is preserving outdated government websites."
https://www.cbsnews.com/video/how-the-wayback-machine-is-preserving-outdated-government-websites/
"The #WaybackMachine is helping preserve the record of government websites before they were changed by the TTrump administration. CBS News Confirmed's Rhona Tarrant reports."
@petersuber Apparently it is standard practice for them to do this with every administration change. It's just that this one is so mission-critical.