With more and more scholarly papers being published, researchers are increasingly citing datasets. Preserving these datasets for the long term is therefore important: they must remain accessible after they are cited, and they are crucial for reproducibility. However, preserving research data over time is difficult because research data repositories can shut down unexpectedly. Attwood et al. (2015) showed that 75% of biological databases had either closed entirely or become outdated within 18 years. In the paper "Disappearing repositories – taking an infrastructure perspective on the long-term availability of research data", published in PLOS Biology in 2024, Strecker et al. take a broad look at data preservation by examining closed repositories. The authors investigated 191 closed research data repositories identified in the re3data registry, revealing a closure rate of 6.2%. The reasons for closure vary, and the median closed repository had been operating for 12 years. Some repositories take steps to save their data when they close: 44% migrated their data to other repositories, and 12% offered some form of continued access. The research highlights the challenges of keeping research data available in the long term and proposes measures to mitigate data loss when repositories shut down.
Background
Data repositories play a crucial role in scientific research by enabling data preservation, curation, and dissemination (Boyd, 2021; Johnston et al., 2018). They facilitate data sharing and collaboration among researchers. Maintaining these repositories over time is difficult (Thomer et al., 2018). Funding limitations, legal issues, and technical obsolescence can threaten their existence. Additionally, ensuring compatibility between evolving data formats and repository infrastructure is a challenge.
Despite efforts to keep them running, some repositories do shut down (Dean, 2016; Aturban et al., 2021). This can lead to permanent data loss and impede research progress. Ideally, a repository's closure should be anticipated, and strategies such as migrating data to another repository can help ensure continued access to valuable research data. Registries that track research data repositories play a vital role here: by documenting repository lifespans and closures, they provide the information needed to study shutdowns systematically.
Data and Methodology
The researchers used the re3data registry, a comprehensive database of research data repositories, to identify and investigate closed repositories. The registry includes information on over 3,000 repositories across various disciplines. A repository was considered closed if its data was no longer accessible or its website explicitly stated that it had closed.
Figure 2 summarizes the steps used to find closed repositories. The authors first filtered the re3data registry for repositories with a listed end date, which produced an initial set of 223 candidates. They then manually reviewed this list to remove duplicates and entries that did not meet the criteria for a closed repository, leaving a final set of 191 repositories.
Figure 2: Process of identifying closed repositories (Figure 1 in original paper)
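The two-stage selection described above (filter by end date, then review against the closure criterion) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the `RegistryEntry` record and its fields are hypothetical simplifications rather than re3data's actual schema, the manual accessibility check is represented by a pre-filled boolean, and duplicates are detected by a normalized name, which is only one possible heuristic.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified stand-in for a registry entry (NOT re3data's schema).
@dataclass(frozen=True)
class RegistryEntry:
    repo_id: str
    name: str
    end_date: Optional[str]        # e.g. "2015-06-30"; None if no end date is listed
    data_accessible: bool          # result of a manual accessibility check
    website_states_closure: bool   # website explicitly announces the closure


def is_closed(entry: RegistryEntry) -> bool:
    # Closure criterion as summarized above: data no longer accessible,
    # or the website explicitly says the repository has closed.
    return (not entry.data_accessible) or entry.website_states_closure


def find_closed(entries: list[RegistryEntry]) -> list[RegistryEntry]:
    # Step 1: keep only candidates with a listed end date.
    candidates = [e for e in entries if e.end_date is not None]
    # Step 2: drop duplicates, here judged by normalized name (one possible heuristic).
    seen: set[str] = set()
    unique: list[RegistryEntry] = []
    for e in candidates:
        key = e.name.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(e)
    # Step 3: keep only entries that meet the closure criterion.
    return [e for e in unique if is_closed(e)]


# Toy entries only (NOT the study's data).
entries = [
    RegistryEntry("r1", "Alpha DB", "2010-01-01", False, False),
    RegistryEntry("r2", "Alpha DB ", "2010-01-01", False, False),  # duplicate of r1
    RegistryEntry("r3", "Beta Archive", None, True, False),        # no end date listed
    RegistryEntry("r4", "Gamma Store", "2018-05-01", True, False), # end date, but still accessible
    RegistryEntry("r5", "Delta Portal", "2020-03-01", True, True), # closure announced
]
print([e.repo_id for e in find_closed(entries)])  # ['r1', 'r5']
```

In the study itself, the review step was manual and drew on archived websites and other sources; the boolean fields simply stand in for the outcome of that review.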
Next, they examined information from various sources, including current and archived repository websites, data papers, and other resources, to understand why each repository shut down and whether its data had been transferred to another repository. Finally, they categorized the reasons for closure and recorded each repository's type and subject area to put their findings in context.
Results
Of the more than 3,000 repositories listed in re3data, 191 (6.2%) were identified as having shut down over a period of 25 years. The cumulative number of closures has grown steadily since re3data launched in 2012, which likely reflects the registry's improved coverage of recent shutdowns. The earliest closure identified dates back to 1999, and the next occurred only in 2005 (Figure 3), so the cumulative count of closed repositories stayed at one until 2004 and began to rise from 2005. This gap may reflect limitations in re3data's coverage of repositories that closed earlier.
Figure 3: Number of closed repositories indexed in re3data (cumulative) (Figure 2 in original paper)
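The flat stretch in Figure 3 follows directly from how a cumulative count works: years without closures add nothing, so the total stays at one between 1999 and 2004. A minimal Python sketch, using made-up closure years rather than the study's data and a hypothetical `cumulative_closures` helper:

```python
from collections import Counter
from itertools import accumulate


def cumulative_closures(closure_years: list[int]) -> dict[int, int]:
    """Map each year in the observed range to the number of closures up to and including it."""
    if not closure_years:
        return {}
    per_year = Counter(closure_years)
    years = range(min(closure_years), max(closure_years) + 1)
    # Running total over every year, including years with no closures.
    totals = accumulate(per_year.get(y, 0) for y in years)
    return dict(zip(years, totals))


# Toy data only (NOT the study's data): one closure in 1999, the next ones from 2005.
toy_years = [1999, 2005, 2005, 2007]
series = cumulative_closures(toy_years)
print(series[1999], series[2004], series[2005], series[2007])  # 1 1 3 4
```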
The analysis revealed some distinct characteristics of closed repositories. Most were disciplinary repositories specializing in a particular field rather than institutional repositories. As Figure 4 shows, the life sciences and natural sciences were the most commonly represented subjects among closed repositories, which mirrors the overall prevalence of these subjects in re3data relative to the humanities and social sciences.
Figure 4: Types and subjects of closed repositories in re3data (Figure 3 in original paper)
The median age of a closed repository at shutdown was 12 years. The authors also observed an association between a repository's lifespan and its characteristics. Repositories that shut down soonest (those in the 25th percentile of the age distribution) were more likely to be institutional and to focus on the humanities or social sciences. In contrast, repositories that operated longest (those in the 75th percentile) were more likely to be disciplinary and to specialize in the life sciences (Figure 5).
Figure 5: Proportion of type and subject of closed repositories in the 25th (short-lived) and 75th (long-lived) percentile of the age distribution (Table 2 in original paper)
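The percentile comparison behind Figure 5 can be illustrated with a short Python sketch. The ages and type labels below are invented toy values, and the cut-off rule (at or below the 25th percentile, at or above the 75th) as well as the use of `statistics.quantiles` with its default method are assumptions; the original paper may define the groups or compute percentiles differently.

```python
from statistics import median, quantiles


def split_by_lifespan(ages: dict[str, float]) -> tuple[float, list[str], list[str]]:
    """Return the median age plus the short-lived and long-lived repositories.

    Short-lived: age at or below the first quartile (25th percentile).
    Long-lived: age at or above the third quartile (75th percentile).
    """
    values = list(ages.values())
    # Quartile cut points using the statistics module's default ("exclusive") method.
    q1, _, q3 = quantiles(values, n=4)
    short_lived = sorted(r for r, a in ages.items() if a <= q1)
    long_lived = sorted(r for r, a in ages.items() if a >= q3)
    return median(values), short_lived, long_lived


def share(group: list[str], labels: dict[str, str], value: str) -> float:
    """Fraction of repositories in `group` whose label equals `value`."""
    if not group:
        return 0.0
    return sum(labels[r] == value for r in group) / len(group)


# Toy data only (NOT the study's data): age at shutdown in years, and repository type.
ages = {"A": 2, "B": 5, "C": 9, "D": 12, "E": 14, "F": 20, "G": 25}
types = {"A": "institutional", "B": "disciplinary", "C": "institutional",
         "D": "disciplinary", "E": "disciplinary", "F": "disciplinary", "G": "disciplinary"}

med, short, long_ = split_by_lifespan(ages)
print(med, short, long_)                      # 12 ['A', 'B'] ['F', 'G']
print(share(short, types, "institutional"))   # 0.5
print(share(long_, types, "disciplinary"))    # 1.0
```

The same `share` function could be applied to subject labels to reproduce the subject breakdown in the same way.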
Risks resulting in repository shutdown
The study also examined why repositories closed. In over 60% of cases, the reason remained unclear. Where an explanation was available, managerial issues were the main cause, including internal restructuring, completion of the repository's mission, and funding cuts. Technical issues played a minor role: outdated technology, hacking incidents, and obsolete data formats forced only a few repositories to close.
The most concerning finding is the high risk of data loss. Nearly 90% of closed repositories no longer offer access to their data on their website, and only 12% provide some limited access. In nearly half of the cases, a closed repository neither offers any access nor has transferred custody of its data to another repository, so the data may be lost entirely.
Data migration offered some reassurance. The authors found evidence that the data of over 40% of closed repositories had been transferred to other repositories, with established disciplinary repositories being frequent destinations. In most cases where a receiving repository later shut down as well, the data was transferred again. However, the study also identified a few instances where the chain of transfers was broken by subsequent closures.
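Broken transfer chains are easier to reason about if a registry records, for each closed repository, where its data went, as the study recommends. Under that assumption, locating the current home of a dataset amounts to following successor links until an operating repository is reached. The sketch below uses hypothetical repository IDs and a hypothetical `migrated_to` mapping; it is an illustration, not a feature of re3data.

```python
from typing import Optional


def current_host(start: str, migrated_to: dict[str, str], closed: set[str]) -> Optional[str]:
    """Follow a chain of data migrations from `start`.

    Returns the first repository on the chain that is still operating, or None
    if the chain is broken (it reaches a closed repository with no recorded
    successor) or loops back on itself.
    """
    seen: set[str] = set()
    repo = start
    while repo in closed:
        if repo in seen or repo not in migrated_to:
            return None  # loop, or closed with no recorded transfer: chain broken
        seen.add(repo)
        repo = migrated_to[repo]
    return repo


# Toy data only (NOT the study's data).
closed = {"A", "B", "C", "D", "X", "Y"}
migrated_to = {"A": "B", "B": "E", "C": "D", "X": "Y", "Y": "X"}

print(current_host("A", migrated_to, closed))  # E     (A -> B -> E, E still operating)
print(current_host("C", migrated_to, closed))  # None  (C -> D, D closed with no successor)
print(current_host("X", migrated_to, closed))  # None  (X and Y point at each other)
```

Returning `None` for a loop guards against inconsistent registry records that list two repositories as each other's successors.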
Discussion
The study investigated how often research data repositories shut down and the challenges this creates for data preservation. Of the repositories listed in re3data, a major registry, 6.2% have closed, and the median lifespan of a closed repository is only 12 years. This raises concerns about the long-term availability of research data.
Two main strategies are used to prevent data loss when a repository closes: providing limited access to the data or migrating the data to another repository. Some repositories offer limited access after closure, but this is not ideal because the data is no longer actively managed. Migrating data to another repository is more common, but it does not eliminate the risk of data loss, as the receiving repository could also shut down.
Another challenge identified in the study is determining whether a repository has closed permanently. Repositories often do not provide clear information on their websites, and many disappear entirely after closing, leaving no trace of the data. This lack of transparency makes data harder to find and undermines the integrity of the scholarly record.
The study suggests improvements to address these challenges. Repository operators should be more transparent about closures, including information on the shutdown process and, if the data was migrated, on the receiving repository. Registries like re3data can also play a crucial role by collecting and displaying information on closures and data transfers. These improvements would give researchers a better chance of finding the data they need and help preserve the scholarly record.
Conclusion
The long-term availability of research data is threatened by repository shutdowns. To address this, the authors recommend that repository operators plan for shutdown from the outset, including strategies for data migration. More research is needed to develop better strategies for preventing data loss and to understand which factors increase the risk of a repository shutting down.
References
Johnston, L. R., Carlson, J., Hudson-Vitale, C., Imker, H., Kozlowski, W., Olendorf, R., & Stewart, C. (2018). How Important is Data Curation? Gaps and Opportunities for Academic Libraries. Journal of Librarianship and Scholarly Communication, 6(1). https://doi.org/10.7710/2162-3309.2198