PLoS One. 2014 Dec 26;9(12):e115253.
doi: 10.1371/journal.pone.0115253. eCollection 2014.

Scholarly context not found: one in five articles suffers from reference rot

Martin Klein et al. PLoS One.

Abstract

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. However, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impairs the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find that one out of five STM articles suffers from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only STM articles that contain references to web resources are considered, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions that are being explored in this regard in the context of the Hiberlink project.
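The two checks described in the abstract (is the URI still responsive, and does an archived snapshot exist close to the time of referencing) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the rule that a final 4xx/5xx status or no response counts as link rot, and the 14-day window are all assumptions made for illustration, and the inputs are supplied by the caller rather than fetched from the live web or a web archive.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def is_link_rotten(status_code: Optional[int]) -> bool:
    """Assumed rule: no response at all, or a final 4xx/5xx status, counts as link rot."""
    return status_code is None or status_code >= 400


def has_representative_memento(
    referenced_on: date,
    memento_dates: Iterable[date],
    window_days: int = 14,  # hypothetical window; the abstract does not state one
) -> bool:
    """True if any archived snapshot falls within window_days of the reference date."""
    window = timedelta(days=window_days)
    return any(abs(m - referenced_on) <= window for m in memento_dates)


def check_reference(
    status_code: Optional[int],
    referenced_on: date,
    memento_dates: Iterable[date],
) -> dict:
    """Combine both checks for one URI reference."""
    return {
        "link_rot": is_link_rotten(status_code),
        "representative_memento": has_representative_memento(
            referenced_on, memento_dates
        ),
    }


if __name__ == "__main__":
    # A URI that now returns 404 but was archived nine days after being referenced.
    print(check_reference(404, date(2010, 5, 1), [date(2010, 5, 10)]))
```

Keeping the checks as pure functions over already-collected data (a status code and a list of snapshot dates) separates the classification logic from network access, which would be handled by whatever HTTP client and archive lookup a real study uses.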

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. STM articles and URI references per publication year - arXiv corpus.
Figure 2. STM articles and URI references per publication year - Elsevier corpus.
Figure 3. STM articles and URI references per publication year - PMC corpus.
Figure 4. STM articles per URI reference type they contain and per publication year - arXiv corpus.
Figure 5. STM articles per URI reference type they contain and per publication year - Elsevier corpus.
Figure 6. STM articles per URI reference type they contain and per publication year - PMC corpus.
Figure 7. URI reference type per publication year of the referencing STM article - arXiv corpus.
Figure 8. URI reference type per publication year of the referencing STM article - Elsevier corpus.
Figure 9. URI reference type per publication year of the referencing STM article - PMC corpus.
Figure 10. Link Rot - arXiv corpus.
Figure 11. Link Rot - Elsevier corpus.
Figure 12. Link Rot - PMC corpus.
Figure 13. Mementos for URIs archived within days of being referenced - arXiv corpus.
Figure 14. Mementos for URIs archived within days of being referenced - Elsevier corpus.
Figure 15. Mementos for URIs archived within days of being referenced - PMC corpus.
Figure 16. URI references: corpora as sources, TLDs as targets - all links.
Figure 17. URI references: corpora as sources, TLDs as targets - active links.
Figure 18. URI references: corpora as sources, TLDs as targets - Mementos created within days of referencing.
Figure 19. Growth rate of STM articles per publication year.
Figure 20. STM literature: extrapolated fraction of immune and not immune articles.
Figure 21. STM literature: extrapolated fraction of immune, healthy, and infected articles.

