PLoS One. 2023 Sep 13;18(9):e0290827.
doi: 10.1371/journal.pone.0290827. eCollection 2023.

Wikipedia as a tool for contemporary history of science: A case study on CRISPR

Omer Benjakob et al. PLoS One. 2023.

Abstract

Rapid developments and methodological divides hinder the study of how scientific knowledge accumulates, consolidates and transfers to the public sphere. Our work proposes using Wikipedia, the online encyclopedia, as a historiographical source for contemporary science. We chose the high-profile field of gene editing as our test case, performing a historical analysis of the English-language Wikipedia articles on CRISPR. Using a mixed-method approach, we qualitatively and quantitatively analyzed the CRISPR article's text, sections and references, alongside 50 affiliated articles. These, we found, documented the CRISPR field's maturation from a fundamental scientific discovery to a biotechnological revolution with vast social and cultural implications. We developed automated tools to support such research and demonstrated their applicability to two other scientific fields: coronavirus and circadian clocks. Our method utilizes Wikipedia as a free digital archive, showing that it can document the incremental growth of knowledge and the manner in which scientific research accumulates and translates into public discourse. Using Wikipedia in this manner complements contemporary histories, overcomes some of their issues, and can also augment existing bibliometric research.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Workflow for using Wikipedia to research the history of a specific field.
A) Scheme of the proposed research flow as supported by our tool: A free search of Wikipedia’s English-language articles is conducted to identify relevant articles; these are then filtered to include only those with the term in either their title or that of a section. Next, different analyses can be performed on the anchor article and corpus. Of the listed examples, the data provided by our tool are in bold; the rest are currently collected manually. B) Breakdown of the flow scheme in the CRISPR case study, as of June 2022.
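The filtering step in panel A can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tool: the `Article` record, the `filter_corpus` function and the sample titles are all hypothetical, and matching is assumed to be a case-insensitive substring test.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record for one search result: its title and section titles."""
    title: str
    section_titles: list[str] = field(default_factory=list)


def filter_corpus(articles: list[Article], term: str) -> list[Article]:
    """Keep articles whose title, or any section title, contains the term."""
    needle = term.casefold()  # assumption: case-insensitive substring match
    return [
        a for a in articles
        if needle in a.title.casefold()
        or any(needle in s.casefold() for s in a.section_titles)
    ]


# Made-up search results, for illustration only.
search_results = [
    Article("CRISPR", ["History", "Applications"]),
    Article("Genome editing", ["Background", "CRISPR-Cas9"]),
    Article("Circadian rhythm", ["History"]),
]

corpus = filter_corpus(search_results, "crispr")
corpus_titles = [a.title for a in corpus]
print(corpus_titles)
```

Here the third article is dropped because the term appears in neither its title nor its section titles.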
Fig 2
Fig 2. Comparing versions of the CRISPR article.
A) An example of the top of a Wikipedia article. Note the `View history` tab (frame added), which enables access to older versions of the text. Snapshots from the Wikipedia archive of the CRISPR article: B) the full text of the article when it first opened on June 30th, 2005, and C) an extract of the lead section’s opening paragraphs, as of July 6th, 2022.
Fig 3
Fig 3. Growth of CRISPR on Wikipedia—anchor article and corpus.
A) The number of sections and subsections in the CRISPR article since it opened in 2005. B) Titles of the article’s sections throughout 2010–2022, sampled biannually. Subsections and sections listing sources were removed for clarity and can be found in S2 Table. Alignment and coloring were added manually to highlight sections repeating in consecutive revisions. C) Timeline of the number of the corpus’ articles opened each year since Wikipedia was launched (2001). The articles’ titles and dates of birth (DOB) can be found in S1 Table. D) Changelog as of November 25th, 2013, documenting the section title change from “Possible applications” to “Applications” (other changes that occurred as part of that edit were removed for visibility and can be found in the archive). All analyses shown cover the period until June 2022.
Fig 4
Fig 4. Comparing an article’s creation date and CRISPR’s first mentions.
A) An article’s date of birth (DOB, blue) compared to the year it first mentioned “CRISPR” (red), sorted by the former. B) The relation between each article’s DOB and the time it took for its first mention of CRISPR. Displayed are a linear trendline and R².
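A linear trendline with R², as shown in panel B, can be obtained with ordinary least squares. The sketch below assumes that standard approach, which the caption does not confirm. The function name `linear_fit` and the sample points are hypothetical and are not data from the study.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares line y = slope * x + intercept, plus R²."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up points: article creation year vs. years until "CRISPR" first appears.
dob_years = [2001.0, 2005.0, 2010.0]
years_to_first_mention = [12.0, 8.0, 3.0]

slope, intercept, r_squared = linear_fit(dob_years, years_to_first_mention)
print(slope, intercept, r_squared)
```

The sample points lie exactly on a line, so the fit has a slope of about -1 and an R² of about 1; real data would scatter around the trendline.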
Fig 5
Fig 5. CRISPR-bibliometrics on Wikipedia.
A) The number of references in the CRISPR article’s reference section from when it opened until December 2021. B) The SciScore of the “CRISPR” article (shown until December 2021). C) The latency distribution of the article’s references (i.e., the duration between a scientific paper’s publication and its integration into Wikipedia). D) A timeline comparing the dates of selected publications (black frames, left) to their citation in the CRISPR article (blue frames, right). E) A snapshot comparing two versions of the CRISPR article from May 2007, showing how changes to the wording of the text were linked to the citation of Barrangou et al., 2007.
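The latency in panel C is a date difference: the time between a paper's publication and its first citation in the Wikipedia article. A minimal sketch, assuming each reference is reduced to a pair of dates. The function name `latency_days` and all sample dates are hypothetical and do not come from the study.

```python
from datetime import date
from statistics import median


def latency_days(published: date, first_cited: date) -> int:
    """Days between a paper's publication and its first citation on Wikipedia."""
    return (first_cited - published).days


# Made-up (publication date, first-cited date) pairs, for illustration only.
references = [
    (date(2020, 1, 1), date(2020, 1, 31)),
    (date(2019, 6, 1), date(2020, 6, 1)),
    (date(2021, 3, 10), date(2021, 3, 20)),
]

latencies = [latency_days(pub, cited) for pub, cited in references]
median_latency = median(latencies)
print(latencies, median_latency)
```

Collecting these values over all references gives the distribution, which can then be summarized (for example by its median) or plotted as a histogram.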
Fig 6
Fig 6. Comparing Wikipedia corpuses: Different fields show different data.
Corpuses were generated and quantitative metrics automatically collected in June-July 2022, for the terms “CRISPR”, “Circadian” and “Coronavirus”. The following data are presented: A) the number of articles opened each year, B) the top 10 most cited journals, C) the top 10 most cited .org websites, D) the top 10 most cited references altogether, E) SciScore distribution, along with the total (sum of all references in all articles) and median scores of the articles’ distribution.
