JMIR Res Protoc. 2025 Apr 22;14:e58567.
doi: 10.2196/58567.

Patient-Related Metadata Reported in Sequencing Studies of SARS-CoV-2: Protocol for a Scoping Review and Bibliometric Analysis

Karen O'Connor et al. JMIR Res Protoc.

Abstract

Background: There has been an unprecedented effort to sequence the SARS-CoV-2 virus and examine its molecular evolution. This effort has been facilitated by publicly accessible databases, such as GISAID (Global Initiative on Sharing All Influenza Data) and GenBank, which together hold millions of SARS-CoV-2 sequence records. Genomic epidemiology, however, seeks to go beyond phylogenetic analysis (the study of evolutionary relationships among biological entities) by linking genetic information to patient characteristics and disease outcomes, enabling a comprehensive understanding of transmission dynamics and disease impact. Although these repositories include fields for patient-related metadata for each sequence, these demographic and clinical details are rarely provided. The extent and quality of patient-related metadata reported in published sequencing studies remain unexplored.

Objective: Our review aims to quantitatively assess the extent and quality of patient-related metadata in papers reporting original whole-genome sequencing of the SARS-CoV-2 virus and to analyze publication patterns using bibliometric analysis. Finally, we will evaluate the efficacy and reliability of a machine learning classifier in identifying relevant papers for inclusion in the scoping review.
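The abstract does not specify which metrics will be used to judge the classifier. As a minimal sketch, assuming that manual screening decisions serve as the reference standard, precision, recall, and F1 score could be computed as below; the function `classifier_metrics` and the True/False label encoding are hypothetical, and the example labels are illustrative, not study data.

```python
from typing import Sequence


def classifier_metrics(predicted: Sequence[bool], actual: Sequence[bool]) -> dict[str, float]:
    """Compare classifier predictions with manual screening decisions.

    Hypothetical helper: True means "paper reports original SARS-CoV-2
    sequencing and is relevant for inclusion"; False means it is not.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))      # relevant, flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))  # irrelevant, flagged
    fn = sum(a and not p for p, a in zip(predicted, actual))  # relevant, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only, not study data.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
print(classifier_metrics(predicted, actual))
```

For a screening task, recall is usually the metric to watch, since a missed relevant paper is lost to the review while a false positive is simply excluded at manual screening.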

Methods: The National Institutes of Health's LitCovid collection will be used for the automated classification of papers that report depositing SARS-CoV-2 sequences in public repositories, while an independent search will be conducted in MEDLINE and PubMed Central for validation. Data will be extracted using Covidence (Veritas Health Innovation Ltd). The extracted data will be synthesized and summarized to quantify the availability of patient metadata in the published literature on SARS-CoV-2 sequencing studies. For the bibliometric analysis, relevant data points, such as author affiliations, citation metrics, author keywords, and Medical Subject Headings (MeSH) terms, will be extracted.
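One simple way to quantify availability is the proportion of included papers that report each patient-related field. The sketch below assumes each paper has already been coded as the set of fields it reports; the field names in `FIELDS` and the helper `reporting_proportions` are hypothetical and do not reproduce the protocol's actual extraction form.

```python
from collections import Counter

# Hypothetical field names; the protocol's extraction form is not given in the abstract.
FIELDS = ["age", "sex", "vaccination_status", "disease_severity"]


def reporting_proportions(papers: list[set[str]], fields: list[str]) -> dict[str, float]:
    """Return the share of papers that report each patient-related metadata field."""
    if not papers:
        return {field: 0.0 for field in fields}
    # Each paper is a set, so a field is counted at most once per paper.
    counts = Counter(field for paper in papers for field in paper if field in fields)
    return {field: counts[field] / len(papers) for field in fields}


# Illustrative coding of four papers, not study data.
papers = [{"age", "sex"}, {"age"}, {"age", "sex", "disease_severity"}, set()]
for field, share in reporting_proportions(papers, FIELDS).items():
    print(f"{field}: {share:.0%}")
```

Keeping papers that report no fields (the empty set) in the denominator matters: dropping them would overstate how often metadata are reported.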

Results: This study is expected to be completed in early 2025. Our classification model has been developed, and we have classified LitCovid publications published through February 2023. As of September 2024, papers published through August 2024 are being prepared for processing. Screening of validated papers from the classifier is underway. Direct literature searches and screening of the results began in October 2024. We will summarize and narratively describe our findings using tables, graphs, and charts where applicable.

Conclusions: This scoping review will report findings on the extent and types of patient-related metadata reported in genomic viral sequencing studies of SARS-CoV-2, identify gaps in the reporting of patient metadata, and make recommendations for improving the quality and consistency of reporting in this area. The bibliometric analysis will uncover trends and patterns in the reporting of patient-related metadata, including differences in reporting based on study types or geographic regions. The insights gained from this study may help improve the quality and consistency of reporting patient metadata, enhancing the utility of sequence metadata and facilitating future research on infectious diseases.

Trial registration: OSF Registries osf.io/wrh95; https://doi.org/10.17605/OSF.IO/WRH95.

International registered report identifier (IRRID): DERR1-10.2196/58567.

Keywords: COVID-19; GISAID; GenBank; SARS-CoV-2; genomic epidemiology; patient-related metadata; protocol; scoping review; sequence records.

Conflicts of Interest: None declared.

Figures

Figure 1
Flow diagram of the proposed screening of identified papers. We will first screen papers identified by our classifier and then screen those identified from database searches, ensuring that no duplicate records are screened.
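The duplicate check in this flow could be sketched as follows, assuming each record carries a DOI and/or a PubMed ID (PMID). The `Record` structure, the helper names, and the match-on-DOI-or-PMID rule are assumptions for illustration, not the protocol's stated procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record from the classifier or a database search."""
    title: str
    doi: Optional[str] = None
    pmid: Optional[str] = None


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so equivalent DOIs match."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def new_records(classifier: list[Record], database: list[Record]) -> list[Record]:
    """Return database-search records not already screened via the classifier."""
    seen_dois = {normalize_doi(r.doi) for r in classifier if r.doi}
    seen_pmids = {r.pmid for r in classifier if r.pmid}
    result = []
    for record in database:
        doi = normalize_doi(record.doi) if record.doi else None
        if (doi and doi in seen_dois) or (record.pmid and record.pmid in seen_pmids):
            continue  # already screened (or already queued), so skip it
        result.append(record)
        # Track accepted records so duplicates within the search results are dropped too.
        if doi:
            seen_dois.add(doi)
        if record.pmid:
            seen_pmids.add(record.pmid)
    return result


# Illustrative records only, not study data.
classifier_hits = [Record("Paper A", doi="10.1000/abc"), Record("Paper B", pmid="123")]
search_hits = [
    Record("Paper A (search)", doi="https://doi.org/10.1000/ABC"),
    Record("Paper B (search)", pmid="123"),
    Record("Paper C", doi="10.1000/xyz"),
    Record("Paper C (second database)", doi="10.1000/XYZ"),
]
print([r.title for r in new_records(classifier_hits, search_hits)])
```

Exact identifier matching is deliberately conservative; records lacking both a DOI and a PMID would still need title-based or manual checking.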
