Genomics Proteomics Bioinformatics. 2020 Apr;18(2):91-103.
doi: 10.1016/j.gpb.2018.11.006. Epub 2020 Jul 9.

Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases

Qingyu Chen et al. Genomics Proteomics Bioinformatics. 2020 Apr.
No abstract available

Figures

Figure 1. Biological analysis pipeline. Three stages of a biological analysis pipeline, each heavily involving biological databases, are presented. Pre-DB: the data collection and submission stage, where entity duplicates often matter. Within-DB: the data curation and visualization stage, where near-identical duplicates often matter. Post-DB: the data downloading and usage stage, where the definition of duplicates is use-case dependent. DB: database.
Figure 2. Characteristics of duplicate records. A. Duplicate types and number of participants who selected different duplicate types. B. Distribution of participants according to the number of duplicate types they selected. There are 21 participants in total.
Figure 3. Impacts of duplicate records. A. The number of participants who did or did not believe that duplication has impacts. B. A more detailed breakdown by type of impact, for those who believed duplication has impacts.
Figure 4. Solutions to duplicate records. The X-axis represents the options to address duplication; the Y-axis represents the corresponding number of participants selecting that option.
