PLoS Comput Biol. 2021 Jul 30;17(7):e1008984.
doi: 10.1371/journal.pcbi.1008984. eCollection 2021 Jul.

Gene name errors: Lessons not learned

Mandhri Abeysooriya et al. PLoS Comput Biol. 2021.

Abstract

Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighted the extent of the problem. To assess this, we scanned supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is because gene names are converted not just to dates and floating-point numbers, but also to the internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited to use with large genomic data.
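The three kinds of conversion named in the abstract (dates, floating-point numbers, and five-digit internal date numbers) can be illustrated with a minimal Python sketch. This is not the authors' scanning software: the regular expressions, the function names `classify_cell` and `suspicious_cells`, and the sample values are assumptions chosen for illustration, and a real scanner would need broader patterns and a way to avoid flagging legitimate numeric columns.

```python
import re

# Assumed patterns for illustration only; not the rules used by the authors' scanner.
# Date-like cells, e.g. "2-Sep" (a commonly cited outcome for the gene symbol SEPT2).
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)
# Scientific-notation cells, e.g. "2.31E+13" (from an identifier such as 2310009E13).
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E\+\d+$", re.IGNORECASE)
# Five-digit numbers, the form taken by a spreadsheet's internal date value.
SERIAL_LIKE = re.compile(r"^\d{5}$")


def classify_cell(value: str) -> str:
    """Return 'date', 'float', 'serial', or 'ok' for one gene-name cell."""
    text = value.strip()
    if DATE_LIKE.match(text):
        return "date"
    if FLOAT_LIKE.match(text):
        return "float"
    if SERIAL_LIKE.match(text):
        return "serial"
    return "ok"


def suspicious_cells(column):
    """Return (index, value, kind) for every cell in a column that looks converted."""
    flagged = []
    for index, value in enumerate(column):
        kind = classify_cell(value)
        if kind != "ok":
            flagged.append((index, value, kind))
    return flagged


if __name__ == "__main__":
    gene_column = ["TP53", "2-Sep", "BRCA1", "2.31E+13", "44077"]
    for index, value, kind in suspicious_cells(gene_column):
        print(f"row {index}: {value!r} looks like a converted gene name ({kind})")
```

Applied to a column that is expected to hold only gene symbols, any flagged cell is a candidate error; on mixed or numeric columns the five-digit rule in particular would produce false positives.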

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Prevalence of gene name errors in the period 2014–2020.
(A) Publications with supplementary Excel gene lists. (B) Publications affected by gene name errors. (C) Proportion of affected publications.
Fig 2. A scatterplot of JIF and proportion of articles with supplementary Excel gene lists affected by gene name errors.
Fig 3. Gene name errors in supplementary files for three dominant journals in the period 2014–2020.

Comment in

  • Six tips for better spreadsheets.
    Perkel JM. Nature. 2022 Aug;608(7921):229-230. doi: 10.1038/d41586-022-02076-1. PMID: 35918522.

References

    1. Zeeberg B, Riss J, Kane D, Bussey K, Uchio E, Linehan W, et al. Mistaken identifiers: gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics. 2004;5: 80. doi: 10.1186/1471-2105-5-80
    2. Ziemann M, Eren Y, El-Osta A. Gene name errors are widespread in the scientific literature. Genome Biol. 2016;17: 177. doi: 10.1186/s13059-016-1044-7
    3. Bruford E, Braschi B, Denny P, Jones T, Seal R, Tweedie S. Nat Genet. 2020;52: 754–758. doi: 10.1038/s41588-020-0669-3
    4. Panko R. What we know about spreadsheet errors. Journal of Organizational and End User Computing. 1998;10: 15–21.
    5. Peng R. Reproducible research in computational science. Science. 2011;334: 1226–1227. doi: 10.1126/science.1213847