PLoS Pathog. 2019 Sep 12;15(9):e1007901.
doi: 10.1371/journal.ppat.1007901. eCollection 2019 Sep.

Is reliance on an inaccurate genome sequence sabotaging your experiments?

Rodrigo P Baptista et al. PLoS Pathog.

Abstract

Advances in genomics have made whole genome studies increasingly feasible across the life sciences. However, new technologies and algorithmic advances do not guarantee flawless genomic sequences or annotation. Bias, errors, and artifacts can enter at any stage of the process from library preparation to annotation. When planning an experiment that utilizes a genome sequence as the basis for the design, there are a few basic checks that, if performed, may better inform the experimental design and ideally help avoid a failed experiment or inconclusive result.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Common genome assembly problems.
(A) Expected genome organization with roughly equal distribution of aligned reads across the genome sequence. (B) Illustration of a collapsed repeat region and its detection via an accumulation of mapped reads, which produces a peak in the depth-of-coverage plot. (C) An 85-kb region shown for four strains of Toxoplasma gondii chr VI. Contiguous reads are shown as yellow and green horizontal lines. Annotated genes are shown in blue (forward strand) and red (reverse strand). Grey shading indicates orthology. The region defined by the orange window near the 270-kb mark (top ruler) highlights the gap in contigs for two strains, likely caused by the repetitive surface antigen genes located in the 238–275-kb region.
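
The coverage peak described in panel B can be screened for with a simple window scan. The following is a minimal sketch, not the authors' method: it assumes per-base read depths are already available as a list of integers (for example, parsed from the output of a tool such as `samtools depth -a`), and the function name, the 1-kb window, and the 2-fold threshold are illustrative choices rather than values taken from the article.

```python
from statistics import median


def flag_high_coverage_windows(depths, window=1000, fold=2.0):
    """Return (start, end, mean_depth) for each window whose mean depth
    is at least `fold` times the median depth of the whole sequence.

    `depths` is a list of per-base read depths (hypothetical input).
    `window` and `fold` are illustrative defaults, not published values.
    """
    if not depths:
        return []
    baseline = median(depths)  # robust estimate of the expected depth
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        # A collapsed repeat attracts reads from every copy, so its
        # depth rises well above the baseline.
        if baseline > 0 and mean_depth >= fold * baseline:
            flagged.append((start, start + len(chunk), mean_depth))
    return flagged


# Toy data: 30x depth everywhere except a 1-kb stretch at 90x,
# as might be seen if three repeat copies were collapsed into one.
depths = [30] * 5000 + [90] * 1000 + [30] * 4000
print(flag_high_coverage_windows(depths))  # [(5000, 6000, 90.0)]
```

Windows flagged this way are only candidates. Elevated depth can also arise from other causes, such as amplification bias during library preparation, so each region should be inspected before concluding that a repeat has been collapsed.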
