Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2016 May;65(3):357-65.
doi: 10.1093/sysbio/syu046. Epub 2014 Jul 4.

Unforeseen Consequences of Excluding Missing Data from Next-Generation Sequences: Simulation Study of RAD Sequences

Affiliations

Unforeseen Consequences of Excluding Missing Data from Next-Generation Sequences: Simulation Study of RAD Sequences

Huateng Huang et al. Syst Biol. 2016 May.

Abstract

There is a lack of consensus on how next-generation sequence (NGS) data should be considered for phylogenetic and phylogeographic estimates, with some studies excluding loci with missing data, whereas others include them, even when sequences are missing from a large number of individuals. Here, we use simulations, focusing specifically on RAD (Restriction site Associated DNA) sequences, to highlight some of the unforeseen consequence of excluding missing data from next-generation sequencing. Specifically, we show that in addition to the obvious effects associated with reducing the amount of data used to make historical inferences, the decisions we make about missing data (such as the minimum number of individuals with a sequence for a locus to be included in the study) also impact the types of loci sampled for a study. In particular, as the tolerance for missing data becomes more stringent, the mutational spectrum represented in the sampled loci becomes truncated such that loci with the highest mutation rates are disproportionately excluded. This effect is exacerbated further by factors involved in the preparation of the genomic library (i.e., the use of reduced representation libraries, as well as the coverage) and the taxonomic diversity represented in the library (i.e., the level of divergence among the individuals). We demonstrate that the intuitive appeals about being conservative by removing loci may be misguided. [Next-generation sequencing; phylogenetic; phylogeography; RADseq; RADtags; species delimitation.].

PubMed Disclaimer