Comparative Study
BMC Genomics. 2019 Jun 3;20(1):453.
doi: 10.1186/s12864-019-5806-y.

From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species

Belinda Wright et al. BMC Genomics. 2019.

Abstract

Background: Recent advances in genomics have greatly increased research opportunities for non-model species. For wildlife, the growing availability of reference genomes means that population genetics is no longer restricted to a small set of anonymous loci. When used in conjunction with a reference genome, reduced-representation sequencing (RRS) provides a cost-effective method for obtaining reliable diversity information for population genetics. Many software tools have been developed to process RRS data, though few studies of non-model species incorporate genome alignment when calling loci. A commonly used RRS analysis pipeline, Stacks, has this capacity, so it is timely to compare its utility with existing software originally designed for alignment and analysis of whole-genome sequencing data. Here we examine population genetic inferences from two species for which reference-aligned reduced-representation data have been collected. Our two study species are a threatened Australian marsupial (Tasmanian devil, Sarcophilus harrisii; declining population) and an Arctic-circle migrant bird (pink-footed goose, Anser brachyrhynchus; expanding population). Analyses of these data are compared using Stacks versus two widely used genomics packages, SAMtools and GATK. We also introduce a custom R script to improve the reliability of single nucleotide polymorphism (SNP) calls in all pipelines and to conduct population genetic inferences for non-model species with reference genomes.

Results: Although we identified orders of magnitude fewer SNPs in our devil dataset than in our goose dataset, we found remarkable symmetry between the two species in our assessment of software performance. For both datasets, all three methods were able to delineate population structure, even with varying numbers of loci. For both species, population structure inferences were influenced by the percentage of missing data.

Conclusions: For studies of non-model species with a reference genome, we recommend combining Stacks output with further filtering (as included in our R pipeline) for population genetic studies, paying particular attention to the potential impact of missing data thresholds. We recognise SAMtools as a viable alternative for researchers more familiar with this software. We caution against the use of GATK in studies with limited computational resources or time.

Keywords: DArTseq; GATK; Pink-footed goose; Population differentiation; Population genomics; Reference genome; SAMtools; Stacks; Tasmanian devil.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of methods used in this study to process reduced representation sequencing data with reference genomes, with some alternatives to software used indicated where appropriate. * Reproducibility filtering only possible if replicates or technical replicates are performed. ** Possible sex-linked SNP filter requires knowledge of sex of samples and is based on XX/XY system, but could be reversed for ZZ/ZW systems
Fig. 2
PCoAs of the two datasets after processing through three pipelines with a call rate of 70% and the custom R script as outlined in Fig. 1. For devils, red is the “west” (N = 47) and blue is the “east” (N = 18) population. For goose, red is the “Iceland” (N = 20) and blue is the “Denmark” (N = 20) population. Inertia ellipses illustrate groupings and do not necessarily indicate confidence
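
The 70% call rate mentioned in the Fig. 2 caption is a missing-data threshold: a SNP is retained only if enough samples have a genotype call at that site. The following is a minimal Python sketch of such a filter, not the authors' R script. The function names (`snp_call_rates`, `filter_by_call_rate`), the genotype encoding (0/1/2 alternate-allele counts, `None` for a missing call), and the choice to apply the threshold across all samples rather than per population are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed encoding: 0, 1, 2 = alternate-allele count; None = missing call.
Genotype = Optional[int]


def snp_call_rates(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Return, for each SNP (column), the fraction of samples with a call."""
    if not genotypes:
        return []
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    rates = []
    for j in range(n_snps):
        called = sum(1 for row in genotypes if row[j] is not None)
        rates.append(called / n_samples)
    return rates


def filter_by_call_rate(
    genotypes: Sequence[Sequence[Genotype]],
    min_call_rate: float = 0.7,
) -> Tuple[List[int], List[List[Genotype]]]:
    """Keep SNPs whose call rate is at least min_call_rate.

    Returns the indices of the retained SNPs and the filtered matrix
    (rows = samples, columns = retained SNPs).
    """
    rates = snp_call_rates(genotypes)
    kept = [j for j, rate in enumerate(rates) if rate >= min_call_rate]
    filtered = [[row[j] for j in kept] for row in genotypes]
    return kept, filtered


# Hypothetical toy data: 4 samples (rows) x 3 SNPs (columns).
toy = [
    [0, None, 1],
    [1, None, None],
    [2, 0, 1],
    [0, None, 2],
]
kept_indices, filtered_toy = filter_by_call_rate(toy, min_call_rate=0.7)
```

Because the abstract reports that population structure inferences were sensitive to the percentage of missing data, rerunning a filter like this at several values of `min_call_rate` and comparing the downstream results is one way to check how much a conclusion depends on the threshold.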

