Mol Biol Evol. 2018 Nov 1;35(11):2835-2849.
doi: 10.1093/molbev/msy166.

sppIDer: A Species Identification Tool to Investigate Hybrid Genomes with High-Throughput Sequencing


Quinn K Langdon et al. Mol Biol Evol. 2018.

Abstract

The genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer; last accessed September 6, 2018).


Figures

Fig. 1.
Workflow of sppIDer. (a) An upstream step concatenates all the desired reference genomes (represented by colored bars). Generally, references should be distinct species (see Materials and Methods for advice about choosing references). This combination reference genome can be used for many analyses. (b) The main sppIDer pipeline. First, reads (short lines) are mapped. This output is used to parse for quality and percentage (left) or for coverage (right). On the left, quality (high MQ black lines versus low MQ light lines) is parsed, and the percentage of reads that map to each genome or do not map (gray bar) is calculated. To determine coverage, only MQ > 3 reads (black lines) are kept and sorted into the combination reference genome order. These reads are then counted, either for each base pair or, for large genomes (combination length >4 Gb), in groups. Then, the combination reference genome is broken into equally sized pieces, and the average coverage is calculated. (c) Several plots are produced. Shown here are examples of Percentage Mapped and Mapping Quality plots, a plot showing average coverage by species, plots of coverage distributions, and two ways to show coverage by windows with species side-by-side or stacked. Scer, Saccharomyces cerevisiae; Spar, S. paradoxus; Smik, S. mikatae; Skud, S. kudriavzevii; Sarb, S. arboricola; Suva, S. uvarum; Seub, S. eubayanus.
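The coverage branch of the workflow described in this caption can be illustrated with a minimal sketch. This is not sppIDer's own code: the function name, the tuple-based alignment records, and the "Species-contig" naming are assumptions made for illustration. It keeps only alignments with MQ > 3, counts depth per base pair, and averages depth over fixed-size windows. As a simplification, windows are cut per contig, so the last window of a contig may be shorter than the others.

```python
MQ_THRESHOLD = 3  # the caption keeps only reads with MQ > 3 for coverage


def windowed_mean_coverage(alignments, contig_lengths, window_size):
    """Average per-base depth over fixed-size windows.

    alignments: iterable of (contig, start, end, mapq), 0-based, end exclusive
                (hypothetical record layout, not a real file format).
    contig_lengths: dict of contig name -> length, in combination-reference order.
    Returns a list of (contig, window_start, mean_depth).
    """
    # One depth counter per base pair of every contig.
    depth = {contig: [0] * length for contig, length in contig_lengths.items()}

    for contig, start, end, mapq in alignments:
        if mapq <= MQ_THRESHOLD:
            continue  # low-quality mappings do not contribute to coverage
        length = contig_lengths[contig]
        for pos in range(max(start, 0), min(end, length)):
            depth[contig][pos] += 1

    windows = []
    for contig, length in contig_lengths.items():
        for w_start in range(0, length, window_size):
            chunk = depth[contig][w_start:w_start + window_size]
            windows.append((contig, w_start, sum(chunk) / len(chunk)))
    return windows


if __name__ == "__main__":
    # Toy input, invented purely to exercise the function.
    lengths = {"Scer-chrI": 10, "Seub-chrI": 10}
    reads = [
        ("Scer-chrI", 0, 5, 60),
        ("Scer-chrI", 0, 5, 60),
        ("Seub-chrI", 5, 10, 60),
        ("Seub-chrI", 0, 10, 2),  # dropped: MQ is not above the threshold
    ]
    for row in windowed_mean_coverage(reads, lengths, window_size=5):
        print(row)
```

A per-base list is only practical for small genomes; the caption notes that for large combination references (>4 Gb) reads are counted in groups instead.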
Fig. 2.
Normalized coverage plots of Saccharomyces test cases. (a) Reads from a New Zealand isolate of S. eubayanus, P1C1, mapped to the S. eubayanus reference genome (magenta). (b) Reads from an ale strain, FostersO, mapped to the S. cerevisiae reference genome (red), with visually detectable aneuploidies. (c) Reads from a hybrid Frohberg lager strain, W34/70, mapped to both the S. cerevisiae and S. eubayanus reference genomes in an average approximately 1:1 ratio with visually detectable translocations and aneuploidies. (d) Reads from a hybrid Saaz lager strain, CBS1503, mapped to both S. cerevisiae and S. eubayanus reference genomes in an average approximately 1:2 (respectively) ratio with visually detectable translocations and aneuploidies. (e) Reads from a wine hybrid strain, Vin7, mapped to S. cerevisiae and S. kudriavzevii (green) reference genomes in an average approximately 2:1 (respectively) ratio. (f) Reads from a hybrid cider-producing strain, CBS2834, mapped to four reference genomes: S. cerevisiae, S. kudriavzevii, S. uvarum (purple), and S. eubayanus.
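The approximate ratios quoted in this caption (1:1, 1:2, 2:1) compare average coverage between parental genomes. One way such a ratio could be derived from per-species mean coverage is sketched below. The function name, the noise-floor parameter, and the input numbers are all hypothetical; this is an illustration of the arithmetic, not the normalization that sppIDer itself applies.

```python
def relative_contribution(mean_coverage, min_fraction=0.05):
    """Express per-species mean coverage as a ratio to the lowest-covered
    species that is still considered present.

    mean_coverage: dict of species -> mean depth across that species' genome.
    min_fraction: assumed noise floor; species whose coverage falls below this
                  fraction of the best-covered species are treated as absent.
    """
    if not mean_coverage:
        return {}
    top = max(mean_coverage.values())
    if top <= 0:
        return {}
    present = {s: c for s, c in mean_coverage.items() if c >= min_fraction * top}
    base = min(present.values())
    return {s: round(c / base, 2) for s, c in present.items()}


if __name__ == "__main__":
    # Invented depths for illustration only; they are not values from the paper.
    example = {"Scer": 30.0, "Seub": 60.0, "Spar": 0.4}
    print(relative_contribution(example))
```

With these made-up inputs the low-coverage species is filtered out as background, and the remaining two species come out in a 1:2 ratio.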
Fig. 3.
Simulated phylogeny of ten species and sppIDer’s detection of hybrids from this phylogeny. (a) Phylogeny built with AAF. (b) Reads from G mapped to the G reference genome. (c) Reads from a pseudo-hybrid of the closely related species G and H mapped to the G and H references. (d) Reads from a more distant pseudo-hybrid of E and G mapped to references E and G. (e) Reads of an ancient pseudo-hybrid of A and a common ancestor of G and H mapped to the references of A, G, and H, which are the lineages that descended from the hybrid’s parents. (f) Without the G reference genome, reads from a pseudo-hybrid of the closely related species G and H instead mapped to the H reference genome, with some mapped promiscuously to references I and J.
Fig. 4.
Comparison of the percentage of reads that mapped when different reference genomes were excluded, compared with when all possible reference genomes for Saccharomyces were available (middle panels). (a) When the S. cerevisiae reference genome was not provided and reads from a Frohberg lager strain, W34/70, were mapped, more reads failed to map (gray) or mapped to the S. paradoxus reference genome (yellow). (b) When the full array of Saccharomyces genomes was provided, reads for the lager strain mapped to both S. cerevisiae and S. eubayanus. (c) When the S. eubayanus reference genome was removed, more reads from the lager strain failed to map or mapped to the S. uvarum reference genome (purple). (d) With the removal of the S. cerevisiae reference genome, reads from the S. cerevisiae × S. kudriavzevii hybrid strain Vin7, which would normally map to S. cerevisiae, instead failed to map or mapped to S. paradoxus. (e) When all genomes were used, reads mapped to both S. cerevisiae and S. kudriavzevii. (f) With the removal of the S. kudriavzevii reference genome, reads that would normally map to S. kudriavzevii instead failed to map or were distributed across all other genomes.
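The quantity compared across these panels is the percentage of reads assigned to each reference genome, with unmapped reads counted as their own category. A minimal sketch of that tally follows. It assumes each contig in the combination reference is named with a species prefix followed by a hyphen (a hypothetical convention used here for illustration), and that unmapped reads are represented by `None`.

```python
from collections import Counter


def percent_mapped(read_targets):
    """Percentage of reads assigned to each species, plus unmapped reads.

    read_targets: iterable with one entry per read, either the name of the
                  contig it mapped to (assumed "Species-contig") or None
                  if the read did not map.
    """
    counts = Counter()
    for target in read_targets:
        if target is None:
            counts["unmapped"] += 1
        else:
            species = target.split("-", 1)[0]  # text before the first hyphen
            counts[species] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Ten invented reads, used only to exercise the function.
    reads = ["Scer-chrI"] * 4 + ["Seub-chrII"] * 4 + ["Spar-chrI", None]
    for label, pct in percent_mapped(reads).items():
        print(f"{label}: {pct:.1f}%")
```

Dropping a reference genome from the combination changes these tallies in the way the caption describes: reads that lose their best target either shift to the closest remaining relative or land in the unmapped category.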
Fig. 5.
Examples using animal and plant genomes. (a) Reads from a Drosophila yakuba individual mapped primarily (>99%) to the D. yakuba reference genome. (b) Reads from the sister species D. santomea mapped best to the D. yakuba reference genome, with some mapped promiscuously to other reference genomes. (c) Reads from the more distantly related species D. teissieri mapped mostly to the D. yakuba reference genome, but more reads failed to map or mapped promiscuously to other related reference genomes. (d) Reads from an Arabidopsis thaliana accession from Tanzania mapped back to the European reference genome for A. thaliana. The repetitive nature of centromeres causes the coverage to fluctuate around those regions. (e) Reads from the hybrid species A. kamchatica mapped to the two parental reference genomes: A. halleri and A. lyrata.

