Review
Front Genet. 2020 Feb 21;10:1407.
doi: 10.3389/fgene.2019.01407. eCollection 2019.

A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project

Tobias Andermann et al. Front Genet.

Abstract

High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing effort on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing coverage. One common approach that enriches these loci before sequencing is often referred to as target sequence capture. This technique has been shown to be applicable to phylogenetic studies of greatly varying evolutionary depth. Moreover, it has proven to produce powerful, large multi-locus DNA sequence datasets suitable for phylogenetic analyses. However, target capture requires careful planning, because several design decisions may greatly affect the success of experiments. Here we provide a simple flowchart for designing phylogenomic target capture experiments. We discuss the necessary decisions, from the identification of target loci to the final bioinformatic processing of sequence data. We outline challenges and solutions related to the taxonomic scope, sample quality, and available genomic resources of target capture projects. We hope this review will serve as a useful roadmap for designing and carrying out successful phylogenetic target capture studies.

Keywords: Hyb-Seq; Illumina; NGS; anchored enrichment; bait; high throughput sequencing; molecular phylogenetics; probe.
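The abstract's point that focusing sequencing effort on pre-selected loci increases coverage can be made concrete with a back-of-envelope estimate. The Python sketch below assumes the Lander-Waterman expectation for mean depth, C = N * L / G (number of reads times read length divided by target size), plus a hypothetical on-target fraction for capture. The read count, genome size, target size, and 40% on-target rate are illustrative placeholders, not figures from the review.

```python
"""Back-of-envelope mean depth, assuming C = N * L / G (Lander-Waterman).

All numbers are hypothetical placeholders, not values from the review.
"""


def expected_coverage(n_reads: int, read_length: int, target_size: int,
                      on_target_fraction: float = 1.0) -> float:
    """Mean depth over the target, counting only reads that land on it."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if not 0.0 <= on_target_fraction <= 1.0:
        raise ValueError("on_target_fraction must be in [0, 1]")
    return n_reads * on_target_fraction * read_length / target_size


# Hypothetical sample: 2 million reads of 150 bp.
reads, read_len = 2_000_000, 150

# Whole-genome shotgun on a hypothetical 1 Gb genome: every read counts.
wgs = expected_coverage(reads, read_len, 1_000_000_000)

# Capture of a hypothetical 1 Mb locus set, assuming only 40% of reads
# end up on target (baits are never perfectly specific).
capture = expected_coverage(reads, read_len, 1_000_000, on_target_fraction=0.4)

print(f"WGS depth:     {wgs:.2f}x")
print(f"Capture depth: {capture:.1f}x")
```

The on-target fraction is the parameter worth watching: even when most reads miss the baits, concentrating the remainder on a small target still yields far higher depth than spreading every read across a whole genome.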


Figures

Figure 1
Published studies deposited in Web of Science that have used target sequence capture in phylogenetic research. (A) Number of publications by year (** our search included papers in Web of Science up to December 20, 2019). (B) Normalized cumulative publications using target sequence capture in relation to other phylogenomic studies over time, sorted by year of publication. We restricted our searches to studies published since 2006, the year of release of the first commercial high-throughput sequencer. We searched for Original Articles published in English in the category Evolutionary Biology. We used eight combinations of keywords in independent searches that included the terms: (hybrid OR target* OR exon OR anchored) AND (enrichment OR capture) AND phylogenom*. We merged the datasets and removed duplicate records by comparing unique DOIs (blue bars in panel A). These searches were contrasted with all other phylogenomic studies, as specified by the keywords sequencing AND phylogenom* (yellow bars in panel A).
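The literature search behind Figure 1 can be sketched in Python. This is a minimal illustration, assuming that the eight combinations come from pairing each of the four method terms with each of the two technique terms (4 x 2 = 8), each combined with phylogenom*. The record format (dicts with a `doi` key), the helper names, and normalizing panel B by the final total are hypothetical choices, not the authors' actual pipeline.

```python
from itertools import product

# Term groups read from the Figure 1 caption; grouping them as
# (method) AND (technique) AND phylogenom* is an assumption.
METHOD_TERMS = ["hybrid", "target*", "exon", "anchored"]
TECHNIQUE_TERMS = ["enrichment", "capture"]
TOPIC_TERM = "phylogenom*"


def build_queries() -> list[str]:
    """One AND-query per keyword combination: 4 x 2 = 8 queries."""
    return [f"{m} AND {t} AND {TOPIC_TERM}"
            for m, t in product(METHOD_TERMS, TECHNIQUE_TERMS)]


def merge_by_doi(result_sets: list[list[dict]]) -> list[dict]:
    """Merge per-query hits, keeping the first record seen for each DOI."""
    merged: dict[str, dict] = {}
    for records in result_sets:
        for rec in records:
            key = rec["doi"].strip().lower()  # DOI names are case-insensitive
            merged.setdefault(key, rec)
    return list(merged.values())


def normalized_cumulative(counts_by_year: dict[int, int]) -> dict[int, float]:
    """Running total of publications per year divided by the overall total."""
    total = sum(counts_by_year.values())
    running, out = 0, {}
    for year in sorted(counts_by_year):
        running += counts_by_year[year]
        out[year] = running / total if total else 0.0
    return out


queries = build_queries()
print(len(queries))  # 8
print(queries[0])    # hybrid AND enrichment AND phylogenom*

# Toy hits with placeholder identifiers, not real search results.
hits_a = [{"doi": "DOI-1", "year": 2015}, {"doi": "DOI-2", "year": 2018}]
hits_b = [{"doi": "doi-1", "year": 2015}, {"doi": "DOI-3", "year": 2019}]
unique = merge_by_doi([hits_a, hits_b])
print(len(unique))   # 3
```

Keying the merge on a normalized DOI mirrors the caption's deduplication step: a paper returned by several of the eight searches is counted once.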
Figure 2
Decision chart and overview of the main considerations for project design in high-throughput sequencing. The flowchart shows the most common groups of sequencing methodologies. Sections 1–3 summarize key components of project design, starting with the choice of sequencing method, followed by bait design, and finishing with the optimization of laboratory practices. Section 3 shows recommended (full circle), recommended in some cases (half circle), and not recommended (empty circle) practices based on input DNA quality and quantity. "Low input" refers to low-input DNA extraction kits, and "touch down" refers to temperature ramps at the hybridization and capture steps.
Figure 3
The most common sources of read variation within reference-based assemblies of a given organism. (A) Sequencing errors are identifiable as single variants that are present on only one read and are generally not shared across several reads. (B) Paralogous reads are visible as blocks of reads with several variants shared among a small fraction of the reads. Paralogous reads originate from a different part of the genome and result from gene or genome duplication. (C) Allelic variation can usually be identified by variants that are shared among many reads, occurring at a read frequency of approximately 1/ploidy level, i.e., 0.5 for diploid organisms.
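The frequency-based reasoning in the Figure 3 caption can be expressed as a simple per-site heuristic. The Python sketch below is a minimal illustration, assuming per-site counts of total reads and of reads carrying the non-reference base. The `Site` class, the `classify_site` helper, and the 0.15 tolerance are hypothetical. The sketch ignores read linkage, so it cannot see the blocks of co-occurring variants that panel B uses to flag paralogs, and it only checks the 1/ploidy frequency given in panel C, although heterozygous sites in polyploids can also occur at other multiples of 1/ploidy.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Read counts at one variable position of a reference-based assembly."""
    position: int
    depth: int      # reads covering the position
    alt_reads: int  # reads carrying the non-reference base


def classify_site(site: Site, ploidy: int = 2, tolerance: float = 0.15) -> str:
    """Label a site from its non-reference read frequency alone.

    `tolerance` is a hypothetical illustration value, not a recommendation.
    """
    if site.depth <= 0 or not 0 <= site.alt_reads <= site.depth:
        raise ValueError("invalid read counts")
    if site.alt_reads == 0:
        return "invariant"
    if site.alt_reads == 1:
        # Variant seen on a single read only (Figure 3A).
        return "sequencing error"
    freq = site.alt_reads / site.depth
    expected = 1 / ploidy  # e.g. 0.5 for a diploid (Figure 3C)
    if abs(freq - expected) <= tolerance:
        return "allelic variant"
    if freq < expected - tolerance:
        # Shared by a small fraction of reads (Figure 3B); linkage not checked.
        return "possible paralog"
    # e.g. a near-fixed difference between the sample and the reference.
    return "other"


sites = [Site(101, 40, 1), Site(205, 38, 4), Site(310, 42, 20), Site(412, 30, 29)]
for s in sites:
    print(s.position, classify_site(s))
```

For the toy sites above this prints "sequencing error", "possible paralog", "allelic variant", and "other" in turn. In practice the per-site label would only be a first filter before inspecting whether low-frequency variants co-occur on the same reads.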
