A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project

Tobias Andermann^{1,2}, Maria Fernanda Torres Jiménez^{1,2}, Pável Matos-Maraví^{1,2,3}, Romina Batista^{2,4,5}, José L Blanco-Pastor^{1,6}, A Lovisa S Gustafsson⁷, Logan Kistler⁸, Isabel M Liberal¹, Bengt Oxelman^{1,2}, Christine D Bacon^{1,2}, Alexandre Antonelli^{1,2,9}

Affiliations

¹ Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden.
² Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.
³ Institute of Entomology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czechia.
⁴ Programa de Pós-Graduação em Genética, Conservação e Biologia Evolutiva, PPG GCBEv-Instituto Nacional de Pesquisas da Amazônia-INPA Campus II, Manaus, Brazil.
⁵ Coordenação de Zoologia, Museu Paraense Emílio Goeldi, Belém, Brazil.
⁶ INRAE, Centre Nouvelle-Aquitaine-Poitiers, Lusignan, France.
⁷ Natural History Museum, University of Oslo, Oslo, Norway.
⁸ Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States.
⁹ Royal Botanic Gardens, Kew, Richmond-Surrey, United Kingdom.

PMID: 32153629
PMCID: PMC7047930
DOI: 10.3389/fgene.2019.01407

Review

Tobias Andermann et al. Front Genet. 2020 Feb 21;10:1407. doi: 10.3389/fgene.2019.01407. eCollection 2019.


Abstract

High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing effort on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing coverage. One common approach that enriches these loci before sequencing is known as target sequence capture. This technique has been shown to be applicable to phylogenetic studies spanning a wide range of evolutionary depths, and it has proven to produce large, powerful multi-locus DNA sequence datasets suitable for phylogenetic analyses. However, target capture requires careful consideration of several design decisions, which may greatly affect the success of experiments. Here we provide a simple flowchart for designing phylogenomic target capture experiments. We discuss the necessary decisions, from the identification of target loci to the final bioinformatic processing of sequence data. We outline challenges and solutions related to the taxonomic scope, sample quality, and available genomic resources of target capture projects. We hope this review will serve as a useful roadmap for designing and carrying out successful phylogenetic target capture studies.

Keywords: Hyb-Seq; Illumina; NGS; anchored enrichment; bait; high throughput sequencing; molecular phylogenetics; probe.

Figures

**Figure 1**
Published studies deposited in Web of Science that have used target sequence capture in phylogenetic research. **(A)** Number of publications by year (\*\*our search included papers in Web of Science up to December 20, 2019). **(B)** Normalized cumulative number of publications using target sequence capture compared with other phylogenomic studies, by year of publication. We restricted our searches to studies published since 2006, the year the first commercial high-throughput sequencer was released. We searched for Original Articles published in English in the category ‘Evolutionary Biology’. We ran eight independent searches, one for each combination of the keyword groups (‘hybrid’ OR ‘target*’ OR ‘exon’ OR ‘anchored’) AND (‘enrichment’ OR ‘capture’) AND ‘phylogenom*’. We merged the resulting datasets and removed duplicate records by comparing DOIs (blue bars in panel A). These searches were contrasted with all other phylogenomic studies, as specified by the keywords ‘sequencing’ AND ‘phylogenom*’ (yellow bars in panel A).
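The eight searches described in this caption follow from crossing the four capture-related terms with the two enrichment terms (4 × 2 = 8), each combined with ‘phylogenom*’. The Python sketch below illustrates how such keyword combinations could be generated and how the merged search results could be de-duplicated by DOI. It is not the authors' script: the plain-text query strings are a simplification rather than Web of Science search syntax, and the record structure (dicts with a hypothetical `doi` key) is an assumption.

```python
from itertools import product

# Keyword groups as listed in the Figure 1 caption. The query strings built
# below are plain text for illustration, not Web of Science syntax.
CAPTURE_TERMS = ["hybrid", "target*", "exon", "anchored"]
ENRICHMENT_TERMS = ["enrichment", "capture"]
FIELD_TERM = "phylogenom*"


def build_queries():
    """Return one AND-query per keyword combination (4 x 2 x 1 = 8)."""
    return [
        f"{capture} AND {enrichment} AND {FIELD_TERM}"
        for capture, enrichment in product(CAPTURE_TERMS, ENRICHMENT_TERMS)
    ]


def merge_unique_by_doi(result_sets):
    """Merge several lists of records, keeping the first record per DOI.

    Each record is assumed (hypothetically) to be a dict with a 'doi' key.
    DOIs are compared case-insensitively because DOI names are
    case-insensitive.
    """
    seen = set()
    merged = []
    for records in result_sets:
        for record in records:
            key = record["doi"].strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


print(len(build_queries()))  # 8
print(build_queries()[0])    # hybrid AND enrichment AND phylogenom*
```

Comparing normalized DOIs rather than titles avoids false duplicates caused by differences in capitalization or punctuation of titles across search exports.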

**Figure 2**
Decision chart and overview of the main considerations for project design in high-throughput sequencing. The flow chart shows the most common groups of sequencing methodologies. Sections 1–3 summarize key components of project design, starting with the choice of sequencing method, followed by bait design, and finishing with the optimization of laboratory practices. Section 3 shows practices that are recommended (full circle), recommended in some cases (half circle), or not recommended (empty circle), depending on input DNA quality and quantity. “Low input” refers to low-input DNA extraction kits and “touch down” refers to temperature ramps during the hybridization and capture steps.

**Figure 3**
The most common sources of read variation within reference-based assemblies of a given organism. **(A)** Sequencing errors are identifiable as single variants that are present on only one read and are generally not shared across several reads. **(B)** Paralogous reads appear as blocks of reads that share several variants and occur at low frequency among all reads. Paralogous reads originate from a different part of the genome as a result of gene or genome duplication. **(C)** Allelic variation can usually be identified by variants that are shared among many reads, occurring at a read frequency of approximately 1/ploidy level, e.g., 0.5 for diploid organisms.
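The read-frequency patterns in panels A–C can be expressed as a simple per-site heuristic. The Python sketch below is a minimal illustration, assuming the number of reads carrying a variant (`alt_reads`) and the total read depth at that site (`depth`) are already known; the `tolerance` threshold and the output labels are hypothetical choices, not values from the review. It also ignores the linkage of several variants on the same reads that characterizes paralogous blocks in panel B, so it is no substitute for dedicated variant-calling or paralog-detection tools.

```python
def classify_variant(alt_reads, depth, ploidy=2, tolerance=0.15):
    """Heuristically label one variant site by the fraction of reads carrying it.

    Hypothetical rules, for illustration only:
    - a variant on a single read is treated as a likely sequencing error;
    - a read frequency within `tolerance` of k/ploidy (k = 1..ploidy-1)
      is treated as allelic variation;
    - a shared variant below the lowest allelic expectation is flagged
      as a possible paralog;
    - a variant on (nearly) all reads is treated as a fixed difference
      from the reference.
    """
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2 for heterozygous alleles")
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("require depth > 0 and 0 <= alt_reads <= depth")
    if alt_reads == 0:
        return "no variant"
    if alt_reads == 1:
        return "sequencing error"
    freq = alt_reads / depth
    # Expected read frequencies of heterozygous alleles, e.g. [0.5] for diploids
    expected = [k / ploidy for k in range(1, ploidy)]
    if any(abs(freq - e) <= tolerance for e in expected):
        return "allelic"
    if freq < min(expected):
        return "possible paralog"
    if freq >= 1 - tolerance:
        return "fixed difference"
    return "unclassified"


print(classify_variant(1, 40))   # sequencing error
print(classify_variant(19, 40))  # allelic
print(classify_variant(6, 40))   # possible paralog
```

For a tetraploid sample, passing `ploidy=4` shifts the allelic expectations to 0.25, 0.5, and 0.75, which illustrates why allele frequencies alone become harder to separate from paralogous signal as ploidy increases.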
