Appl Plant Sci. 2022 Jun 2;10(3):e11475.
doi: 10.1002/aps3.11475. eCollection 2022 May-Jun.

PhyloHerb: A high-throughput phylogenomic pipeline for processing genome skimming data

Liming Cai et al. Appl Plant Sci.

Abstract

Premise: The application of high-throughput sequencing, especially to herbarium specimens, is rapidly accelerating biodiversity research. Low-coverage sequencing of total genomic DNA (genome skimming) is particularly promising and can simultaneously recover the plastid, mitochondrial, and nuclear ribosomal regions across hundreds of species. Here, we introduce PhyloHerb, a bioinformatic pipeline to efficiently assemble phylogenomic data sets derived from genome skimming.

Methods and results: PhyloHerb uses either a built-in database or user-specified references to extract orthologous sequences from all three genomes using a BLAST search. It outputs FASTA files and offers a suite of utility functions to assist with alignment, partitioning, concatenation, and phylogeny inference. The program is freely available at https://github.com/lmcai/PhyloHerb/.
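
The concatenation and partitioning step mentioned above can be illustrated with a minimal Python sketch. This is not PhyloHerb's own code: the function name `concatenate`, the input layout (a dictionary of per-locus alignments), and the toy sequences are all assumptions made for illustration. It joins per-locus alignments into one supermatrix, pads taxa missing from a locus with gaps, and records 1-based partition ranges.

```python
def concatenate(alignments):
    """Join per-locus alignments into a supermatrix (hypothetical helper).

    alignments: dict mapping locus name -> dict mapping taxon -> aligned sequence.
    Returns (supermatrix, partitions), where partitions is a list of
    (locus, start, end) tuples using 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in locus {locus!r} are not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa missing from this locus are filled with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input (invented sequences, for illustration only).
toy = {
    "rbcL": {"sp1": "ATGC", "sp2": "ATGA"},
    "matK": {"sp1": "GG-TA", "sp3": "GGCTA"},
}
matrix, parts = concatenate(toy)
for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")
for locus, begin, end in parts:
    print(f"{locus} = {begin}-{end}")
```

Recording the partition boundaries alongside the supermatrix lets each locus receive its own substitution model during tree inference.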

Conclusions: We demonstrate that PhyloHerb can accurately identify genes using a published data set from Clusiaceae. We also show via simulations that our approach is effective for highly fragmented assemblies from herbarium specimens and is scalable to thousands of species.

Keywords: herbariomics; high‐throughput sequencing; mitochondria; plastome; ribosomal genes.

Figures

Figure 1
PhyloHerb workflow. The five main function modules of PhyloHerb, including qc, getseq, ortho, conc, and order, provide a versatile and efficient tool to curate and analyze genome skimming data.
Figure 2
Defining and extracting genetic blocks with PhyloHerb. (A) A 5‐kbp‐long continuous genetic block on the plastid genome of Arabidopsis thaliana divided into two loci (LOC1 and LOC2). (B) The ‘getseq’ function of PhyloHerb can be used to extract sequences of predefined genetic blocks. The ‘genetic_block’ mode will include genes on both ends, while the ‘intergenic’ mode does not.
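
The difference between the two modes in panel B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `extract_region`, the 0-based half-open coordinates, and the toy sequence are hypothetical, and the flanking genes are assumed to be already located (PhyloHerb itself locates them with BLAST, which is not reproduced here).

```python
def extract_region(sequence, left_gene, right_gene, mode):
    """Extract the region delimited by two flanking genes (hypothetical helper).

    left_gene and right_gene are (start, end) tuples in 0-based, half-open
    coordinates, with left_gene lying upstream of right_gene.
    mode 'genetic_block' keeps both flanking genes; 'intergenic' drops them.
    """
    left_start, left_end = left_gene
    right_start, right_end = right_gene
    if left_end > right_start:
        raise ValueError("flanking genes overlap or are in the wrong order")
    if mode == "genetic_block":
        return sequence[left_start:right_end]
    if mode == "intergenic":
        return sequence[left_end:right_start]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy contig: gene A (AAAA), spacer (CCCCCC), gene B (GGGG), trailing bases.
contig = "AAAACCCCCCGGGGTT"
gene_a, gene_b = (0, 4), (10, 14)
block = extract_region(contig, gene_a, gene_b, "genetic_block")
spacer = extract_region(contig, gene_a, gene_b, "intergenic")
print(block)   # AAAACCCCCCGGGG
print(spacer)  # CCCCCC
```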
Figure 3
Phylogeny of 10 Clusiaceae species inferred from the complete (A) and subsampled plastid data sets (B–D). Raw reads were randomly subsampled to 100 Mbp (B), 50 Mbp (C), and 20 Mbp (D) to simulate decreasing base coverage in genome skimming. For all four analyses, a partitioned concatenated DNA alignment of 87 plastid genes was used to infer the species tree in IQ‐TREE using the GTRGAMMA model. Nodal support was estimated from 1000 ultrafast bootstrap replicates (UFBoot). Unlabeled nodes indicate 100 UFBoot support. Note the unstable placement of Chrysochlamys skutchii in subsampled data sets.
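
The random subsampling of raw reads to a fixed base count, as described in the Figure 3 caption, can be sketched as follows. The caption does not say which tool performed the subsampling, so this is an assumed approach rather than the authors' procedure: the function `subsample_to_bases`, the seed, and the toy reads are hypothetical. Reads are shuffled and then accumulated until the target number of bases is reached.

```python
import random


def subsample_to_bases(reads, target_bases, seed=0):
    """Randomly pick reads until their total length reaches target_bases.

    Hypothetical helper: reads is a list of sequence strings. The result may
    overshoot the target by less than the length of the last read picked.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    order = list(range(len(reads)))
    rng.shuffle(order)
    picked, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        picked.append(reads[index])
        total += len(reads[index])
    return picked


# Toy data: 100 reads of 100 bp each (invented, for illustration only).
reads = ["ACGT" * 25 for _ in range(100)]
subset = subsample_to_bases(reads, target_bases=2000, seed=42)
print(len(subset), sum(len(read) for read in subset))
```

For real paired-end data, both mates of a pair would need to be kept together, which this sketch does not handle.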
