Mol Ecol Resour. 2022 Nov;22(8):3087-3105.
doi: 10.1111/1755-0998.13666. Epub 2022 Jun 27.

A target capture approach for phylogenomic analyses at multiple evolutionary timescales in rosewoods (Dalbergia spp.) and the legume family (Fabaceae)

Simon Crameri et al. Mol Ecol Resour. 2022 Nov.

Abstract

Understanding the genetic changes associated with the evolution of biological diversity is of fundamental interest to molecular ecologists. The assessment of genetic variation at hundreds or thousands of unlinked genetic loci forms a sound basis to address questions ranging from micro- to macroevolutionary timescales, and is now possible thanks to advances in sequencing technology. Major difficulties are associated with (i) the lack of genomic resources for many taxa, especially from tropical biodiversity hotspots; (ii) scaling the numbers of individuals analysed and loci sequenced; and (iii) building tools for reproducible bioinformatic analyses of such data sets. To address these challenges, we developed target capture probes for genomic studies of the highly diverse, pantropically distributed and economically significant rosewoods (Dalbergia spp.), explored the performance of an overlapping probe set for target capture across the legume family (Fabaceae), and built the general purpose bioinformatic pipeline CaptureAl. Phylogenomic analyses of Malagasy Dalbergia species yielded highly resolved and well supported hypotheses of evolutionary relationships. Population genomic analyses identified differences between closely related species and revealed the existence of a potentially new species, suggesting that the diversity of Malagasy Dalbergia species has been underestimated. Analyses at the family level corroborated previous findings by the recovery of monophyletic subfamilies and many well-known clades, as well as high levels of gene tree discordance, especially near the root of the family. The new genomic and bioinformatic resources, including the Fabaceae1005 and Dalbergia2396 probe sets, will hopefully advance systematics and ecological genetics research in legumes, and promote conservation of the highly diverse and endangered Dalbergia rosewoods.

Keywords: Dalbergia; Fabaceae; Leguminosae; phylogenomics; rosewood; target capture.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIGURE 1
Seven bioinformatic steps of the CaptureAl pipeline. Steps 1–3 are shown for a single sample, but are executed for multiple samples in parallel, and steps 5–7 are shown for three target regions, but are executed for many target regions in parallel (indicated by dashed grey lines). In STEP 1, trimmed reads (narrow bars) are mapped to target reference sequences (broad bars; five regions are shown). Coverage statistics are generated and written to coverage_stats.txt, which informs filters applied to poorly sequenced samples and target regions (shown in red; four regions are retained, region 5 does not pass the filters due to low coverage in one or more taxon groups). In STEP 2, read pairs are extracted and assembled separately for each sample and each target region, resulting in zero (not shown) or one (region 1) to multiple contigs (region 4). In STEP 3, contigs are aligned to their respective reference sequence using exonerate to select the likely orthologous contig(s). Nonoverlapping contigs with normalized alignment scores passing a user‐specified threshold (green ticks) are combined to a single sequence (region 3). If contigs overlap, only the best‐matching contig is selected (large green ticks; region 2), and contigs with alignment scores below the threshold (red crosses) are discarded (region 4). Contig statistics are generated and written to loci_stats.txt, which informs the filters applied to poorly assembled samples and target regions in STEP 4 (shown in red; three regions are retained, region 4 does not pass the filters due to high prevalence of multiple contigs in one or more taxon groups). In STEP 5, the contigs of multiple samples (eight are shown for three target regions) are aligned and trimmed, generating a data set potentially suitable for phylogenomic analyses. In the optional STEP 6, a consensus sequence is generated for each target region alignment, and overlaps between neighbouring regions are identified using BLAST+ (shown in red). Individual contigs from such regions are aligned, collapsed to a single sequence per sample, and trimmed. The merged alignments can be used as replacements for overlapping alignments and filtered for phylogenomic analyses as in STEP 5. In the optional STEP 7, consensus sequences for each alignment are generated for each taxon group, as well as overall consensus sequences across all taxon groups. These can serve as longer and taxon‐specific reference sequences for STEP 1. Remapped reads can then be used for variant calling and population genomic analyses, or to refine target region assembly, alignment, and downstream analyses by repeating STEPS 1–5 or 1–7.
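The contig selection rule described for STEP 3 can be illustrated with a minimal Python sketch. This is not CaptureAl code: the `Contig` class, the `select_contigs` function, the score scale and the greedy tie-breaking by score are all assumptions made for illustration, and the coordinates are taken to be alignment positions on the reference sequence.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    start: int    # alignment start on the reference (0-based, inclusive)
    end: int      # alignment end on the reference (exclusive)
    score: float  # normalized alignment score (hypothetical scale)


def overlaps(a: Contig, b: Contig) -> bool:
    """True if the two contigs cover a shared stretch of the reference."""
    return a.start < b.end and b.start < a.end


def select_contigs(contigs, min_score):
    """Keep contigs passing the threshold; among overlapping ones keep the best.

    Assumption: overlaps are resolved greedily, highest score first.
    """
    passing = sorted(
        (c for c in contigs if c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )
    kept = []
    for contig in passing:
        if all(not overlaps(contig, other) for other in kept):
            kept.append(contig)
    # Return in reference order so the pieces can be combined left to right.
    return sorted(kept, key=lambda c: c.start)


# Toy cases loosely mirroring regions 3, 2 and 4 of the figure.
region3 = [Contig("a", 0, 100, 0.9), Contig("b", 150, 300, 0.8)]
region2 = [Contig("a", 0, 200, 0.7), Contig("b", 100, 300, 0.95)]
region4 = [Contig("a", 0, 100, 0.1), Contig("b", 50, 120, 0.2)]

for region in (region3, region2, region4):
    print([c.name for c in select_contigs(region, min_score=0.5)])
```

With these invented inputs, both nonoverlapping contigs are kept in the first case, only the better-scoring contig in the second, and none in the third.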
FIGURE 2
Coalescent‐based phylogeny of the Fabaceae subfamily set (n = 110) inferred using ASTRAL‐III on 986 gene trees. Pie charts on each node denote the fraction of gene trees that are consistent with the shown topology (q1; blue), and with the first (q2; orange) and second (q3; grey) alternative topologies. Local posterior probabilities are shown as small colour‐coded dots in the center of each pie chart; black dots indicate clades with 99%–100% local posterior probability (see inset legend). Replicate specimens are labelled with a bold “R”. 860 gene trees (87.22%) had missing taxa. The overall normalized quartet score was 88.82%.
FIGURE 3
Coalescent‐based phylogeny of the Malagasy Dalbergia species set (n = 63) inferred using ASTRAL‐III on 2389 gene trees. Pie charts on each node denote the fraction of gene trees that are consistent with the shown topology (q1; blue), and with the first (q2; orange) and second (q3; grey) alternative topologies. Local posterior probabilities are shown as small colour‐coded dots in the center of each pie chart; black dots indicate clades with 99%–100% local posterior probability (see inset legend). The geographic origins of accessions from Madagascar are indicated as bold numbers in the tree, which correspond to political regions of Madagascar, as well as to ecological regions following Dinerstein et al. (2017); see large inset map. Known countries of origin of non‐Malagasy accessions are indicated in bold. Five major clades within Madagascar Supergroups I and II showing ecogeographic or morphological coherence are named and their distribution is indicated (see small maps to the right). Replicate specimens are labelled with a bold “R”. 1014 gene trees (42.44%) had missing taxa. The overall normalized quartet score was 85.42%.
FIGURE 4
Population genomic analyses in Dalbergia monticola and D. orientalis. (a) Principal component analysis (PCA) and superimposed neighbour‐joining (NJ) tree of the population set (n = 51) inferred from 60,204 biallelic SNPs with no missing data. Dots in PCA space and NJ tips represent individuals colour‐coded according to taxa. Numbers adjacent to NJ tree branches denote sampling locations as shown in Figure S4. See Figure S10 for NJ tip labels. (b) STRUCTURE probability at different values of K, as indicated by the delta K statistic (Evanno et al., 2005). (c) STRUCTURE results for the 51 individuals and 7156 single nucleotide polymorphisms (SNPs). The four clustering solutions with elevated delta K values are shown (see Figure S11 for all results assuming two to 10 clusters) and represent major clusters averaged across 10 replicate runs using CLUMPAK (Kopelman et al., 2015). Individuals (columns) are colour‐coded and sorted by taxa and then by increasing degrees south latitude. Numbers at the top indicate broad sampling locations as in Figure S4 and Table S3. Columns marked with an asterisk (*) denote individuals obtained from museum specimens.
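For readers unfamiliar with the delta K statistic mentioned in panel (b), the following Python sketch shows one common way to compute it from replicate ln P(X|K) values: the absolute second-order difference of the mean log-likelihoods, divided by the standard deviation of the log-likelihoods at K. The function name `delta_k` and the toy numbers are assumptions for illustration only; they are not values from the study, and the authors' exact procedure may differ.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of ln P(X|K) values from replicate runs.
    Assumption: the second-order difference is taken on the per-K means.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Invented toy log-likelihoods, two replicate runs per K.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -701.0],
    4: [-690.0, -695.0],
}

scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this toy input the statistic is defined only for K = 2 and K = 3, and it peaks at K = 3, where the log-likelihood curve bends most sharply relative to the spread between replicates.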

