Syst Biol. 2022 Feb 10;71(2):301-319.
doi: 10.1093/sysbio/syab035.

A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life

William J Baker et al. Syst Biol. 2022.

Abstract

The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this article are to (i) document our methods, (ii) describe our first data release, and (iii) present a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic data set for angiosperms to date, comprising 3099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2333 genera (17%). A "first pass" angiosperm tree of life was inferred from the data, which totaled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
The validated data set, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone toward a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardized nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world's natural history collections. [Angiosperms; Angiosperms353; genomics; herbariomics; museomics; nuclear phylogenomics; open access; target sequence capture; tree of life.]

Figures

Figure 1
Summary workflow. Overview of steps taken by the PAFTOL project to generate Data Release 1.0 of the Kew Tree of Life Explorer (https://treeoflife.kew.org). The stages of the workflow are further elaborated in Figs. 2–4.
Figure 2
Sample processing and data analysis workflows. Sample processing (left): processes are indicated by bold headings with reagents and machines used given below; quality control (QC) checkpoints are indicated. Data analysis (right): pipeline products are shown in circles (available to download via the Kew Tree of Life Explorer, https://treeoflife.kew.org); processes are indicated by bold headings with programs used given below.
Figure 3
Family identification validation workflow. Processes are indicated by bold headings. Embedded table (bottom right) indicates decisions made for each sample based on the two validation steps.
Figure 4
Data publication workflow. Implementation of the Kew Tree of Life Explorer data portal is illustrated. Arrows indicate data flow from internal repository to public interface. Infrastructural components are shown in upper half; publicly available information is shown in lower half. External links available from the portal are listed in the lower left.
Figure 5
Density plots of target sequence recovery from our raw data. Data are presented prior to any filtering, illustrating relationships of sum of gene lengths (bp) to (a) the number of mapped reads and (b) the number of recovered genes. Darker shades indicate greater density of data points. Black (upper and right-hand) dotted lines indicate medians of variables and red (lower and right-hand) dotted lines indicate the threshold used to remove samples from downstream analyses, set as 20% of the median value across all samples.
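The threshold rule in the Figure 5 caption can be illustrated with a short sketch. It is not the project's pipeline code. It assumes that the filter is applied to a single per-sample recovery metric (for example, sum of gene lengths in bp) and that samples falling below the threshold are the ones removed. The function names, sample names, and values are hypothetical.

```python
from statistics import median


def recovery_threshold(values, fraction=0.20):
    """Return the cutoff as a fraction of the median across all samples."""
    return fraction * median(values)


def filter_samples(metric_by_sample, fraction=0.20):
    """Split samples into kept and removed, given one recovery metric per sample.

    Assumption: samples with a metric below the threshold are removed.
    """
    threshold = recovery_threshold(list(metric_by_sample.values()), fraction)
    kept = {s: v for s, v in metric_by_sample.items() if v >= threshold}
    removed = sorted(s for s in metric_by_sample if s not in kept)
    return threshold, kept, removed


# Hypothetical sum-of-gene-lengths values (bp), for illustration only.
example = {
    "sample_A": 100_000,
    "sample_B": 120_000,
    "sample_C": 80_000,
    "sample_D": 10_000,
    "sample_E": 90_000,
}

threshold, kept, removed = filter_samples(example)
print(f"threshold = {threshold:.0f} bp; removed = {removed}")
```

With these made-up values the median is 90,000 bp, so the cutoff is 18,000 bp and only `sample_D` falls below it. Because the cutoff is relative to the median, it scales with the overall recovery of the data set rather than relying on a fixed number of base pairs.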
