PLoS One. 2018 May 17;13(5):e0197433.
doi: 10.1371/journal.pone.0197433. eCollection 2018.

So many genes, so little time: A practical approach to divergence-time estimation in the genomic era

Stephen A Smith et al. PLoS One. 2018.

Abstract

Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference on the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, thereby reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages, we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree.
We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that "gene shopping" can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
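
The three winnowing criteria can be illustrated with a small sketch. This is not the SortaDate implementation. It assumes a toy tree encoding of `(branch_length, [children])`, uses root-to-tip variance as the measure of clock-likeness, and takes the proportion of species-tree bipartitions recovered by each gene tree as a precomputed number. The names `GeneStats` and `rank_genes`, the tree-length bounds, and the order of the sort keys are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import inf
from statistics import pvariance

# Toy rooted tree: (branch_length, [children]); a tip has an empty child list.
# This encoding is an assumption made to keep the sketch self-contained.


def root_to_tip_lengths(node, dist=0.0):
    """Return the path length from the root to every tip."""
    length, children = node
    dist += length
    if not children:
        return [dist]
    tips = []
    for child in children:
        tips.extend(root_to_tip_lengths(child, dist))
    return tips


def root_to_tip_variance(tree):
    """Population variance of root-to-tip lengths; 0 for a perfectly clock-like tree."""
    return pvariance(root_to_tip_lengths(tree))


def tree_length(node):
    """Sum of all branch lengths in the tree."""
    length, children = node
    return length + sum(tree_length(child) for child in children)


@dataclass
class GeneStats:
    name: str
    root_to_tip_var: float  # criterion (i): clock-likeness
    tree_len: float         # criterion (ii): information content
    concordance: float      # criterion (iii): share of species-tree bipartitions recovered


def rank_genes(stats, min_len=0.0, max_len=inf):
    """Drop genes with tree lengths outside the bounds, then sort by
    highest concordance and, within ties, lowest root-to-tip variance.
    The key order is an illustrative assumption."""
    kept = [g for g in stats if min_len <= g.tree_len <= max_len]
    return sorted(kept, key=lambda g: (-g.concordance, g.root_to_tip_var))


# Two toy gene trees: one ultrametric, one with a single long tip branch.
clocklike = (0.0, [(1.0, [(1.0, []), (1.0, [])]), (2.0, [])])
skewed = (0.0, [(1.0, [(1.0, []), (3.0, [])]), (2.0, [])])

genes = [
    GeneStats("A", root_to_tip_variance(clocklike), tree_length(clocklike), 0.9),
    GeneStats("B", root_to_tip_variance(skewed), tree_length(skewed), 0.9),
    GeneStats("C", 0.0, 0.01, 1.0),  # nearly no signal: removed by min_len
    GeneStats("D", 0.0, 5.0, 0.5),   # clock-like but conflicts with the species tree
]

ranked = rank_genes(genes, min_len=0.1)
print([g.name for g in ranked])
```

In this toy example, gene C is discarded for having too short a tree, and the remaining genes are ordered so that the most concordant and most clock-like come first. A small number of top-ranked genes would then be passed to a dating analysis.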

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Winnowing criteria used for sorting genes for use in divergence-time inference analyses.
The order presented here is arbitrary.
Fig 2. Gene tree properties for the BIR, CAR, and CARY datasets.
Left: relationship between root-to-tip variance and tree length for simulated (clock and ucln) and empirical (data) datasets. Each simulation condition consists of 100 simulated datasets. Contours represent densities, while grey dots represent raw empirical values. Right: tip-specific root-to-tip variance for empirical datasets. Here, 0 represents the mean root-to-tip across all genes and all lineages. Red dots indicate outgroup taxa.
Fig 3. Gene tree properties for the VIT, HYM, and MIL datasets.
See Fig 2 for a description of the plotting attributes.
Fig 4. A comparison of strict clock and UCLN estimates of node ages for the six curated empirical datasets.
Bars represent 95% HPD intervals and overlay the UCLN maximum clade credibility tree.
Fig 5. A comparison of strict clock and UCLN estimates of node ages for the simulated clock and ucln datasets.
Each simulation condition consists of three genes. Red and pink are scenarios where the generating and inference models are identical, while green and blue are where the models are mismatched. Bars represent 95% HPD intervals and overlay the true simulated tree.
Fig 6. Results from a simulation comparing total difference between true and estimated ages for three random genes under a clock (RC) and UCLN (RU) and three genes chosen with SortaDate under a clock (SC) and UCLN (SU).
Results are from 100 simulations of distinct parameter values. See details in the text.
