Mol Biol Evol. 2016 Aug;33(8):2135-50.
doi: 10.1093/molbev/msw098. Epub 2016 May 24.

Controlling for Phylogenetic Relatedness and Evolutionary Rates Improves the Discovery of Associations Between Species' Phenotypic and Genomic Differences

Xavier Prudent et al. Mol Biol Evol. 2016 Aug.

Abstract

The growing number of sequenced genomes now allows us to address a key question in genetics and evolutionary biology: which genomic changes underlie particular phenotypic changes between species? Previously, we developed a computational framework called Forward Genomics that associates phenotypic with genomic differences by focusing on phenotypes that were independently lost in different lineages. However, our previous implementation had three main limitations. Here, we present two new Forward Genomics methods that overcome these limitations by (1) directly controlling for phylogenetic relatedness, (2) controlling for differences in evolutionary rates, and (3) computing statistical significance. On large-scale simulated data and on real data, we demonstrate that both new methods substantially improve the sensitivity to detect associations between phenotypic and genomic differences. We applied these new methods to detect genomic differences involved in the loss of vision in the blind mole rat and the cape golden mole, two subterranean mammals from independent lineages. Forward Genomics identified a set of genes enriched in functions related to eye development and the perception of light, as well as genes involved in the circadian rhythm. These new Forward Genomics methods represent a significant advance in our ability to discover the genomic basis underlying phenotypic differences between species. Source code: https://github.com/hillerlab/ForwardGenomics/.

Keywords: evolutionary and comparative genomics; gene loss; phenotype–genotype associations.

Figures

Fig. 1.
Overview of three Forward Genomics methods. (A) Global %id values are computed by comparing the reconstructed sequence of the common ancestor of the species of interest (blue circle) to the sequence of an extant species. Local %id values are computed between the sequences at the start and end of each branch, each of which is either a reconstructed ancestral sequence (blue or green circle) or the sequence of an extant species. Branch lengths in the phylogenetic tree are proportional to the number of substitutions per neutral site. Outgroup species are used to reconstruct the common ancestor. (B) The perfect-match method (Hiller et al. 2012b) assumes that the given phenotypic presence/absence (checkmark/cross) vector includes trait losses in independent lineages and conducts a genome-wide search for genomic regions where all trait-loss species have a lower global %id value (higher sequence divergence) than all trait-preserving species. This is illustrated by a positive grey margin that separates the global %id values of the two groups of species. (C) The GLS Forward Genomics method derives a covariance matrix that captures the phylogenetic relatedness between species. As illustrated for the first two species, the covariance between two species is the summed length of the branches shared by both species (highlighted in orange). The variance of a species is the summed length of all branches from the common ancestor to this species. Lowercase letters indicate the branch lengths in the phylogenetic tree. A phylogenetic GLS approach (Grafen 1989) is used to compute a linear regression between the transformed normalized global %id values and the phenotypic pattern. The significance of a positive slope of the regression line is used as the significance of the association between phenotype and genotype. (D) The branch method uses Dollo parsimony to estimate ancestral phenotypic states given the presence/absence pattern of the trait in the extant species. Each branch is then classified as trait-loss (red) or trait-preserving (blue). Local %id values are normalized by the expected value for a branch of the same length. If a genomic region is involved in the trait loss, we expect trait-loss branches to be associated with lower normalized local %id values. The significance of a positive Pearson correlation coefficient is used as the significance of the association between phenotype and genotype.
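Panel A rests on percent identity (%id) between two aligned sequences. The sketch below is a minimal illustration, assuming the two sequences are already aligned to equal length with "-" marking gaps, and that %id counts identical bases among columns where neither sequence has a gap; the paper's exact handling of gaps and missing data may differ. The function name `percent_identity` is hypothetical.

```python
def percent_identity(ancestor: str, extant: str) -> float:
    """Percent of gap-free aligned columns where both sequences share the same base.

    Assumption: columns with a gap ('-') in either sequence are skipped. This is
    an illustrative convention, not necessarily the one used by Forward Genomics.
    """
    if len(ancestor) != len(extant):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(ancestor.upper(), extant.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return float("nan")  # no comparable columns
    return 100.0 * identical / compared


# Global %id: reconstructed common ancestor vs. an extant species (toy sequences).
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
# Local %id: sequences at the start and end of a single branch (toy sequences).
print(percent_identity("ACG-ACGT", "ACGAACCT"))
```

The same helper serves both the global and the local variant; only the pair of sequences being compared changes.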
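Panel C can be illustrated with a small phylogenetic GLS sketch in pure Python. It builds the covariance matrix from a rooted tree (shared branch length off the diagonal, root-to-tip length on the diagonal) and fits %id = intercept + slope × phenotype by generalized least squares. Assumptions: the tree is given as a hypothetical `parent` map from each node to (parent, branch length); phenotype is coded 1 = trait present and 0 = trait lost, so greater divergence in trait-loss species yields a positive slope; the %id values are already normalized. The sketch returns the slope and its t statistic; the one-sided P-value for a positive slope would come from a t distribution with n - 2 degrees of freedom, which is omitted here. The tree and values in the example are made up.

```python
import math


def path_to_root(node, parent):
    """Nodes from `node` up to, but excluding, the root.

    Each node stands for the branch leading into it from its parent.
    """
    path = []
    while node in parent:
        path.append(node)
        node = parent[node][0]
    return path


def phylo_covariance(species, parent):
    """C[i][j] = summed length of the branches shared by species i and j."""
    paths = [set(path_to_root(s, parent)) for s in species]
    return [[sum((parent[n][1] for n in pi & pj), 0.0) for pj in paths]
            for pi in paths]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def gls_slope(pid, phenotype, cov):
    """Fit pid = b0 + b1 * phenotype by GLS; return (b1, t statistic of b1)."""
    n = len(pid)
    w = invert(cov)  # W = C^-1
    x = [[1.0, float(p)] for p in phenotype]  # design matrix: intercept, phenotype
    xtwx = [[sum(x[i][a] * w[i][j] * x[j][b] for i in range(n) for j in range(n))
             for b in range(2)] for a in range(2)]
    xtwy = [sum(x[i][a] * w[i][j] * pid[j] for i in range(n) for j in range(n))
            for a in range(2)]
    xtwx_inv = invert(xtwx)
    beta = [sum(xtwx_inv[a][b] * xtwy[b] for b in range(2)) for a in range(2)]
    resid = [pid[i] - (beta[0] + beta[1] * phenotype[i]) for i in range(n)]
    sigma2 = sum(resid[i] * w[i][j] * resid[j]
                 for i in range(n) for j in range(n)) / (n - 2)
    se = math.sqrt(sigma2 * xtwx_inv[1][1])
    return beta[1], beta[1] / se


# Toy tree ((A:1,B:1)x:1,(C:1,D:2)y:1)root; B and D lost the trait.
parent = {"A": ("x", 1.0), "B": ("x", 1.0), "x": ("root", 1.0),
          "C": ("y", 1.0), "D": ("y", 2.0), "y": ("root", 1.0)}
cov = phylo_covariance(["A", "B", "C", "D"], parent)
slope, t = gls_slope([95.0, 70.0, 93.0, 72.0], [1, 0, 1, 0], cov)
```

Weighting by W = C^-1 directly is equivalent to transforming the data with the inverse Cholesky factor of C and then running ordinary least squares, which corresponds to the "transformed" formulation in the caption. With an identity covariance the estimate reduces to the ordinary least-squares slope.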
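Panel D combines Dollo parsimony with a per-branch correlation. The sketch below works under stated assumptions. The tree is given as a hypothetical `parent` map from each node to (parent, branch length), and `local_pid` holds the observed local %id of the branch above each node. Under Dollo parsimony a trait can be lost but not regained, so an ancestor must carry the trait if any descendant species does. A branch is labeled trait-loss when the node at its lower end lacks the trait; this assumption includes the branch on which the loss happened. The `expected_pid` function, which stands in for the expected local %id at a given branch length, is hypothetical, and normalization is taken as observed minus expected, which may differ from the paper. Trait-preserving branches are coded 1 and trait-loss branches 0, so the hypothesis predicts a positive correlation. Significance testing is omitted.

```python
import math


def dollo_states(leaf_state, parent):
    """Presence (True) / absence (False) for every node under Dollo parsimony.

    The trait cannot be regained, so a node has the trait iff at least one
    descendant species has it.
    """
    state = dict(leaf_state)
    for leaf, present in leaf_state.items():
        node = leaf
        while present and node in parent:  # mark all ancestors of a present leaf
            node = parent[node][0]
            state[node] = True
    for node, (par, _) in parent.items():
        state.setdefault(node, False)  # internal nodes with no present descendant
        state.setdefault(par, False)
    return state


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def branch_method_r(leaf_state, parent, local_pid, expected_pid):
    """Pearson r between branch class (1 = preserving, 0 = loss) and normalized local %id."""
    state = dollo_states(leaf_state, parent)
    classes, normalized = [], []
    for node, (_, length) in parent.items():  # one branch per non-root node
        classes.append(1.0 if state[node] else 0.0)
        normalized.append(local_pid[node] - expected_pid(length))
    return pearson(classes, normalized)


# Toy tree ((A:1,B:1)x:1,(C:1,D:2)y:1)root with independent losses in B and D.
parent = {"A": ("x", 1.0), "B": ("x", 1.0), "x": ("root", 1.0),
          "C": ("y", 1.0), "D": ("y", 2.0), "y": ("root", 1.0)}
leaves = {"A": True, "B": False, "C": True, "D": False}
local_pid = {"A": 90.0, "B": 70.0, "x": 90.0, "C": 89.0, "D": 60.0, "y": 91.0}
r = branch_method_r(leaves, parent, local_pid,
                    expected_pid=lambda length: 100.0 * math.exp(-0.1 * length))
```

Comparing each branch with the expectation for its own length is what lets the method account for differing evolutionary rates: a long branch is not penalized merely for accumulating more neutral substitutions.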
Fig. 2.
Performance of the three Forward Genomics methods on simulated data, shown by precision-sensitivity plots (right) for three of the 32 trait-loss scenarios (left). Trait losses occurred at the red crosses in the phylogeny; following trait loss, the 210 trait-involved genomic regions evolved neutrally along the parts of the branches shown in red. The red cross in the precision-sensitivity plot for the perfect-match method marks its performance when we consider only genomic regions where all trait-loss species have a lower global %id value than all trait-preserving species. The other 29 trait-loss scenarios are shown in supplementary figures 5–33, Supplementary Material online.
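A precision-sensitivity curve can be traced by walking down a ranked list of genomic regions. This minimal sketch assumes regions are ranked from most to least significant (for example by P-value) and that the set of truly trait-involved regions is known, as in the simulations, where the 210 trait-involved regions would play the role of `true_ids`. The function name is hypothetical.

```python
def precision_sensitivity(ranked_ids, true_ids):
    """Return (precision, sensitivity) after each cutoff in the ranking.

    precision   = true positives / regions reported so far
    sensitivity = true positives / all trait-involved regions
    """
    true_ids = set(true_ids)
    points, tp = [], 0
    for k, region in enumerate(ranked_ids, start=1):
        if region in true_ids:
            tp += 1
        points.append((tp / k, tp / len(true_ids)))
    return points


# Toy ranking where regions r1 and r3 are trait-involved.
print(precision_sensitivity(["r1", "r2", "r3", "r4"], {"r1", "r3"}))
# [(1.0, 0.5), (0.5, 0.5), (0.6666666666666666, 1.0), (0.5, 1.0)]
```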
Fig. 3.
Performance of the three Forward Genomics methods on 32 trait-loss scenarios. (A) The sensitivity at 90% precision is plotted for 32 different trait-loss scenarios. The GLS and branch methods consistently improve sensitivity compared with the perfect-match method. (B–F) Properties of the trait-loss scenarios and of the trait-involved genomic regions influence performance: (B) age of the trait loss, measured by how long the trait-involved elements evolved neutrally; (C) number of independent trait losses; (D) evolutionary rate in the trait-loss lineages; (E) length of trait-involved elements; and (F) strength of selection on trait-involved elements in the branches where they evolve under selection. Weak, medium, and strong selection refer to genomic regions that accept mutations with an average probability of >0.66, 0.33–0.66, and <0.33, respectively.
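Panel A condenses each precision-sensitivity curve into one number, the sensitivity at 90% precision. One plausible reading, sketched here as an assumption, is the highest sensitivity among ranking cutoffs whose precision is at least 0.9. The input is a list of (precision, sensitivity) pairs, and the function name is hypothetical.

```python
def sensitivity_at_precision(points, min_precision=0.9):
    """Highest sensitivity over cutoffs with precision >= min_precision (0.0 if none)."""
    return max((sens for prec, sens in points if prec >= min_precision), default=0.0)


curve = [(1.0, 0.1), (0.95, 0.3), (0.9, 0.4), (0.8, 0.6)]  # toy curve
print(sensitivity_at_precision(curve))  # 0.4
```

Returning 0.0 when no cutoff reaches the precision threshold keeps scenarios in which a method never achieves 90% precision on the same scale as the others.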
Fig. 4.
Robustness of the new Forward Genomics methods to uncertainties in the phylogenetic tree. The sensitivity at a precision of 90% for all 32 trait-loss scenarios is shown for (A) the Epitheria and the Exafroplacentalia tree topologies (supplementary fig. 34A, Supplementary Material online) and (B) three trees where random noise was added to each branch length (supplementary fig. 34B, Supplementary Material online). Solid lines show the results using the phylogeny that was used to produce the simulated data (reproduced from fig. 3A for comparison). Note that the perfect-match method considers neither topology nor branch lengths and thus always gives the same results. The legend shows the number of scenarios where the achieved sensitivity is higher than that of the perfect-match method.
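Panel B perturbs the branch lengths of the phylogeny. The caption does not state the noise model, so this sketch assumes multiplicative noise: each length is scaled by a factor drawn uniformly from [1 - scale, 1 + scale], which keeps lengths positive for scale < 1. The function name, the default scale, and the toy branch lengths are hypothetical.

```python
import random


def perturb_branch_lengths(lengths, scale=0.2, seed=None):
    """Return a copy of `lengths` (branch -> length) with random multiplicative noise."""
    if not 0 <= scale < 1:
        raise ValueError("scale must be in [0, 1) so lengths stay positive")
    rng = random.Random(seed)  # seeded generator for reproducible replicates
    return {branch: length * rng.uniform(1 - scale, 1 + scale)
            for branch, length in lengths.items()}


tree = {"human": 0.01, "mouse": 0.08, "x": 0.03}  # toy branch lengths
noisy_trees = [perturb_branch_lengths(tree, seed=s) for s in range(3)]  # three noisy trees
```

Because the topology is untouched, only the GLS covariance matrix and the branch method's expected %id values change between replicates.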
Fig. 5.
The GLS and branch methods outperform the perfect-match method on the trait “loss of vitamin C synthesis”. (A) Gulo exons are ranked higher by the GLS and branch methods than by perfect-match. For GLS and the branch method, each conserved coding region was ranked by its P-value. For perfect-match, we used the size of the margin for ranking, which is the difference between the lowest %id value of a trait-preserving species and the highest %id value of a trait-loss species. Gulo exon 2 is ranked first by all three methods. (B) The significance of most Gulo exons increases if the megabat P. vampyrus is excluded from the list of trait-loss species. The trait loss in P. vampyrus happened more recently than in Haplorrhini primates, guinea pig, and the microbat M. lucifugus. We computed the difference in the margin (perfect-match) or in the log P-value (GLS and branch method) between the screen that used all non-vitamin C-synthesizing species and the screen where P. vampyrus was excluded. Positive differences indicate a better match to the trait loss. The significance of Gulo exons 9 and 10 decreases because both exons are deleted in P. vampyrus. Gulo exon 1, which encodes only the start codon, is excluded.
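The perfect-match ranking in panel A uses the margin between the two groups of species. In this minimal sketch the margin is the lowest %id among trait-preserving species minus the highest %id among trait-loss species; a positive margin is exactly the perfect-match condition, in which every trait-loss species is more diverged than every trait-preserving species. The %id values are invented for illustration, and the function name is hypothetical.

```python
def perfect_match_margin(pid, trait_lost):
    """Margin = min %id of trait-preserving species - max %id of trait-loss species.

    pid:        species -> global %id of one genomic region
    trait_lost: set of species that lost the trait
    """
    lost = [v for s, v in pid.items() if s in trait_lost]
    kept = [v for s, v in pid.items() if s not in trait_lost]
    return min(kept) - max(lost)


# Invented %id values for one region; human and guinea pig cannot synthesize vitamin C.
pid = {"human": 62.0, "guinea_pig": 70.0, "mouse": 91.0, "dog": 88.0, "cow": 93.0}
print(perfect_match_margin(pid, {"human", "guinea_pig"}))  # 18.0 (positive: a perfect match)
```

Ranking regions by this margin, largest first, gives the perfect-match ordering. Unlike a P-value, the margin ignores how the species are related and how fast each lineage evolves.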
Fig. 6.
The GLS and branch methods detect several conserved coding regions that are diverged in two blind mammals, the blind mole rat and the cape golden mole. Manhattan plots show the genomic locations of 184,412 conserved coding regions and their associated P-values computed by the GLS (A) and branch (B) methods. All conserved coding regions that correspond to exons of genes with a function in eye development and perception of light (supplementary table 2, Supplementary Material online) are shown in red.
