Proc Natl Acad Sci U S A. 2020 Mar 17;117(11):5977-5986.
doi: 10.1073/pnas.1916786117. Epub 2020 Mar 2.

Integrated structural and evolutionary analysis reveals common mechanisms underlying adaptive evolution in mammals

Greg Slodkowicz et al., Proc Natl Acad Sci U S A.

Abstract

Understanding the molecular basis of adaptation to the environment is a central question in evolutionary biology, yet linking detected signatures of positive selection to molecular mechanisms remains challenging. Here we demonstrate that combining sequence-based phylogenetic methods with structural information helps make such mechanistic interpretations on a genomic scale. Our integrative analysis shows that positively selected sites tend to colocalize on protein structures and that positively selected clusters are found in functionally important regions of proteins, indicating that positive selection can contravene the well-known principle of evolutionary conservation of functionally important regions. This unexpected finding, along with our discovery that positive selection acts on structural clusters, opens previously unexplored strategies for developing better models of protein evolution. Remarkably, the proteins in which we detect the strongest evidence of clustering belong to just two functional groups: components of the immune response and metabolic enzymes. This gives a coherent picture of pathogens and xenobiotics as important drivers of adaptive evolution in mammals.

Keywords: adaptive evolution; immunity; mammals; metabolism; protein evolution.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Positively selected residues tend to cluster together. (A) Overview of the approach. (B) Distribution of selective constraint (ω) across the dataset: 97.6% of sites have ω < 1 (indicating purifying selection) and 2.4% have ω ≥ 1; the mean ω across the entire dataset is 0.126. (C) QQ plot of the P values obtained by applying CLUMPS to positively selected sites identified at FDRs of 0.05, 0.1, 0.2, and 0.5. If residues under positive selection were randomly distributed on protein structures, the P values would be uniformly distributed (gray line). The observed P values for positively selected sites are lower than expected under this null hypothesis of random placement, indicating that positively selected sites tend to cluster together. In contrast, near-neutrally evolving sites (gray points) show no tendency to cluster.
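Panel C rests on a test of whether positively selected residues lie closer together on a structure than randomly chosen residues would. The sketch below illustrates that idea in the spirit of CLUMPS; it is not the CLUMPS implementation and not necessarily the authors' exact setup. Assumptions: each residue is represented by a single coordinate (for example its Cα atom), every positively selected site gets equal weight, residue pairs are scored with a Gaussian kernel exp(−d²/2t²) with t = 6 Å, and the null distribution comes from resampling the same number of residues uniformly from the structure. The function names `wap_score` and `clustering_p_value` are hypothetical.

```python
import math
import random
from itertools import combinations


def wap_score(coords, selected, t=6.0):
    """Sum of Gaussian-weighted proximities over all pairs of selected residues.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    selected: indices into coords of residues flagged as positively selected.
    t: kernel width in angstroms (6 A is an assumption in this sketch).
    """
    score = 0.0
    for i, j in combinations(selected, 2):
        d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        score += math.exp(-d2 / (2.0 * t * t))
    return score


def clustering_p_value(coords, selected, n_perm=1000, t=6.0, seed=0):
    """Permutation P value: how often does a random residue set score as high?"""
    rng = random.Random(seed)
    observed = wap_score(coords, selected, t)
    k = len(selected)
    hits = 0
    for _ in range(n_perm):
        # Null hypothesis: the k sites are placed uniformly at random on the structure.
        random_sites = rng.sample(range(len(coords)), k)
        if wap_score(coords, random_sites, t) >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy "structure" (hypothetical): 50 residues spaced 3.8 A apart on a straight line.
line = [(3.8 * i, 0.0, 0.0) for i in range(50)]
print(clustering_p_value(line, [10, 11, 12, 13]))  # small: the sites are adjacent
print(clustering_p_value(line, [0, 16, 33, 49]))   # near 1: the sites are spread out
```

Under the null of random placement, P values collected across many proteins should be roughly uniform, which corresponds to the gray diagonal in the QQ plot; an excess of small P values is what signals clustering.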
Fig. 2.
Clusters of positively selected sites in serpin B3. (A) Placement of positively selected sites on the structure of serpin B3 (PDB ID code 4ZK0). (B) Mode of action of serpins, shown using PDB structures 1K9O (Upper) and 1EZX (Lower), with the substrate shown in black and the reactive center loop in blue. Regions analogous to those where positively selected clusters were detected are marked as in A. Serpins bind their target proteases through a reactive center loop that mimics the protease substrate. They then form a covalent bond with the protease and undergo a large conformational change, resulting in the protease being deformed and then acylated (40, 41). We find that positively selected residues surround the reactive center loop and are also located on the opposite side of the protein, to which the bound protease is dragged.
Fig. 3.
Positively selected residues in CYPs cluster in the substrate entry channel and catalytic site. Positively selected residues in (A and B) CYP3A4 (PDB ID code 3TJS), (C and D) CYP2C9 (PDB ID code 1R9O), and (E and F) CYP2D6 (PDB ID code 2F9Q). Hemes are shown in dark gray and other ligands in blue. Additional ligands were transferred from other PDB structures by superimposition: (A and B) desthiazolylmethyloxycarbonyl ritonavir, ketoconazole (PDB ID code 2V0M), and erythromycin (PDB ID code 2J0D); (C and D) flurbiprofen; (E and F) prinomastat (PDB ID code 3QM4). The extraordinary substrate diversity of this enzyme superfamily is made possible by a large, flexible binding pocket with the heme at its bottom. In all three structures, the location of the positively selected residues tracks the binding of a ligand; in general, these residues are found on the sides of helices and in the loops that form the binding pocket.
Fig. 4.
Positively selected residues in AKRs surround the substrate-binding site. Positively selected residues in (A) AKR1B10 (PDB ID code 1ZUA) and (B) AKR1C4 (PDB ID code 2FVL). Tolrestat is shown in blue and NADP+ in dark gray. Positively selected residues in AKR1B10 cluster around the bound ligand tolrestat, an inhibitor developed for the treatment of diabetes, but not around the NADP+ cofactor. The structure of AKR1C4 was solved without a ligand, but its positively selected residues cluster in a region similar to that in AKR1B10. As in AKR1B10, there are no positively selected residues near the NADP+ cofactor.
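The contrast drawn for AKR1B10 (clustering around tolrestat but not around NADP+) amounts to asking which positively selected residues lie within some distance of each ligand. A minimal sketch, assuming residues and ligands are given as lists of atom coordinates in ångströms and using an arbitrary 8 Å cutoff; the function names, the toy coordinates, and the cutoff are hypothetical and are not taken from the paper.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest atom-atom distance between two sets of (x, y, z) coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def selected_near_ligand(residues, selected, ligand_atoms, cutoff=8.0):
    """Sorted IDs of selected residues with any atom within `cutoff` A of the ligand.

    residues: dict mapping residue ID -> list of atom coordinates.
    selected: iterable of residue IDs flagged as positively selected.
    cutoff: hypothetical distance threshold, not a value from the paper.
    """
    return sorted(
        r for r in selected if min_distance(residues[r], ligand_atoms) <= cutoff
    )


# Toy coordinates (hypothetical): residues 1 and 2 line the inhibitor site;
# residue 3 sits next to the cofactor but is not positively selected.
residues = {
    1: [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    2: [(0.0, 4.0, 0.0)],
    3: [(25.0, 0.0, 0.0)],
}
inhibitor = [(2.0, 2.0, 0.0)]
cofactor = [(28.0, 0.0, 0.0)]
selected = [1, 2]

print(selected_near_ligand(residues, selected, inhibitor))  # [1, 2]
print(selected_near_ligand(residues, selected, cofactor))   # []
```

Using the minimum atom-atom distance rather than a single representative atom per residue means a residue counts as "near" if any part of its side chain reaches the ligand, which suits binding-pocket residues whose side chains point into the cavity.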
Fig. 5.
Positively selected residues in other enzymes. (A) Positively selected sites in GSTA3 (PDB ID code 1TDI). Glutathione is shown in dark gray; Δ4-androstene-3,17-dione (blue) was transferred by structure superimposition from structure 2VCV. (B) Positively selected residues in sulfotransferase 2A1 (PDB ID code 3F3Y). Adenosine 3′,5′-diphosphate is shown in dark gray and lithocholic acid in blue. (C) Positively selected sites in carboxylesterase 1 (PDB ID code 1MX1). Tacrine is shown in blue. (D) Positively selected residues in AK5 (PDB ID code 2BWJ). (E) Positively selected sites in OLAH (PDB ID code 4XJV).
Fig. 6.
Properties of positively selected sites. (A) Distance of positively selected residues from bound exogenous ligands. (B) Distribution of ω as a function of distance from catalytic residues. (C) Departures from the background amino acid frequencies at positively selected residues. (D) Distribution of the fraction of gene duplications in proteins with positively selected clusters.
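Panel C compares amino acid usage at positively selected sites with the background composition. One simple way to express such departures, assumed here rather than taken from the paper, is a per-amino-acid log2 ratio of frequencies, with a pseudocount so that amino acids absent from one set stay finite. The function name, the pseudocount of 1, and the toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log2_enrichment(selected_residues, background_residues, pseudocount=1.0):
    """Per-amino-acid log2(frequency at selected sites / background frequency).

    Both inputs are strings (or iterables) of one-letter amino acid codes.
    Positive values mean the amino acid is over-represented at selected sites.
    The pseudocount (an assumption) keeps amino acids with zero counts finite.
    """
    sel = Counter(selected_residues)
    bg = Counter(background_residues)
    n_sel = sum(sel[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    n_bg = sum(bg[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {
        a: math.log2(((sel[a] + pseudocount) / n_sel) / ((bg[a] + pseudocount) / n_bg))
        for a in AMINO_ACIDS
    }


# Hypothetical toy input: selected sites rich in lysine and arginine,
# background with every amino acid equally common.
enrichment = log2_enrichment("KKKKRR", AMINO_ACIDS * 5)
print(round(enrichment["K"], 2), round(enrichment["A"], 2))  # 1.94 -0.38
```

The pseudocount shrinks estimates for rare amino acids toward zero enrichment; with only a handful of selected sites per protein, results should be pooled across proteins before reading much into individual values.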

References

    1. Havrilla J. M., Pedersen B. S., Layer R. M., Quinlan A. R., A map of constrained coding regions in the human genome. Nat. Genet. 51, 88–95 (2019).
    2. Fuller Z. L., Berg J. J., Mostafavi H., Sella G., Przeworski M., Measuring intolerance to mutation in human genetics. Nat. Genet. 51, 772–776 (2019).
    3. Yang Z., PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
    4. Weaver S., et al., Datamonkey 2.0: A modern web application for characterizing selective and other evolutionary processes. Mol. Biol. Evol. 35, 773–777 (2018).
    5. Benner S. A., Natural progression. Nature 409, 459 (2001).
