PLoS One. 2020 Dec 17;15(12):e0237682.
doi: 10.1371/journal.pone.0237682. eCollection 2020.

Mutational survivorship bias: The case of PNKP

Luis Bermúdez-Guzmán et al. PLoS One.

Abstract

The molecular function of a protein relies on its structure. Understanding how variants alter structure and function in multidomain proteins is key to elucidating how a pathological phenotype arises. However, one may fall into the logical bias of assessing protein damage based only on the variants that are visible (survivorship bias), which can lead to partial conclusions. This is the case for PNKP, an important nuclear and mitochondrial DNA repair enzyme with both kinase and phosphatase functions. Most variants in PNKP are confined to the kinase domain, leading to a pathological spectrum of three apparently distinct clinical entities. Since proteins and domains may differ in their tolerance to variation, we evaluated whether variants in PNKP are subject to survivorship bias. Here, we provide evidence that the kinase domain is more tolerant to variation, even though all reported variants are deleterious. In contrast, the phosphatase domain is less tolerant, as shown by its lower variant rates, higher degree of sequence conservation, lower dN/dS ratios, and larger number of disease-propensity hotspots. Together, our results support previous experimental evidence that the phosphatase domain is functionally more necessary and relevant for DNA repair, especially in the context of central nervous system development. Finally, we propose the term "Wald's domain" for future studies analyzing possible survivorship bias in multidomain proteins.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Depiction of Wald’s idea regarding survivorship bias and the overlooking of the events that did not succeed in a selection process.
The red dots represent hypothetical areas where an airplane could be damaged and still return.
Fig 2. Phylogenetic tree of the seventy species used in the multiple sequence alignment.
Species are colored according to their taxonomic group. The gray circular scale represents the percentage identity with respect to the human protein sequence of PNKP.
Fig 3. Percentage identity analysis of SSBR proteins.
A) Comparison of the difference in percentage identity between the human sequence of each SSBR protein and that of several species. The higher the bar, the more the sequence of that organism differs from the human sequence. B) Comparison of different SSBR proteins according to the variability in sequence percentage identity between species (* p<0.05; ** p<0.01; *** p<0.001; ns: not significant).
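The percentage identity underlying Fig 3 can be illustrated with a minimal sketch. It assumes two sequences taken from the same multiple sequence alignment and skips any column where either sequence has a gap; the caption does not say how gaps were handled, so that choice is an assumption. The function name and the toy sequences are hypothetical.

```python
def percent_identity(ref: str, other: str) -> float:
    """Percentage of compared alignment columns with identical residues.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real PNKP sequence).
human = "MKT-AYLG"
other = "MKSQAY-G"
identity = percent_identity(human, other)
difference = 100.0 - identity  # the quantity plotted as bar height in panel A
```

Under this convention, six columns are compared in the toy example and five of them match.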
Fig 4. Conservation analysis of the phosphatase and kinase domains.
A) Color code according to the degree of sequence conservation for each domain. Conservation scores are calculated based on the previous MSA. B) Number of amino acids classified according to their conservation scores for each domain. C) Prediction of the 3D structure of human PNKP colored according to the degree of conservation for each domain. The linker between the FHA and phosphatase domains shows low conservation scores, while the highest degree of conservation is found in the phosphatase domain.
Fig 5. Disease-propensity plots of PNKP.
A) Distribution of the average disease-propensity scores (DPS) for the amino acids within the phosphatase and kinase domains of PNKP. Note that more residues show DPS > 75 in the phosphatase domain. B) The difference between average DPS is more evident when only DPS > 50 are considered. C) DPS for the entire protein, showing several hotspots in the phosphatase domain. All regions with an average above the critical value of 75 are colored in red, but only consecutive positions with averages over the critical value are considered disease-propensity hotspots.
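The hotspot rule in panel C (consecutive positions whose average DPS exceeds the critical value of 75) can be sketched as a simple run-detection pass. This is only an illustration: the function name, the minimum run length of two positions, and the example scores are assumptions, not values or code from the study.

```python
from typing import List, Tuple


def find_hotspots(avg_dps: List[float],
                  critical: float = 75.0,
                  min_run: int = 2) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) ranges of consecutive positions
    whose average DPS is above `critical`.

    Assumption: a run needs at least `min_run` positions to count as a
    hotspot; isolated positions above the threshold are ignored.
    """
    hotspots: List[Tuple[int, int]] = []
    run_start = None
    for pos, score in enumerate(avg_dps, start=1):
        if score > critical:
            if run_start is None:
                run_start = pos  # a new run begins here
        else:
            if run_start is not None and pos - run_start >= min_run:
                hotspots.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the last position.
    if run_start is not None and len(avg_dps) + 1 - run_start >= min_run:
        hotspots.append((run_start, len(avg_dps)))
    return hotspots


# Hypothetical per-position average scores.
scores = [10, 80, 90, 40, 76, 30, 88, 79, 95]
hotspots = find_hotspots(scores)
```

In this toy input, position 5 exceeds the threshold but stands alone, so it would be colored red without being reported as a hotspot.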
Fig 6. Structure of human PNKP showing the most important ligand-binding sites predicted in silico.
A) Pocket showing the ATP/ADP binding site in the kinase domain. B) DNA binding site in the phosphatase domain. C) Phosphatase active site. Most predictions coincide with previous experimental results. The figure was prepared with PyMOL 2.3.
Fig 7. Missense3D prediction of structural changes generated by single amino acid variants reported in patients with PNKP-associated diseases.
Wild-type residues are represented in blue, while the mutant amino acids and the nearby residues affected by them are represented in red.
Fig 8. Analysis of the dN/dS ratio between the phosphatase and kinase domains.
A) Comparison of the distributions of dN/dS values for each domain, showing a higher density of low dN/dS values for the phosphatase domain (D = 0.191, p-value = 0.002). B) Comparison of the mean dN/dS values between domains, showing a higher tolerability in the kinase domain (W = 21063, p-value = 0.0004).
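The caption does not name the test behind D. Assuming it denotes the two-sample Kolmogorov-Smirnov statistic (the conventional meaning of D when two distributions are compared), the quantity is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes it with the standard library only; the function name and the per-site dN/dS values are hypothetical, and no p-value is computed.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: max |F_x(v) - F_y(v)| over all v."""
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # empirical CDF of x at v
        fy = bisect_right(ys, v) / len(ys)  # empirical CDF of y at v
        d = max(d, abs(fx - fy))
    return d


# Hypothetical per-site dN/dS values for each domain.
phosphatase = [0.05, 0.10, 0.12, 0.20]
kinase = [0.15, 0.30, 0.35, 0.50]
d_value = ks_statistic(phosphatase, kinase)
```

A larger D means the two domains' dN/dS distributions are further apart; in this toy data the phosphatase values sit mostly below the kinase values.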
Fig 9. Mutational tolerance landscape in the phosphatase and kinase domain.
The gray shaded areas represent the dN/dS values for each position and the corresponding mutational tolerability (the lower the ratio, the lower the tolerance). Bars represent DPS (0–100), so that "valleys" are regions with the lowest dN/dS ratios (highly intolerant to variation) and the highest DPS.
Fig 10. Depiction of PNKP as one of the airplanes from the Second World War from Wald’s analysis.
The regions mutated within the kinase domain are the sites where the protein can be damaged and still generate a viable phenotype. In contrast, only three variants have been reported in the phosphatase domain, and patients carrying them present the most severe phenotype: MCSZ. Therefore, the phosphatase domain represents the areas that, when “attacked, would cause the plane to be lost”.
