Protein Sci. 2022 Sep;31(9):e4408.
doi: 10.1002/pro.4408.

Compendium of proteins containing segments that exhibit zero-tolerance to amino acid variation in humans


Adam L Sanders et al. Protein Sci. 2022 Sep.

Abstract

Genetic missense tolerance ratio (MTR) analysis systematically evaluates all possible segments of a given protein-encoding transcript using the variants found in the human population. Each segment is scored by comparing the number of observed missense variants with the number of silent (synonymous) variants in that same segment. An MTR score of 0 indicates that no missense variants are observed within a given segment. This is indicative of evolutionary purifying selection, which excludes missense mutations in that segment from the general human population. Here, we conducted MTR analysis on each of the roughly 20,000 protein-encoding human genes. We found 257 genes (1.3% of all human genes) with at least one segment encoding 31 residues that has MTR = 0. The proteins encoded by these 257 genes were tabulated along with the sequence location of each intolerant segment, the likely function of the protein, and related annotations. The most functionally enriched family among these proteins is a group of several dozen proteins directly involved in RNA splicing. Some of the other proteins with zero-tolerance segments have so far escaped significant characterization. Indeed, while a number of these proteins have previously been genetically linked to human disorders, many have not. We hypothesize that this compendium of human proteins with zero-tolerance segments can complement disease mutation data as a pointer to genes and proteins associated with interesting and underexplored human biology.
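As a rough illustration of the segment scoring described above, the following Python sketch slides a 31-codon window along a transcript. It assumes the commonly used MTR definition, in which the fraction of observed variants that are missense is divided by the fraction of all possible single-nucleotide changes that would be missense; the per-codon count arrays and the `mtr_scores` function are hypothetical inputs and names, not the authors' pipeline. Windows with no observed variants are reported as `None` because the ratio is undefined there.

```python
from typing import List, Optional, Sequence


def mtr_scores(
    obs_missense: Sequence[int],
    obs_synonymous: Sequence[int],
    possible_missense: Sequence[int],
    possible_synonymous: Sequence[int],
    window: int = 31,
) -> List[Optional[float]]:
    """Sliding-window MTR, one score per window start (codon index).

    Assumed definition (not taken from this paper):
        MTR = (obs_mis / (obs_mis + obs_syn)) / (pos_mis / (pos_mis + pos_syn))
    All inputs are hypothetical per-codon counts of equal length.
    """
    n = len(obs_missense)
    if not (len(obs_synonymous) == len(possible_missense) == len(possible_synonymous) == n):
        raise ValueError("per-codon count arrays must have equal length")

    scores: List[Optional[float]] = []
    for start in range(n - window + 1):
        end = start + window
        obs_mis = sum(obs_missense[start:end])
        obs_syn = sum(obs_synonymous[start:end])
        pos_mis = sum(possible_missense[start:end])
        pos_syn = sum(possible_synonymous[start:end])
        if obs_mis + obs_syn == 0 or pos_mis == 0:
            # No observed variants (or no possible missense changes): ratio undefined.
            scores.append(None)
            continue
        observed_fraction = obs_mis / (obs_mis + obs_syn)
        expected_fraction = pos_mis / (pos_mis + pos_syn)
        scores.append(observed_fraction / expected_fraction)
    return scores


# Toy (made-up) example: 40 codons, missense variants observed only in the last 5 codons.
scores = mtr_scores([0] * 35 + [1] * 5, [1] * 40, [7] * 40, [2] * 40)
print(scores[:6])
```

Under these assumptions, a gene would be flagged as carrying a zero-tolerance segment when `any(s == 0.0 for s in scores)` is true, which requires at least one observed synonymous variant and no observed missense variants inside a window.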

Keywords: database; gene; genetic; genome; intolerance; intolerant; missense tolerance ratio; protein; proteome.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIGURE 1
Histograms of intolerance within the 257 proteins containing a zero‐tolerance segment. (a) Distribution of MTR scores for all possible 31‐residue segments. Segments with a score in the “at or near zero” bin represent 1.9% of all segments. The mean MTR score is 0.69 ± 0.26 and the median score is 0.73. (b) Distribution of median protein MTR scores, based on analysis of all possible 31‐amino‐acid segments within each protein. The mean of these medians is 0.71 ± 0.26. MTR, missense tolerance ratio.
FIGURE 2
Representative examples of sequence identity patterns for proteins containing zero‐tolerance segments, comparing whole‐protein (black plots) and intolerant‐segment (red plots) sequence identity to the 250 nearest mammalian homologs identified by BLASTP searches at NCBI. GENE.1, GENE.2, and so forth indicate which non‐contiguous intolerant segment of that gene was searched. The distributions of sequence identities for the 250 closest homologs of each protein are presented as box‐and‐whisker plots. The bold bar is the median, the box edges mark the quartiles, and the whiskers extend to 1.5 times the interquartile range. The dots are outliers that lie beyond the whiskers. The complete results for all 257 proteins with zero‐tolerance segments are presented in Figure S1.
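The box‐and‐whisker convention in this caption (median, quartiles, and whiskers at 1.5 times the interquartile range) can be reproduced with Python's standard library. This is a sketch assuming Tukey‐style whiskers that stop at the most extreme observation inside the 1.5 × IQR fences; `box_summary` is a hypothetical helper, and quartile values can differ slightly between plotting tools because they use different interpolation methods (here `statistics.quantiles` with `method="inclusive"`).

```python
import statistics
from typing import Dict, Sequence


def box_summary(identities: Sequence[float]) -> Dict[str, object]:
    """Median, quartiles, Tukey whiskers, and outliers for a list of % identities."""
    # Inclusive quartiles; the middle cut point equals the ordinary median.
    q1, median, q3 = statistics.quantiles(identities, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that still lie inside the fences.
    inside = [x for x in identities if lower_fence <= x <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in identities if x < lower_fence or x > upper_fence),
    }


# Hypothetical percent identities to a handful of homologs (made-up values).
print(box_summary([60, 82, 85, 88, 90, 91, 92, 93, 95, 99]))
```

In this toy input the value 60 falls below the lower fence and would be drawn as an outlier dot, while the lower whisker stops at 82.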
FIGURE 3
Protein interaction network built with the Cytoscape stringApp using an interaction score cutoff of ≥ 0.95 in the STRING database. Not all proteins returned by this analysis (~150) are visualized here, because networks consisting of only two proteins were excluded from the visualization (with one exception). The highlighted clusters were assigned manually by identifying the general functions of the proteins in each clustered area.
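The caption's filtering steps (keep interactions with a score of at least 0.95, then drop networks of only two proteins) can be sketched as a connected‐component search. This assumes a plain edge list with STRING‐style scores scaled to [0, 1]; `filtered_components` and the identifiers `P1` to `P6` are hypothetical, the code does not call Cytoscape or stringApp, and it does not model the single exception mentioned in the caption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (protein_a, protein_b, interaction score in [0, 1]); a hypothetical input format.
Edge = Tuple[str, str, float]


def filtered_components(
    edges: Iterable[Edge], cutoff: float = 0.95, min_size: int = 3
) -> List[Set[str]]:
    """Connected components of the score-filtered graph with >= min_size proteins."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if score >= cutoff:  # keep only high-confidence interactions
            graph[a].add(b)
            graph[b].add(a)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component: Set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # min_size=3 drops the two-protein networks excluded in the figure.
        if len(component) >= min_size:
            components.append(component)
    return components


# Made-up edges: P4-P5 forms a two-protein network, and P5-P6 is below the cutoff.
print(filtered_components([("P1", "P2", 0.99), ("P2", "P3", 0.97),
                           ("P4", "P5", 0.98), ("P5", "P6", 0.80)]))
```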
FIGURE 4
Granulated protein interaction networks among proteins containing intolerant segments. We used a granularity parameter of 3 to form more discrete interaction nodes that may represent specific protein complexes. Proteins are labeled by gene symbol. The darkness and thickness of the lines connecting nodes reflect the Cytoscape stringApp experimental score, which is based on high‐throughput interaction mapping; thicker, darker lines indicate interactions supported with higher confidence by experiments. The networks shown were identified manually based on the general function of each cluster. Sub‐networks of six or fewer proteins are not shown.
FIGURE 5
PANTHER overrepresentation analysis of GO biological process terms for the pathways associated with proteins containing a zero‐tolerance segment. Only pathways with fewer than 500 proteins in the Homo sapiens reference list and p < 5 × 10⁻¹⁰ were considered. (a) Results for proteins in which the minimum length of the zero‐tolerance segment was 31 residues. (b) Results when the minimum length of the zero‐tolerance segment was 41 residues. GO, gene ontology.
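The two selection criteria in this caption can be written as a simple filter. This is a minimal sketch over a hypothetical list of per‐term results; the `GOTermResult` record, its fields, and `select_terms` are illustrative names and do not reflect the actual PANTHER export format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GOTermResult:
    term: str             # GO biological process term label
    reference_count: int  # proteins annotated to the term in the Homo sapiens reference list
    p_value: float        # overrepresentation p-value reported for the term


def select_terms(
    results: Iterable[GOTermResult],
    max_reference: int = 500,
    p_cutoff: float = 5e-10,
) -> List[GOTermResult]:
    """Keep terms with < max_reference reference proteins and p < p_cutoff, most significant first."""
    kept = [r for r in results if r.reference_count < max_reference and r.p_value < p_cutoff]
    return sorted(kept, key=lambda r: r.p_value)


# Placeholder terms with made-up values, only to show which rows pass the filter.
rows = [
    GOTermResult("term A", 120, 1e-12),
    GOTermResult("term B", 800, 1e-20),  # reference list too large
    GOTermResult("term C", 300, 1e-9),   # not significant enough
    GOTermResult("term D", 499, 4e-11),
]
print([r.term for r in select_terms(rows)])
```

The size cap removes very broad terms that would otherwise dominate the ranking even with extremely small p‐values.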

