PLoS Comput Biol. 2013;9(5):e1003073.
doi: 10.1371/journal.pcbi.1003073. Epub 2013 May 16.

Human monogenic disease genes have frequently functionally redundant paralogs

Wei-Hua Chen et al. PLoS Comput Biol. 2013.

Abstract

Mendelian disorders are often caused by mutations in genes that are not lethal but induce functional distortions leading to diseases. Here we study the extent to which gene duplicates might compensate for genes causing monogenic diseases. We provide evidence for pervasive functional redundancy of human monogenic disease genes (MDs) by duplicates by showing that 1) genes involved in human genetic disorders are enriched in duplicates and 2) duplicated disease genes tend to have higher functional similarities with their closest paralogs than duplicated non-disease genes of similar age. We propose that functional compensation by gene duplicates masks the phenotypic effects of deleterious mutations and reduces the probability of purging the defective genes from the human population; this functional compensation could be further enhanced by stronger purifying selection between disease genes and their duplicates, as well as their orthologous counterparts, compared to non-disease genes. However, due to the intrinsic expression stochasticity among individuals, the deleterious mutations could still manifest as genetic diseases in some subpopulations where the duplicate copies are expressed at low abundances. Consequently, the defective genes are linked to genetic disorders while they continue propagating within the population. Our results provide insight into the molecular basis underlying the spreading of duplicated disease genes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Duplicated genes are enriched in monogenic disease genes.
A) Percentages of duplicates among monogenic disease genes (MD) and non-disease genes (ND). B) Percentages of monogenic disease genes as a function of the number of duplicates in human; 0 indicates that genes are singletons. Here duplicates were defined using TreeFam. The P-value shown in panel A was calculated using Fisher's Exact Test; level of significance: *** <0.001, ** <0.01, * <0.05. Numbers shown within the bars are gene counts (subset/total).
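As a rough illustration of the kind of enrichment test named in this caption, the following sketch computes a one-sided Fisher's Exact Test p-value for a 2×2 table of disease status against duplication status. It is not the authors' code: the function name `fisher_enrichment_p` and the layout of the table are assumptions made for this example, the counts in the usage lines are hypothetical, and the caption does not state whether the published test was one- or two-sided.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's Exact Test for enrichment (hypothetical helper).

    Assumed table layout:
                    duplicated   singleton
        disease         a            b
        non-disease     c            d

    Returns P(X >= a), where X follows the hypergeometric distribution
    obtained by holding the row and column totals fixed.
    """
    row1 = a + b          # total disease genes
    col1 = a + c          # total duplicated genes
    n = a + b + c + d     # all genes in the table
    denom = comb(n, row1)
    upper = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts for illustration only; these are NOT the paper's data.
p_value = fisher_enrichment_p(30, 10, 40, 40)
print(f"one-sided p-value: {p_value:.4g}")
```

A small p-value under this layout would indicate that duplicated genes are over-represented among the disease genes relative to what the fixed margins predict.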
Figure 2. Evidence for functional redundancy in duplicated disease genes.
Compared with duplicated non-disease genes (ND) of similar duplication age (represented by branch length, see Methods), monogenic disease genes (MD) tend to have A) higher co-expression coefficients (p-value = 1.69×10⁻³, Hypergeometric Distribution test) and C) higher sequence similarity (p-value = 1.66×10⁻³, Hypergeometric Distribution test). Results in A) can be repeated using another set of gene expression data (Figure S3). P-values shown in the boxplots (B and D) were calculated using the two-sample Wilcoxon Rank Sum Test; see Materials and Methods for more details regarding the statistical tests. Numbers shown next to the boxplots are the numbers of valid samples (after removing samples with missing values).
Figure 3. Evidence for pervasive functional redundancy in duplicated disease genes based on Gene Ontology annotations.
Compared with duplicated non-disease genes (ND) of similar duplication age (represented by branch length, see Methods), monogenic disease genes (MD) tend to have A) higher functional similarity according to Gene Ontology annotations with their most recent duplications (MRDs; p-value = 7.77×10⁻⁵, Hypergeometric Distribution test); B) the same is also true when duplication ages are omitted (Wilcoxon Rank Sum Test).
Figure 4. Higher purifying selections on duplicated disease genes.
Compared with non-disease genes (NDs), disease genes tend to have lower dN values with their mouse (A) and Macaca (B) one-to-one orthologs. Furthermore, compared with disease singletons (singlet genes or singletons are those that do not share significant protein sequence similarity with other human genes), duplicated disease genes tend to have lower dN values with their mouse (C) and Macaca (D) orthologs. The higher selective constraints on duplicated disease genes can also be seen within the human genome; for example, compared with duplicated non-disease genes (ND) of similar duplication age, disease genes tend to have lower dN values with their closest paralogs within human (E; p-value = 4×10⁻⁷, Hypergeometric Distribution test). However, the same is not true when age is omitted (F), highlighting the importance of dividing gene pairs according to their duplication age. P-values shown in the boxplots (A–D and F) were calculated using the two-sample Wilcoxon Rank Sum Test. A similar plot showing no outliers is also available in Figure S6.
Figure 5. A model for the effect of functional compensation on the propagation of duplicated disease genes in the human population.
This model is based on two previous experimental studies. The first showed that genes with identical promoters could have very different expression abundances in individual E. coli cells. The second showed that different C. elegans individuals carrying the defective gene could display varying phenotypes, ranging from wild type to stalled development during embryogenesis, depending on the expression abundance of a duplicate gene. We therefore propose that in cases where a duplicate (A1_human) exists (panel A), the functional impairment caused by mutations in a disease gene (A2_human) could be compensated; however, due to intrinsic expression stochasticity of the duplicate copy, some individuals would appear to be normal while others show reduced fitness (panel B). Consequently, gene A2 is linked to genetic disorders while the deleterious mutations it carries continue to spread instead of being removed from the human population. On the other hand, if a disease gene (B_human; panel C) is a singlet without any paralogs, its mutations would be more likely to be purged from the population (panel D), since compensation by non-duplicates via genetic interactions is relatively rare.
