Nucleic Acids Res. 2004 Mar 12;32(5):1731-7.
doi: 10.1093/nar/gkh330. Print 2004.

Bioinformatical assay of human gene morbidity

Fyodor A Kondrashov et al. Nucleic Acids Res. 2004.

Abstract

Only a fraction of eukaryotic genes affect the phenotype drastically. We compared 18 parameters in 1273 human morbid genes, known to cause diseases, and in the remaining 16 580 unambiguous human genes. Morbid genes evolve more slowly, have wider phylogenetic distributions, are more similar to essential genes of Drosophila melanogaster, code for longer proteins containing more alanine and glycine and less histidine, lysine and methionine, possess larger numbers of longer introns with more accurate splicing signals and have higher and broader expressions. These differences make it possible to classify as non-morbid 34% of human genes with unknown morbidity, when only 5% of known morbid genes are incorrectly classified as non-morbid. This classification can help to identify disease-causing genes among multiple candidates.

Figures

Figure 1
Distributions of 18 parameters within 1273 known human morbid genes (black bars) and within all other 16 580 unambiguous human genes (gray bars). (A and B) Kn and Ks between a human gene and its murine ortholog. (C–E) The fraction of identical amino acids within the alignment of the protein coded by a human gene and the most similar protein of D. melanogaster (C), C. elegans (D) and A. thaliana (E). (F) The proportions of genes for which the most similar gene in the D. melanogaster genome is essential or non-essential. (G) The number of paralogs of a gene within the human genome. (H) The length of the protein encoded by a gene. (I) The number of introns within a gene. (J) The average length of introns within a gene. (K) The average quality of 5′ and 3′ splicing signals within introns of a gene. (L and M) The expression level and breadth of a gene. (N–R) The proportions of alanine (N), glycine (O), histidine (P), lysine (Q) and methionine (R) within the protein encoded by a gene.
Figure 2
The process of training the neural network. At each step, ∼15 changes of the network weights occur. (A) The fractions of generic genes within the training and test sets for which the classification variable X, generated by the output neuron of the network, is above 0.5. (B) The fractions of morbid genes within the training and test sets for which the classification variable X is below 0.5.
Figure 3
Cumulative distributions of the classification variable X, generated by the trained neural network, within the validation set of morbid and generic genes. Cut-off values of X at which 1, 5 and 10% of known morbid genes are incorrectly classified as non-morbid are shown.
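The cut-off selection described for Figure 3 can be illustrated with a minimal sketch. It is not the authors' code. It assumes that a gene is called non-morbid when its classification variable X lies above the cut-off (this excerpt does not state the orientation of X), and the function names and toy scores are hypothetical.

```python
def cutoff_for_false_negative_rate(morbid_scores, max_fnr):
    """Smallest cut-off c such that at most max_fnr of the known morbid
    genes have X > c (i.e. are wrongly called non-morbid).

    Assumption: X > c means "called non-morbid".
    """
    if not morbid_scores:
        raise ValueError("morbid_scores must not be empty")
    if not 0.0 <= max_fnr < 1.0:
        raise ValueError("max_fnr must be in [0, 1)")
    ordered = sorted(morbid_scores)
    n = len(ordered)
    # Number of morbid genes we are allowed to misclassify.
    # The small epsilon guards against floating-point round-down.
    allowed = int(max_fnr * n + 1e-9)
    # Every score at or below this value is kept on the "morbid" side.
    return ordered[n - allowed - 1]


def fraction_called_non_morbid(scores, cutoff):
    """Fraction of genes whose X lies strictly above the cut-off."""
    return sum(1 for x in scores if x > cutoff) / len(scores)


# Toy validation-set scores (made up for illustration only).
morbid = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.3, 0.4, 0.6, 0.9]
generic = [0.2, 0.5, 0.7, 0.8, 0.95]

c = cutoff_for_false_negative_rate(morbid, max_fnr=0.10)
print(c, fraction_called_non_morbid(morbid, c),
      fraction_called_non_morbid(generic, c))
```

Lowering the tolerated error rate moves the cut-off upward, so fewer genes of unknown morbidity can be set aside as non-morbid. The abstract reports one point on this trade-off (34% of genes classified as non-morbid at a 5% error rate).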
