Bioinformatical assay of human gene morbidity

Fyodor A Kondrashov¹, Aleksey Y Ogurtsov, Alexey S Kondrashov

PMID: 15020709
PMCID: PMC390328
DOI: 10.1093/nar/gkh330

Nucleic Acids Res. 2004 Mar 12;32(5):1731-7. Print 2004.

Affiliation

¹ National Center for Biotechnology Information, National Institutes of Health, 38a Center Drive, 6S602, Bethesda, MD 20892, USA. kondrashov@ncbi.nlm.nih.gov

Abstract

Only a fraction of eukaryotic genes affect the phenotype drastically. We compared 18 parameters in 1273 human morbid genes, known to cause diseases, and in the remaining 16 580 unambiguous human genes. Morbid genes evolve more slowly, have wider phylogenetic distributions, are more similar to essential genes of Drosophila melanogaster, code for longer proteins containing more alanine and glycine and less histidine, lysine and methionine, possess larger numbers of longer introns with more accurate splicing signals and have higher and broader expressions. These differences make it possible to classify as non-morbid 34% of human genes with unknown morbidity, when only 5% of known morbid genes are incorrectly classified as non-morbid. This classification can help to identify disease-causing genes among multiple candidates.

Figures

**Figure 1**
Distributions of 18 parameters within 1273 known human morbid genes (black bars) and within all other 16 580 unambiguous human genes (gray bars). (A and B) K_n and K_s between a human gene and its murine ortholog. (C–E) The fraction of identical amino acids within the alignment of the protein coded by a human gene and the most similar protein of *D.melanogaster* (C), *C.elegans* (D) and *A.thaliana* (E). (F) The proportions of genes for which the most similar gene in the *D.melanogaster* genome is essential or non-essential. (G) The number of paralogs of a gene within the human genome. (H) The length of the protein encoded by a gene. (I) The number of introns within a gene. (J) The average length of introns within a gene. (K) The average quality of 5′ and 3′ splicing signals within introns of a gene. (L and M) The expression level or breadth of a gene. (N–R) The proportions of alanine (N), glycine (O), histidine (P), lysine (Q) and methionine (R) within the protein encoded by a gene.

**Figure 2**
The process of training the neural network. At each step, ∼15 changes of the network weights occur. (A) The fractions of generic genes within the training and test sets for which the classification variable X, generated by the output neuron of the network, is above 0.5. (B) The fractions of morbid genes within the training and test sets for which the classification variable X is below 0.5.

**Figure 3**
Cumulative distributions of the classification variable X, generated by the trained neural network, within the validation set of morbid and generic genes. Cut-off values of X at which 1, 5 and 10% of known morbid genes are incorrectly classified as non-morbid are shown.
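
The cut-off selection described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy scores are hypothetical, and it assumes that larger values of X mean "more generic", so that a gene is called non-morbid when X exceeds the cut-off. The cut-off is chosen from the known morbid genes alone so that no more than a stated fraction of them is misclassified, and is then applied to genes of unknown morbidity.

```python
import math


def cutoff_for_error_rate(morbid_scores, max_error):
    """Return a cut-off c such that at most `max_error` of the known
    morbid genes have X > c (i.e. would be called non-morbid).

    Assumes 0 <= max_error < 1 and that larger X means "more generic";
    the orientation of X is an assumption of this sketch.
    """
    ranked = sorted(morbid_scores)
    n = len(ranked)
    # Number of morbid genes we are allowed to misclassify
    # (small epsilon guards against floating-point round-off).
    allowed = math.floor(max_error * n + 1e-9)
    # Keep all but the `allowed` highest-scoring morbid genes at or below c.
    return ranked[n - allowed - 1]


def fraction_called_non_morbid(scores, cutoff):
    """Fraction of genes whose classification variable X exceeds the cut-off."""
    return sum(x > cutoff for x in scores) / len(scores)


# Toy values of X, invented purely for illustration.
morbid_x = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.90]
unknown_x = [0.10, 0.30, 0.50, 0.70, 0.80, 0.95]

cutoff = cutoff_for_error_rate(morbid_x, max_error=0.25)
morbid_error = fraction_called_non_morbid(morbid_x, cutoff)
unknown_excluded = fraction_called_non_morbid(unknown_x, cutoff)
print(cutoff, morbid_error, unknown_excluded)
```

With real data, `max_error` would be set to 0.01, 0.05 or 0.10 to mirror the three cut-offs shown in the figure, and `unknown_excluded` would play the role of the share of genes with unknown morbidity that can be set aside as non-morbid.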
