Statistical features of human exons and their flanking regions
- PMID: 9536098
- DOI: 10.1093/hmg/7.5.919
Abstract
To facilitate gene finding and the investigation of human molecular genetics on a genome scale, we present a comprehensive survey of various statistical features of human exons. We first show that human exons with flanking genomic DNA sequences can be classified into 12 mutually exclusive categories. This classification could serve as a standard for future studies so that results can be compared directly. A database for eight categories (related to human genes in which coding regions are split by introns) was built from GenBank release 87.0 and analyzed by a number of methods to characterize statistical features of these sequences that may serve as controls or regulatory signals for gene expression. The statistical information compiled includes profiles of signals for transcription, splicing and translation, various compositional statistics and size distributions. Further analyses reveal novel correlations and constraints among different splicing features across an internal exon that are consistent with the Exon Definition model. This information is fundamental for a quantitative view of human gene organization, and should be invaluable for individual scientists designing human molecular genetics experiments.
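The abstract refers to profiles of splicing signals and to compositional statistics. As a rough illustration only, and not the authors' pipeline, a splice-signal profile can be tabulated as a position frequency matrix over aligned windows around splice sites, and GC content is one simple compositional statistic. The function names, the 9-nt donor window (3 exonic + 6 intronic positions) and the toy sequences below are hypothetical assumptions chosen for the sketch.

```python
from collections import Counter

BASES = "ACGT"


def position_frequency_matrix(sites):
    """Per-position A/C/G/T frequencies for equal-length, pre-aligned sequences.

    Hypothetical helper for illustration; non-ACGT symbols (e.g. N) are
    ignored when normalizing each position.
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        total = sum(counts[b] for b in BASES)
        matrix.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return matrix


def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


if __name__ == "__main__":
    # Toy donor-site windows (3 exonic + 6 intronic nt); invented, not from the paper.
    sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
    for pos, freqs in enumerate(position_frequency_matrix(sites)):
        best = max(freqs, key=freqs.get)  # most frequent base at this position
        print(pos, best, round(freqs[best], 2))
    print("GC of first site:", round(gc_content(sites[0]), 3))
```

With these toy windows, positions 3 and 4 come out as G and T with frequency 1.0, the GT dinucleotide that begins most introns. A real analysis would extract windows from annotated exon-intron boundaries and would likely report counts alongside frequencies.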
Similar articles
- Fission yeast gene structure and recognition. Nucleic Acids Res. 1994 May 11;22(9):1750-9. doi: 10.1093/nar/22.9.1750. PMID: 8202381. Free PMC article.
- A relationship between GC content and coding-sequence length. J Mol Evol. 1996 Sep;43(3):216-23. doi: 10.1007/BF02338829. PMID: 8703087.
- The 5' leader of plant PgiC has an intron: the leader shows both the loss and maintenance of constraints compared with introns and exons in the coding region. Mol Biol Evol. 2002 Sep;19(9):1613-23. doi: 10.1093/oxfordjournals.molbev.a004223. PMID: 12200488.
- Biased distribution of adenine and thymine in gene nucleotide sequences. J Mol Evol. 1994 Nov;39(5):439-47. doi: 10.1007/BF00173412. PMID: 7528807.
- Advances in the Exon-Intron Database (EID). Brief Bioinform. 2006 Jun;7(2):178-85. doi: 10.1093/bib/bbl003. Epub 2006 Mar 9. PMID: 16772261. Review.
Cited by
- Lynch syndrome mutation spectrum in New South Wales, Australia, including 55 novel mutations. Mol Genet Genomic Med. 2016 Jan 11;4(2):223-31. doi: 10.1002/mgg3.198. eCollection 2016 Mar. PMID: 27064304. Free PMC article.
- Evidence that a threshold of serine/arginine-rich (SR) proteins recruits CFIm to promote Rous sarcoma virus mRNA 3' end formation. Virology. 2016 Nov;498:181-191. doi: 10.1016/j.virol.2016.08.021. Epub 2016 Sep 4. PMID: 27596537. Free PMC article.
- A biochemical analysis demonstrates that the BRCA1 intronic variant IVS10-2A→C is a mutation. J Hum Genet. 2003;48(8):399-403. doi: 10.1007/s10038-003-0044-0. PMID: 14513821.
- Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Res. 2003 Dec;13(12):2637-50. doi: 10.1101/gr.1679003. PMID: 14656968. Free PMC article.
- Human GC-AG alternative intron isoforms with weak donor sites show enhanced consensus at acceptor exon positions. Nucleic Acids Res. 2001 Jun 15;29(12):2581-93. doi: 10.1093/nar/29.12.2581. PMID: 11410667. Free PMC article.