Comparative Study
Genome Res. 2003 Jan;13(1):27-36.
doi: 10.1101/gr.695703.

Reevaluating human gene annotation: a second-generation analysis of chromosome 22

John E Collins et al. Genome Res. 2003 Jan.

Abstract

We report a second-generation gene annotation of human chromosome 22. Using expressed sequence databases, comparative sequence analysis, and experimental verification, we have extended genes, fused previously fragmented structures, and identified new genes. The total length in exons of annotation was increased by 74% over our previously published annotation and includes 546 protein-coding genes and 234 pseudogenes. Thirty-two potential protein-coding annotations are partial copies of other genes, and may represent duplications on an evolutionary path to change or loss of function. We also identified 31 non-protein-coding transcripts, including 16 possible antisense RNAs. By extrapolation, we estimate the human genome contains 29,000-36,000 protein-coding genes, 21,300 pseudogenes, and 1500 antisense RNAs. We suggest that our revised annotation criteria provide a paradigm for future annotation of the human genome.
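The abstract gives both the chromosome 22 counts and the genome-wide estimates, but not the extrapolation method itself. As a rough check only, the sketch below computes the multiplier each estimate implies relative to its chromosome 22 count. The function and variable names are hypothetical, and the ratios are plain arithmetic on the figures quoted above, not a reconstruction of the authors' procedure.

```python
# Chromosome 22 counts and genome-wide estimates, as quoted in the abstract.
chr22_counts = {
    "protein_coding_genes": 546,
    "pseudogenes": 234,
    "antisense_rnas": 16,
}
# (low, high) genome-wide estimates; a single value is stored as low == high.
genome_estimates = {
    "protein_coding_genes": (29000, 36000),
    "pseudogenes": (21300, 21300),
    "antisense_rnas": (1500, 1500),
}


def implied_factors(counts, estimates):
    """Return the (low, high) ratio of genome-wide estimate to chr22 count."""
    return {
        name: (estimates[name][0] / counts[name], estimates[name][1] / counts[name])
        for name in counts
    }


factors = implied_factors(chr22_counts, genome_estimates)
for name, (low, high) in factors.items():
    print(f"{name}: x{low:.1f} to x{high:.1f}")
```

The pseudogene and antisense estimates imply a multiplier a little above 90, while the protein-coding range implies a noticeably smaller one. This suggests the gene estimate was not obtained by the same simple scaling, although the abstract does not say how it was derived.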

Figures

Figure 1.
Demonstration of a partial gene duplication using DOTTER analysis. Part of the adaptin gene (AP1B1) has been duplicated and rearranged 2.8 Mb telomeric on chromosome 22, to leave a structure in which the duplicated exon 6 (ADTB1L1) is followed 3′ by the duplicated exon 1 (ADTB1L2) and there is conservation of the surrounding intron sequences. Genomic sequence accession number and portion of the sequence used is indicated (note has been reversed). Gene structures are shown by a black line (introns) and hashed boxes (exons), and an arrow pointing toward the 3′ end of the gene. Within the dot matrix, diagonal lines indicate regions of sequence identity and dashed lines show where exon sequences align.
Figure 2.
Plot of sensitivity versus specificity for each data source. Sensitivity and specificity were calculated over the full annotation set as described in Methods; see Table 2. Each data source is indicated by a different color and symbol.
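
The exact calculation is given in the paper's Methods, which are not reproduced here. As a minimal sketch, assuming the base-level convention common in gene-structure evaluation (sensitivity is the fraction of annotated exon bases that a data source covers, and specificity is the fraction of the data source's bases that fall in annotated exons), the two values could be computed as follows. The interval coordinates and function names are hypothetical.

```python
def bases(intervals):
    """Expand inclusive (start, end) intervals into a set of base positions."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def sensitivity_specificity(annotated, predicted):
    """Base-level sensitivity and specificity of `predicted` against `annotated`.

    Assumed definitions (not taken from the paper's Methods):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    tp = len(annotated & predicted)   # bases in both sets
    fn = len(annotated - predicted)   # annotated bases the source missed
    fp = len(predicted - annotated)   # source bases outside the annotation
    sn = tp / (tp + fn) if annotated else 0.0
    sp = tp / (tp + fp) if predicted else 0.0
    return sn, sp


# Hypothetical example: two annotated exons and one data source's matches.
annotated = bases([(100, 199), (300, 399)])   # 200 annotated bases
predicted = bases([(150, 199), (300, 499)])   # 250 bases from the data source
sn, sp = sensitivity_specificity(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Running this once per data source would give one (sensitivity, specificity) point each, which is the kind of pair plotted in Figure 2.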

