RNA. 2013 Apr;19(4):479-89.
doi: 10.1261/rna.037473.112. Epub 2013 Feb 19.

Incorporating the human gene annotations in different databases significantly improved transcriptomic and genetic analyses

Geng Chen et al. RNA. 2013 Apr.

Abstract

Human gene annotation is crucial for conducting transcriptomic and genetic studies; however, the impact of the human gene annotations in different databases on such studies has rarely been evaluated. To enable full use of the various human annotation resources and to better understand the human transcriptome, here we systematically compare the effects of the human annotations in RefSeq, Ensembl (GENCODE), and AceView on diverse transcriptomic and genetic analyses. We found that the human gene annotations in the three databases are far from complete. Although Ensembl and AceView annotated more genes than RefSeq, more than 15,800 genes from Ensembl (or AceView) are within the intergenic and intronic regions of AceView (or Ensembl) annotation. The human transcriptome annotations in RefSeq, Ensembl, and AceView had distinct effects on short-read mapping, gene and isoform expression profiling, and differential expression calling. Furthermore, our findings indicate that integrating the annotations of these databases yields a more complete gene set and significantly enhances those transcriptomic analyses. We also observed that many more known SNPs were located within genes annotated in Ensembl and AceView than in RefSeq. In particular, 1033 of the 3041 trait/disease-associated SNPs, involved in about 200 human traits/diseases, that were previously reported to be in RefSeq intergenic regions could be relocated within Ensembl and AceView genes. Our findings illustrate that a more complete transcriptome generated by incorporating human gene annotations from diverse databases can strikingly improve the overall results of transcriptomic and genetic studies.
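The SNP relocation described in the abstract amounts to an interval lookup: a SNP counts as relocated when its position falls outside every gene of one annotation but inside a gene of another. The following is a minimal sketch of that idea, not the authors' pipeline. The function names, the 1-based inclusive coordinate convention, and the toy coordinates are all assumptions made for illustration; no real RefSeq, Ensembl, or AceView data are used.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]            # assumed 1-based, inclusive (start, end)
Annotation = Dict[str, List[Interval]]  # chromosome -> gene intervals


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort gene intervals and merge any that overlap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_intragenic(chrom: str, pos: int, annotation: Annotation) -> bool:
    """Return True if pos lies inside any gene interval on chrom."""
    merged = merge_intervals(annotation.get(chrom, []))
    starts = [s for s, _ in merged]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and merged[i][0] <= pos <= merged[i][1]


def relocated_snps(snps: Dict[str, Tuple[str, int]],
                   narrow: Annotation,
                   broad: Annotation) -> List[str]:
    """SNPs that are intergenic in `narrow` but intragenic in `broad`."""
    return sorted(
        name for name, (chrom, pos) in snps.items()
        if not is_intragenic(chrom, pos, narrow)
        and is_intragenic(chrom, pos, broad)
    )


# Toy data (hypothetical coordinates, for illustration only).
narrow = {"chr1": [(100, 200)]}
broad = {"chr1": [(100, 200), (300, 400), (350, 500)]}
snps = {"snpA": ("chr1", 150), "snpB": ("chr1", 450), "snpC": ("chr1", 600)}

print(relocated_snps(snps, narrow, broad))
```

In this toy case only `snpB` is relocated: `snpA` is already intragenic in the narrower annotation, and `snpC` stays intergenic in both. For genome-scale inputs the merged intervals would be built once per chromosome rather than on every lookup.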

Figures

FIGURE 1.
Human transcriptomes in RefSeq, Ensembl, AceView, EnsAce, and AceEns. (A) The number of human genes and transcripts in each transcriptome. The counts for RefSeq are its unique genes and transcripts in UCSC for which the duplicated genes were removed. The Ensembl human transcriptome is the sum of its protein-coding and non-protein-coding transcripts (release 67 of GRCh37, corresponding to GENCODE 12). AceView transcripts that contained unknown bases of “N” were not taken into account. (B) Chromosome distribution of AceView human genes located in the intergenic and intronic regions of the Ensembl annotation. (C) Chromosome distribution of Ensembl human genes within the intergenic and intronic regions of AceView annotation. (D) Categories of the 18,083 Ensembl transcripts located in the AceView intergenic and intronic regions.
FIGURE 2.
RNA-seq read mapping and expression profile comparison among the five transcriptomes. (A) Short-read mapping rate for RefSeq, Ensembl, AceView, EnsAce, and AceEns using 58 RNA-seq data sets from more than 20 human tissues and cell lines (Supplemental Table 1). (B) Number of expressed genes detected in the five transcriptomes across all 58 data sets. (C) Number of expressed transcripts examined in each data set among the five transcriptomes.
FIGURE 3.
Expression distribution of genes and transcripts in the five transcriptomes. (A) Boxplot comparison of gene expression in our sequenced testis tissue for each transcriptome. (B) Boxplot comparison of transcript expression in testis for each transcriptome. (C) Density curves of gene expression level in testis among five transcriptomes. (D) Density curves of transcript expression level in testis. (E) Expression-level histogram of the AceView genes located in the intergenic and intronic regions of Ensembl annotation in testis. (F) Expression-level histogram of Ensembl genes located in the intergenic and intronic regions of AceView human annotation in testis.
FIGURE 4.
Comparison of differential expression calling among different transcriptomes using brain versus UHR samples. (A) Comparison of detected differentially expressed genes between Ensembl and EnsAce. (B) Comparison of detected differentially expressed genes between AceView and AceEns. (C) Comparison of detected differentially expressed transcripts between Ensembl and EnsAce. (D) Comparison of detected differentially expressed transcripts between AceView and AceEns.
FIGURE 5.
Distribution of reported trait/disease-associated SNPs on human chromosomes. From outside to inside, the first circle represents the human chromosomes. The 6522 previously reported trait/disease-associated SNPs are distributed on the second circle (one line for one SNP); 3481 of these SNPs were previously reported within RefSeq intragenic regions (red lines), while the other 3041 were reported in intergenic regions of RefSeq (blue lines). Among these 3041 trait/disease-associated SNPs, 879 could be remapped within 676 AceView genes (green lines in the third circle), and 724 could be relocated in 532 Ensembl genes (purple lines in the fourth circle).
