Genome Biol. 2018 Nov 28;19(1):208.
doi: 10.1186/s13059-018-1590-2.

CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise

Mihaela Pertea et al. Genome Biol.

Abstract

We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.
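The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `summarize_chess_counts` and the derived ratios are assumptions introduced here, not quantities reported by the authors. It confirms that the protein-coding and noncoding counts sum to the stated gene total, then derives the share of novel transcripts and the mean number of transcripts per gene.

```python
# Counts copied from the abstract; everything derived from them is illustrative.
TOTAL_GENES = 42_611
PROTEIN_CODING_GENES = 20_352
NONCODING_GENES = 22_259
TOTAL_TRANSCRIPTS = 323_258
NOVEL_TRANSCRIPTS = 116_156


def summarize_chess_counts() -> dict:
    """Hypothetical helper: check the gene totals and derive two simple ratios."""
    return {
        # True when the two gene classes account for every gene in the catalog.
        "gene_counts_consistent": PROTEIN_CODING_GENES + NONCODING_GENES == TOTAL_GENES,
        # Fraction of all catalog transcripts that the abstract labels novel.
        "novel_transcript_share": NOVEL_TRANSCRIPTS / TOTAL_TRANSCRIPTS,
        # Mean number of transcripts per gene across the whole catalog.
        "transcripts_per_gene": TOTAL_TRANSCRIPTS / TOTAL_GENES,
    }


if __name__ == "__main__":
    for name, value in summarize_chess_counts().items():
        print(f"{name}: {value}")
```

Under these figures, roughly a third of the catalog's transcripts are novel, and the catalog averages between seven and eight transcripts per gene.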

Keywords: GTEx; Human gene count; RNA sequencing; Transcriptome; Transcriptome assembly.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
One of 224 new protein-coding genes (CHS.7402) discovered in this study. This 4-exon gene occurs on the forward strand of chromosome 10 at the coordinates shown. The exon lengths are 134, 30, 136, and 663 bp (left to right), with the narrower rectangles indicating the 5′ and 3′ UTR regions. The intron lengths (not shown to scale) are 18,098, 1086, and 1956 bp. The sequence alignment at the bottom shows, top to bottom, the protein sequences from CHS.7402, long-tailed macaque, rhesus macaque, marmoset, white-faced capuchin, ass, Przewalski’s horse, white rhinoceros, and wild boar. The full-length human protein sequence is shown
Fig. 2
The number of (a) introns and (b) transcripts shared by and unique to all combinations of the CHESS (v2.1), RefSeq (rel 108), and GENCODE (v28) databases. For this comparison, only transcripts and introns assembled directly by the CHESS pipeline were included. The CHESS database also includes additional transcripts that were added directly from RefSeq and GENCODE (see main text)
Fig. 3
(a) The number of novel protein-coding and lncRNA genes that were differentially expressed between males and females, for each of the GTEx tissues that had both male and female samples. All tissues except kidney had at least 10 samples for each sex; kidney had 9 female and 29 male samples. (b) The number of novel protein-coding and lncRNA genes in CHESS that were upregulated in each of the 31 GTEx tissues as compared to the remaining tissues
Fig. 4
Multiple sequence alignments of novel CHESS protein-coding genes CHS.57705 (a) and CHS.24083 (b), each compared to five other primates, with annotated MS/MS spectra validating the identified peptides IDISFHR (a) and QLLTGAR (b) as shown on the right
Fig. 5
Summary of the computational pipeline used to align and assemble all 9795 RNA-seq samples
