Nucleic Acids Res. 2019 Jul 2;47(W1):W516-W522.
doi: 10.1093/nar/gkz400.

CNIT: a fast and accurate web tool for identifying protein-coding and long non-coding transcripts based on intrinsic sequence composition

Jin-Cheng Guo et al. Nucleic Acids Res. 2019.

Abstract

Next-generation sequencing produces ever more high-throughput data, yet classifying RNA transcripts as protein-coding or non-coding remains a challenge, especially for poorly annotated species. We upgraded our original coding potential calculator, CNCI (Coding-Non-Coding Index), to CNIT (Coding-Non-Coding Identifying Tool), which evaluates the coding ability of RNA transcripts faster and more accurately. CNIT runs ∼200 times faster than CNCI and is more accurate (0.98 versus 0.94 for human, 0.95 versus 0.93 for mouse, 0.93 versus 0.92 for zebrafish, 0.93 versus 0.92 for fruit fly, 0.92 versus 0.88 for worm, and 0.98 versus 0.85 for Arabidopsis transcripts). Moreover, the AUC values for 11 animal species and 27 plant species show that CNIT gives relatively accurate identification results for almost all eukaryotic transcripts. In addition, a mobile-friendly web server is freely available at http://cnit.noncode.org/CNIT.
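
The abstract reports two evaluation metrics, accuracy and AUC. As a minimal sketch of how such figures are generally computed for a binary coding/non-coding classifier, the Python below derives both from a list of true labels and predicted coding probabilities. The function names, the 0.5 threshold, and the toy data are illustrative assumptions; this is not the CNIT evaluation pipeline, and the values are not taken from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of transcripts whose thresholded score matches the label.

    labels: 1 for protein-coding, 0 for non-coding.
    scores: predicted coding probability per transcript.
    threshold: assumed cut-off (hypothetical, not CNIT's).
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen coding transcript
    scores higher than a randomly chosen non-coding one (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

The pairwise AUC is quadratic in the number of transcripts, which is fine for a sketch; a rank-based computation would be preferable on full transcriptomes.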

Figures

Figure 1. Evaluation of the accuracy of CNIT, CNCI, CPC2, CPAT and PLEK software. Overall comparison data (A) and detailed accuracy (B) in the six organisms from the CPC2 website.

Figure 2. Global prediction by ROC analysis for CNIT across 37 species.

Figure 3. Screenshot of the CNIT web server. (A) Summary html view output with coding probability. (B, C) Graphical view of the ‘Details’ page.

Figure 4. Examples of CNIT analysis of transcripts for coding RNA L1CAM (A) and non-coding RNA HOTAIR (B). The left y-axis shows the CNIT score distribution of the six reading frames for each transcript; the x-axis shows sequence length normalized to nucleotide triplets. The red line represents the correct transcription reading frame and the five blue lines represent the other reading frames.
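
The Figure 4 caption refers to six reading frames and to positions counted in nucleotide triplets. A minimal Python sketch of that decomposition follows, assuming the conventional definition of six frames (three offsets on the given strand and three on its reverse complement). The function names and frame labels are hypothetical, and the sketch only splits a sequence into triplets; it does not reproduce the CNIT scoring.

```python
# Complement table for DNA bases; RNA input is converted to DNA first.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def six_frame_triplets(seq):
    """Split a transcript into triplets for each of six reading frames.

    Frames are labeled "+1".."+3" for the given strand and "-1".."-3"
    for the reverse complement (labels are an assumption of this sketch).
    Trailing bases that do not fill a whole triplet are dropped.
    """
    forward = seq.upper().replace("U", "T")
    reverse = reverse_complement(forward)
    frames = {}
    for strand_name, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            triplets = [
                strand[i:i + 3]
                for i in range(offset, len(strand) - 2, 3)
            ]
            frames[f"{strand_name}{offset + 1}"] = triplets
    return frames


# Toy 9-nt sequence for illustration only.
toy_frames = six_frame_triplets("AUGGCCUAA")
```

Each frame yields a list of triplets, so a per-triplet score could be plotted against triplet index in the manner the caption describes.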
