Positional correlation analysis improves reconstruction of full-length transcripts and alternative isoforms from noisy array signals or short reads
- PMID: 22332235
- PMCID: PMC3315713
- DOI: 10.1093/bioinformatics/bts065
Abstract
Motivation: Reconstructing full-length transcripts from next-generation sequencing or tiling-array data is essential for a comprehensive understanding of the transcriptome. Several reconstruction techniques have been developed, but high levels of noise and bias still hamper reconstruction. A method is needed that is robust to noise and bias and reconstructs transcripts correctly regardless of the platform used.
Results: We propose a new statistical method that reconstructs full-length transcripts and can be applied to data from both next-generation sequencers and tiling arrays. The method, called ARTADE2, analyzes 'positional correlation', that is, the correlations of expression values for every combination of genomic positions across multiple transcriptome datasets. ARTADE2 then reconstructs full-length transcripts using a logistic model based on the positional correlation and a Markov model. ARTADE2 elucidated 17 591 full-length transcripts from 55 transcriptome datasets and performed well compared with other recent prediction methods. Moreover, 1489 novel transcripts were discovered. We experimentally tested 16 novel transcripts, of which 14 were confirmed by reverse transcription-polymerase chain reaction and sequence mapping. The method also performed well in reconstructing mRNA observed by a next-generation sequencer. In addition, the positional correlation and factor analysis embedded in ARTADE2 successfully detected regions where alternative isoforms may exist, and are therefore expected to be applicable to discovering transcript biomarkers in a wide range of disciplines, including preemptive medicine.
Availability: http://matome.base.riken.jp
Contact: toyoda@base.riken.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
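To make the idea of 'positional correlation' concrete, the following is a minimal sketch, not the ARTADE2 implementation. It assumes the correlation is a plain Pearson coefficient computed between every pair of genomic positions across datasets; the abstract does not specify the statistic. The function names `pearson` and `positional_correlation` and the toy expression values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def positional_correlation(expr):
    """Correlate every pair of genomic positions across datasets.

    expr[d][p] is the expression value of dataset d at genomic position p
    (assumed layout). Returns an n_positions x n_positions matrix.
    """
    n_pos = len(expr[0])
    # One expression profile per position, taken across all datasets.
    columns = [[row[p] for row in expr] for p in range(n_pos)]
    return [[pearson(columns[i], columns[j]) for j in range(n_pos)]
            for i in range(n_pos)]


# Toy data (made up): 5 datasets x 5 positions. Positions 0-2 vary together
# across datasets, as do positions 3-4, mimicking two separate transcripts.
expr = [
    [1.0, 1.1, 0.9, 5.0, 5.2],
    [4.0, 4.2, 3.9, 1.0, 1.1],
    [2.0, 1.9, 2.1, 6.0, 5.9],
    [8.0, 7.8, 8.1, 2.0, 2.2],
    [3.0, 3.1, 2.9, 7.0, 6.8],
]
corr = positional_correlation(expr)
```

In this toy matrix, positions belonging to the same block correlate strongly with one another and weakly or negatively with the other block. That kind of block structure is what a downstream model could use to decide which positions belong to one transcript; how ARTADE2 feeds these values into its logistic and Markov models is not described in the abstract.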
Similar articles
- Knowledge-based reconstruction of mRNA transcripts with short sequencing reads for transcriptome research. PLoS One. 2012;7(2):e31440. doi: 10.1371/journal.pone.0031440. Epub 2012 Feb 1. PMID: 22312447. Free PMC article.
- ARTADE2DB: improved statistical inferences for Arabidopsis gene functions and structure predictions by dynamic structure-based dynamic expression (DSDE) analyses. Plant Cell Physiol. 2011 Feb;52(2):254-64. doi: 10.1093/pcp/pcq202. Epub 2011 Jan 12. PMID: 21227933. Free PMC article.
- Next-generation tag sequencing for cancer gene expression profiling. Genome Res. 2009 Oct;19(10):1825-35. doi: 10.1101/gr.094482.109. Epub 2009 Jun 18. PMID: 19541910. Free PMC article.
- Next-generation transcriptome assembly. Nat Rev Genet. 2011 Sep 7;12(10):671-82. doi: 10.1038/nrg3068. PMID: 21897427. Review.
- Sequence assembly using next generation sequencing data--challenges and solutions. Sci China Life Sci. 2014 Nov;57(11):1140-8. doi: 10.1007/s11427-014-4752-9. Epub 2014 Oct 17. PMID: 25326069. Review.
Cited by
- Small open reading frames associated with morphogenesis are hidden in plant genomes. Proc Natl Acad Sci U S A. 2013 Feb 5;110(6):2395-400. doi: 10.1073/pnas.1213958110. Epub 2013 Jan 22. PMID: 23341627. Free PMC article.
- Novel Stress-Inducible Antisense RNAs of Protein-Coding Loci Are Synthesized by RNA-Dependent RNA Polymerase. Plant Physiol. 2017 Sep;175(1):457-472. doi: 10.1104/pp.17.00787. Epub 2017 Jul 14. PMID: 28710133. Free PMC article.
- A Stress-Activated Transposon in Arabidopsis Induces Transgenerational Abscisic Acid Insensitivity. Sci Rep. 2016 Mar 15;6:23181. doi: 10.1038/srep23181. PMID: 26976262. Free PMC article.