BMC Biol. 2010 May 11;8:58.
doi: 10.1186/1741-7007-8-58.

Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls


Daniela Witten et al. BMC Biol. 2010.

Abstract

Background: Ultra-high throughput sequencing technologies provide opportunities both for discovery of novel molecular species and for detailed comparisons of gene expression patterns. Small RNA populations are particularly well suited to this analysis, as many different small RNAs can be completely sequenced in a single instrument run.

Results: We prepared small RNA libraries from 29 tumour/normal pairs of human cervical tissue samples. Analysis of the resulting sequences (42 million in total) defined 64 new human microRNA (miRNA) genes. Both arms of the hairpin precursor were observed in twenty-three of the newly identified miRNA candidates. We tested several computational approaches for analysing class differences between high-throughput sequencing datasets and describe a novel application of a log-linear model that provided the most effective analysis of these data. This method identified 67 miRNAs that were differentially expressed between the tumour and normal samples at a false discovery rate below 0.001.
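The abstract does not spell out the log-linear model. One standard way to compare read counts between two classes is a Poisson log-linear model with a per-sample library-size offset, tested by a likelihood ratio against a model with a single shared rate. The Python sketch below assumes that formulation; the function name `poisson_lr_statistic` and its arguments are hypothetical, and the authors' model (which Figure 3 indicates was applied to cube-rooted data) may differ in its details.

```python
import math
from typing import Sequence


def poisson_lr_statistic(counts: Sequence[int],
                         library_sizes: Sequence[float],
                         classes: Sequence[int]) -> float:
    """Likelihood ratio statistic for one miRNA under an assumed model.

    Assumed (hypothetical) model, not necessarily the paper's:
        counts[j] ~ Poisson(library_sizes[j] * rate[classes[j]])
    tested against the null model in which all samples share one rate.
    """
    if not (len(counts) == len(library_sizes) == len(classes)):
        raise ValueError("inputs must have equal length")
    total_count = sum(counts)
    if total_count == 0:
        return 0.0  # no reads at all: no evidence of a class difference
    # Null MLE: a single rate shared by every sample.
    shared_rate = total_count / sum(library_sizes)
    # Alternative MLE: one rate per class (e.g. tumour vs normal).
    class_rate = {}
    for k in set(classes):
        k_count = sum(c for c, y in zip(counts, classes) if y == k)
        k_size = sum(s for s, y in zip(library_sizes, classes) if y == k)
        class_rate[k] = k_count / k_size
    # Both fitted models predict a total of total_count reads, so the
    # deviance difference reduces to the sum of count * log(rate ratio).
    statistic = 0.0
    for c, y in zip(counts, classes):
        if c > 0:  # zero counts contribute nothing to the log-likelihood ratio
            statistic += c * math.log(class_rate[y] / shared_rate)
    return 2.0 * statistic


# Usage: two tumour (1) and two normal (0) libraries of different depths.
score = poisson_lr_statistic([120, 95, 30, 25],
                             [1.0e6, 0.8e6, 1.1e6, 0.9e6],
                             [1, 1, 0, 0])
```

Scoring every miRNA this way and ranking by the statistic yields a candidate list; because the library-size offset enters the model directly, samples sequenced to different depths can be compared without first rescaling the counts.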

Conclusions: This approach can potentially be applied to any kind of RNA sequencing data for analysing differential sequence representation between biological sample sets.


Figures

Figure 1
Distinctive patterns of miRNA expression between cervical cancer and normal samples revealed by principal component analysis. MicroRNA incidence values from each sample were projected onto the first two principal components, using cube-rooted data. This two-dimensional representation of the ~714-dimensional primary data showed clear separation between normal and tumour samples, but not between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) samples. The first principal component explains 21.2% of the variation in the data and the second explains 11.6%. ASC, adenosquamous cell carcinoma; T, tumour; N, normal.
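The projection described in the Figure 1 caption can be sketched as ordinary principal component analysis computed from a singular value decomposition of the cube-rooted, column-centred sample-by-miRNA matrix. This sketch assumes NumPy and assumes the input is already a matrix of incidence values with one row per sample; the function name `pca_cube_root` is hypothetical, and any normalisation the authors applied before this step is not reproduced here.

```python
import numpy as np


def pca_cube_root(incidence: np.ndarray, n_components: int = 2):
    """Project samples onto the leading principal components of cube-rooted data.

    incidence: array of shape (n_samples, n_mirnas).
    Returns (scores, explained_ratio): scores has shape
    (n_samples, n_components); explained_ratio is the fraction of total
    variance captured by each returned component.
    """
    transformed = np.cbrt(np.asarray(incidence, dtype=float))
    centred = transformed - transformed.mean(axis=0)  # centre each miRNA column
    # Thin SVD: columns of u scaled by s are the sample coordinates (scores);
    # squared singular values are proportional to the variance per component.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variance = s ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return scores, explained_ratio
```

Plotting the two columns of `scores`, coloured by tumour/normal status, gives a figure of the kind described above; `explained_ratio` corresponds to the percentages quoted for the first two components.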
Figure 2
Clustering analyses of normal and tumour samples based on microRNA expression. (a) Samples were clustered using cube-rooted data and a correlation-based distance (as described in Additional File 9). One small outgroup and two large subgroups resulted, with compositions 1N:1T, 29N:1T and 0N:28T, respectively. The small outgroup consisted of the tumour and normal samples from patient G428. The remaining samples were partitioned between the two larger subgroups: one contained the other 29 normal samples and one tumour sample, and the other contained the remaining 28 tumour samples. (b) Samples were clustered using the distance metric defined in Section 4.1 of Berninger et al. [37]. Again, an outgroup and two major subgroups resulted, with compositions 0N:2T, 25N:2T and 5N:26T, respectively. In both panels, 'N' indicates a normal sample and 'T' a tumour sample. Note: the duplicates of G699N and G761T cluster near each other under both methods.
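The caption for panel (a) names cube-rooted data and a correlation-based distance but leaves the linkage rule to Additional File 9. The sketch below assumes the distance is 1 minus the Pearson correlation between sample profiles and uses average linkage, a common choice that may not match the authors' procedure; the function names `correlation_distance` and `average_linkage` are hypothetical.

```python
import numpy as np


def correlation_distance(profiles: np.ndarray) -> np.ndarray:
    """Pairwise 1 - Pearson correlation between rows (samples) of cube-rooted data."""
    transformed = np.cbrt(np.asarray(profiles, dtype=float))
    return 1.0 - np.corrcoef(transformed)  # rows are treated as variables


def average_linkage(dist: np.ndarray, n_clusters: int) -> list[list[int]]:
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two clusters whose mean pairwise distance is
    smallest until n_clusters remain. Cubic-time, adequate for ~60 samples.
    """
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all cross-cluster sample distances.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

Because the distance depends only on the shape of each expression profile, not its overall scale, samples with different sequencing depths but similar relative miRNA abundances fall close together.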
Figure 3
Comparison of false discovery rate (FDR) estimates based on different statistical methods. FDRs are shown for our proposed method for identifying significant microRNAs (miRNAs), a log-linear model on cube-rooted data, as well as for three competing methods: a log-linear model on raw data, t-statistics on raw data, and t-statistics on cube-rooted data. The log-linear model on cube-rooted data yields extremely low FDRs. The FDR for a given miRNA score cutoff is the average proportion of miRNAs with scores above that cutoff that are 'false positives'; see Additional File 9 for details of the FDR calculation.
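The FDR definition in the Figure 3 caption can be estimated by permutation: rescore every miRNA after shuffling the tumour/normal labels, count how many permuted scores reach a cutoff, and divide by the number of observed scores that reach it. The sketch below assumes that permutation scheme; the exact procedure is given in Additional File 9 and is not reproduced here, and the function and argument names are hypothetical.

```python
import numpy as np


def permutation_fdr(observed: np.ndarray, null_scores: np.ndarray,
                    cutoffs: np.ndarray) -> np.ndarray:
    """Estimate the FDR at each score cutoff from permutation null scores.

    observed: shape (n_mirnas,), scores computed with the true class labels.
    null_scores: shape (n_perms, n_mirnas), scores recomputed after randomly
        permuting the tumour/normal labels.
    For cutoff c: FDR(c) = mean over permutations of #{null >= c},
    divided by #{observed >= c}, capped at 1.
    """
    fdr = np.empty(len(cutoffs))
    for i, c in enumerate(cutoffs):
        n_called = np.sum(observed >= c)
        if n_called == 0:
            fdr[i] = 0.0  # nothing called at this cutoff, so nothing is false
            continue
        # Expected number of false positives: average null count per permutation.
        expected_false = np.mean(np.sum(null_scores >= c, axis=1))
        fdr[i] = min(1.0, expected_false / n_called)
    return fdr
```

Since the denominator is fixed for a given cutoff, averaging the per-permutation proportions and dividing the averaged null count by the observed count give the same value, which matches the caption's description of an "average proportion".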


References

1. Hamilton AJ, Baulcombe DC. A species of small antisense RNA in posttranscriptional gene silencing in plants. Science. 1999;286:950–952. doi: 10.1126/science.286.5441.950.
2. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854. doi: 10.1016/0092-8674(93)90529-Y.
3. Birchler JA, Kavi HH. Molecular biology. Slicing and dicing for small RNAs. Science. 2008;320:1023–1024. doi: 10.1126/science.1159018.
4. Chapman EJ, Carrington JC. Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet. 2007;8:884–896. doi: 10.1038/nrg2179.
5. Stefani G, Slack FJ. Small non-coding RNAs in animal development. Nat Rev Mol Cell Biol. 2008;9:219–230. doi: 10.1038/nrm2347.
