BMC Biol. 2010 May 11;8:58.

doi: 10.1186/1741-7007-8-58.

Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls

Daniela Witten¹, Robert Tibshirani, Sam Guoping Gu, Andrew Fire, Weng-Onn Lui

PMID: 20459774
PMCID: PMC2880020
DOI: 10.1186/1741-7007-8-58

Daniela Witten et al. BMC Biol. 2010.

Affiliation

¹ Department of Statistics, Stanford University, Stanford, California 94305-4065, USA.

Abstract

Background: Ultra-high throughput sequencing technologies provide opportunities both for discovery of novel molecular species and for detailed comparisons of gene expression patterns. Small RNA populations are particularly well suited to this analysis, as many different small RNAs can be completely sequenced in a single instrument run.

Results: We prepared small RNA libraries from 29 tumour/normal pairs of human cervical tissue samples. Analysis of the resulting sequences (42 million in total) defined 64 new human microRNA (miRNA) genes. Both arms of the hairpin precursor were observed in 23 of the newly identified miRNA candidates. We tested several computational approaches for the analysis of class differences between high-throughput sequencing datasets and describe a novel application of a log-linear model that provided the most effective analysis for these data. This method identified 67 miRNAs that were differentially expressed between the tumour and normal samples at a false discovery rate of less than 0.001.

Conclusions: This approach can potentially be applied to any kind of RNA sequencing data for analysing differential sequence representation between biological sample sets.

Figures

**Figure 1**
**Distinctive patterns of miRNA expression between cervical cancer and normal samples revealed by principal component analysis**. MicroRNA incidence values from each sample were projected onto the first two principal components, using cube-rooted data. This two-dimensional representation of the ~714-dimensional primary data resulted in evident separation between normal and tumour samples but not between adenocarcinoma (ADC) and squamous cell carcinoma (SCC) samples. The first principal component explains 21.2% of the variation present in the data and the second explains 11.6%. ASC, adenosquamous cell carcinoma; T, tumour; N, normal.
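
The projection described in this caption can be illustrated with a minimal sketch: cube-root the counts, centre each miRNA, and project the samples onto the leading principal components. The function name `pca_cube_root` and the synthetic Poisson counts are assumptions made for illustration. The paper's own data and any normalisation of the incidence values are not reproduced here.

```python
import numpy as np

def pca_cube_root(counts, n_components=2):
    """Project samples (rows) onto the leading principal components
    of cube-rooted, column-centred counts (hypothetical helper)."""
    x = np.cbrt(np.asarray(counts, dtype=float))
    x = x - x.mean(axis=0)  # centre each miRNA across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic example (not the study data): 5 "normal" and 5 "tumour"
# samples over 20 miRNAs, with the first 5 miRNAs elevated in tumours.
rng = np.random.default_rng(0)
normal = rng.poisson(lam=50, size=(5, 20))
tumour = rng.poisson(lam=50, size=(5, 20))
tumour[:, :5] = rng.poisson(lam=400, size=(5, 5))
counts = np.vstack([normal, tumour])

scores, explained = pca_cube_root(counts)
print(scores[:, 0])  # first principal component score per sample
print(explained)     # variance fractions for PC1 and PC2
```

In this toy setting the two groups fall on opposite sides of the first component, which is the kind of separation the figure reports for normal versus tumour samples.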

**Figure 2**
**Clustering analyses of normal and tumour samples based on microRNA expression**. (a) Samples were clustered using cube-rooted data and correlation-based distance (as described in Additional File 9). Two large subgroups and one small outgroup resulted, with separations 1N:1T, 29N:1T and 0N:28T, respectively. The small outgroup consisted of tumour and normal samples from patient G428. The remaining samples were partitioned among the two larger subgroups, one of which consisted of the other 29 normal samples and one tumour sample, and the other consisted of the remaining 28 tumour samples. (b) Samples were clustered using the distance metric defined in Section 4.1 of Berninger *et al*. [37]. Again, an outgroup and two major subgroups resulted, with separations 0N:2T, 25N:2T and 5N:26T, respectively. For both panels, 'N' indicates a normal sample and 'T' indicates a tumour sample. Note: the duplicates of G699N and G761T are clustered near each other in both methods.
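
As a rough illustration of panel (a), the sketch below clusters samples using cube-rooted counts and a correlation-based distance, taken here to be one minus the Pearson correlation between sample profiles. The exact distance and linkage used in the paper are described in Additional File 9 and are not reproduced here, so the average-linkage rule, the helper names and the synthetic data are all assumptions.

```python
import numpy as np

def correlation_distance(x):
    """1 - Pearson correlation between rows (one row per sample)."""
    return 1.0 - np.corrcoef(x)

def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed
    linkage rule); returns one integer label per sample."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic example (not the study data): each miRNA has its own baseline
# abundance, and the first 5 miRNAs are 10-fold higher in "tumour" samples.
rng = np.random.default_rng(1)
baseline = np.linspace(20, 400, 20)
tumour_means = baseline.copy()
tumour_means[:5] *= 10
counts = np.vstack([
    rng.poisson(lam=baseline, size=(5, 20)),      # 5 normal samples
    rng.poisson(lam=tumour_means, size=(5, 20)),  # 5 tumour samples
])

dist = correlation_distance(np.cbrt(counts))
labels = average_linkage(dist, n_clusters=2)
print(labels)
```

The quadratic search over cluster pairs is fine for a few dozen samples, as in this study, but a library implementation would be preferable for larger collections.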

**Figure 3**
**Comparison of false discovery rate estimates based on different statistical methods**. False discovery rates are shown for our proposed method for identification of significant microRNAs (miRNAs) - a log-linear model on cube-rooted data - as well as three competing methods: a log-linear model on raw data, t-statistics on raw data, and t-statistics on cube-rooted data. The log-linear model on cube-rooted data results in extremely low false discovery rates (FDRs). The FDR for a given miRNA score cutoff is the average proportion of miRNAs with scores above that cutoff that are 'false positives'; see Additional File 9 for details on FDR calculation.
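
The FDR definition in this caption can be made concrete with a small sketch. The paper's actual calculation is given in Additional File 9 and is not reproduced here. The version below assumes a permutation-style plug-in estimate, in which scores recomputed under shuffled class labels stand in for false positives. The function name `permutation_fdr` and the toy scores are hypothetical.

```python
import numpy as np

def permutation_fdr(observed, null_scores, cutoff):
    """Estimate the FDR when every miRNA with score > cutoff is called.

    observed:    shape (n_mirnas,), scores from the real class labels
    null_scores: shape (n_permutations, n_mirnas), scores recomputed
                 under permuted labels (assumed null reference)
    """
    observed = np.asarray(observed, dtype=float)
    null_scores = np.asarray(null_scores, dtype=float)
    n_called = np.sum(observed > cutoff)
    if n_called == 0:
        return 0.0  # nothing called, so no false discoveries
    # Average number of null scores exceeding the cutoff per permutation
    expected_false = np.mean(np.sum(null_scores > cutoff, axis=1))
    return float(min(1.0, expected_false / n_called))

# Toy scores (not from the study): 4 miRNAs, 2 permutations
observed = [5.0, 4.0, 0.5, 0.2]
null_scores = [[0.1, 0.6, 0.3, 4.5],
               [0.2, 0.1, 0.4, 0.3]]
print(permutation_fdr(observed, null_scores, cutoff=3.0))  # 0.25
```

At a cutoff of 3.0, two miRNAs are called and the permutations exceed the cutoff 0.5 times on average, giving an estimated FDR of 0.25.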

Cited by

NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.
Dong K, Zhao H, Tong T, Wan X. BMC Bioinformatics. 2016 Sep 13;17(1):369. doi: 10.1186/s12859-016-1208-1. PMID: 27623864. Free PMC article.
Fine-tuning of microRNA-mediated repression of mRNA by splicing-regulated and highly repressive microRNA recognition element.
Wu CT, Chiou CY, Chiu HC, Yang UC. BMC Genomics. 2013 Jul 3;14:438. doi: 10.1186/1471-2164-14-438. PMID: 23819653. Free PMC article.
The analytical landscape of static and temporal dynamics in transcriptome data.
Oh S, Song S, Dasgupta N, Grabowski G. Front Genet. 2014 Feb 20;5:35. doi: 10.3389/fgene.2014.00035. eCollection 2014. PMID: 24600473. Free PMC article. Review.
Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning-driven data analysis.
Akshay A, Katoch M, Shekarchizadeh N, Abedi M, Sharma A, Burkhard FC, Adam RM, Monastyrskaya K, Gheinani AH. Gigascience. 2024 Jan 2;13:giad111. doi: 10.1093/gigascience/giad111. PMID: 38206587. Free PMC article.
Novel tumor suppressor microRNA at frequently deleted chromosomal region 8p21 regulates epidermal growth factor receptor in prostate cancer.
Bucay N, Sekhon K, Majid S, Yamamura S, Shahryari V, Tabatabai ZL, Greene K, Tanaka Y, Dahiya R, Deng G, Saini S. Oncotarget. 2016 Oct 25;7(43):70388-70403. doi: 10.18632/oncotarget.11865. PMID: 27611943. Free PMC article.

References

1. Hamilton AJ, Baulcombe DC. A species of small antisense RNA in posttranscriptional gene silencing in plants. Science. 1999;286:950–952. doi: 10.1126/science.286.5441.950.
2. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854. doi: 10.1016/0092-8674(93)90529-Y.
3. Birchler JA, Kavi HH. Molecular biology. Slicing and dicing for small RNAs. Science. 2008;320:1023–1024. doi: 10.1126/science.1159018.
4. Chapman EJ, Carrington JC. Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet. 2007;8:884–896. doi: 10.1038/nrg2179.
5. Stefani G, Slack FJ. Small non-coding RNAs in animal development. Nat Rev Mol Cell Biol. 2008;9:219–230. doi: 10.1038/nrm2347.

Grants and funding

N01-HV-28183/HV/NHLBI NIH HHS/United States
