Genes (Basel). 2020 Dec 30;12(1):46.
doi: 10.3390/genes12010046.

DIANA-mAP: Analyzing miRNA from Raw NGS Data to Quantification

Athanasios Alexiou et al. Genes (Basel).

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs (~22 nt) that are considered central post-transcriptional regulators of gene expression and key components in many pathological conditions. Next-Generation Sequencing (NGS) technologies have made massive data production inexpensive, revolutionizing research across biology and medicine. In particular, small RNA-Seq (sRNA-Seq) enables high-throughput quantification of small non-coding RNAs, providing a closer look at the expression profiles of these crucial regulators within the cell. Here, we present DIANA-microRNA-Analysis-Pipeline (DIANA-mAP), a fully automated computational pipeline that lets the user analyze miRNA NGS data, from raw sRNA-Seq libraries to quantification and Differential Expression Analysis, in an easy, scalable, efficient, and intuitive way. Emphasis has been given to data pre-processing, an early step in the analysis that is critical for the robustness of the final results and conclusions. Through modularity, parallelizability, and customization, DIANA-mAP produces high-quality expression results, reports, and graphs for downstream data mining and statistical analysis. In an extended evaluation, the tool outperforms similar tools while performing pre-processing without any prior knowledge of the adapter sequences. DIANA-mAP is freely available, either dockerized with no dependency installation or standalone, and is accompanied by an installation manual on GitHub.

Keywords: NGS; analysis; bioinformatics; expression; microRNA; pipeline; quantification; small RNA-Seq.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The DIANA-microRNA-Analysis-Pipeline (DIANA-mAP) analysis workflow. Users can download datasets or provide their own. If the adapters are not known, DIANA-mAP uses DNApi to infer them and Cutadapt to remove them. The pre-processed (mappable) reads are aligned to the specified reference genome and then to the known miRNAs from miRBase to provide quantification results. If requested, a Differential Expression (DE) analysis is also performed between the analyzed datasets.
Figure 2
DIANA-mAP pre-processing workflow. It is composed of three individual steps. In the Data Acquisition step, the user can download publicly available datasets from online repositories by providing their accession numbers. The Adapter Detection step either uses a provided adapter sequence or scans the dataset to infer the adapter sequence. The Quality Trimming/Adapter Removal step removes low-quality sections and full or partial adapter sequences from the dataset, cleansing it for further analysis.
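
The adapter removal step can be illustrated with a minimal Python sketch. It is not the pipeline's implementation (the pipeline relies on Cutadapt, which also tolerates mismatches); it uses exact matching only, and the function names, the `min_overlap` and `min_length` defaults, and the example sequences are assumptions made for illustration. The category labels follow the ones used in Figure 3.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> tuple[str, bool]:
    """Return (trimmed_read, adapter_found) using exact matching only."""
    # Full adapter somewhere in the read: keep everything before it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx], True
    # Partial adapter: longest read suffix that equals an adapter prefix.
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k], True
    return read, False


def classify(read: str, adapter: str, min_length: int = 15) -> tuple[str, str]:
    """Label a read after trimming; min_length is a hypothetical threshold."""
    trimmed, found = trim_adapter(read, adapter)
    if not found:
        return "Without_Adapter", trimmed
    if len(trimmed) < min_length:
        return "Too_Short", trimmed
    return "Cleansed", trimmed


if __name__ == "__main__":
    adapter = "TGGAATTCTCGGGTGCCAAGG"   # example 3' adapter, chosen for illustration
    insert = "TGAGGTAGTAGGTTGTATAGTT"   # example 22-nt insert
    for read in (insert + adapter, insert + adapter[:5], "ACGT" + adapter):
        print(classify(read, adapter))
```

Requiring a minimum overlap for partial matches is a design choice that limits accidental trimming of reads whose last one or two bases happen to match the start of the adapter.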
Figure 3
DIANA-mAP visualization results. (A) Raw reads length distribution (SRR033716). (B) Pre-processed (mappable) reads length distribution (SRR033716). (C) Pie chart showing the fractions of filtered and mappable reads after the pre-processing step of the analysis (SRR033716). Mappable reads (“Cleansed”) are reads that were cleansed of partial adapter sequences through pre-processing loops (see Section 2.2.1). “Without_Adapter” are reads in which no adapter was found, while “Too_Short” are reads that had too few bases left after adapter trimming (based on the configuration) and were consequently excluded from further analysis. (D) Differential Expression Analysis PCA graph for a group of 6 analyzed samples from a miRNA expression study on breast tumors. The three orange-colored samples (SRR191585, SRR191608 and SRR191609) originate from a breast cell line, while the three teal-colored ones (SRR191402, SRR191410 and SRR191551) originate from invasive ductal carcinoma (IDC) tissues.
Figure 4
Scatter plot of the quantified miRNA raw counts (log2-transformed) produced by DIANA-mAP and miARma-Seq on two groups of datasets. Dataset_Group_1: 8 publicly available datasets analyzed in the miARma-Seq publication and also offered as example datasets alongside that tool. Dataset_Group_2: 24 publicly available datasets acquired from SRA and analyzed as examples for this study. Each marker represents the number of quantified miRNA raw counts produced by the two tools for one sample. Markers on the red line indicate equal numbers of quantified reads between the tools for that sample. Markers skewed toward one side indicate a higher number for the tool on that side.
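
The comparison behind such a scatter plot can be sketched as follows. This is an illustrative Python sketch, not code from either tool: the function names, the pseudocount of 1, and the sample counts are assumptions, and the counts are made-up numbers rather than results from the study.

```python
import math


def log2_count(raw_count: int, pseudocount: int = 1) -> float:
    """Log2-transform a raw count; the pseudocount (an assumption) avoids log2(0)."""
    return math.log2(raw_count + pseudocount)


def compare_tools(counts_a: dict, counts_b: dict) -> dict:
    """For samples present in both inputs, return (log2_a, log2_b, verdict)."""
    result = {}
    for sample in sorted(counts_a.keys() & counts_b.keys()):
        x = log2_count(counts_a[sample])
        y = log2_count(counts_b[sample])
        if x > y:
            verdict = "tool_a_higher"   # marker lies on tool A's side of the identity line
        elif y > x:
            verdict = "tool_b_higher"   # marker lies on tool B's side
        else:
            verdict = "equal"           # marker lies on the identity line
        result[sample] = (x, y, verdict)
    return result


if __name__ == "__main__":
    # Hypothetical per-sample totals of quantified miRNA reads.
    tool_a = {"sample_1": 1023, "sample_2": 100}
    tool_b = {"sample_1": 1023, "sample_2": 50}
    for sample, row in compare_tools(tool_a, tool_b).items():
        print(sample, row)
```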
Figure 5
Bar plot of the raw quantified miRNA results for DIANA-mAP and miARma-Seq on an artificial sRNA-Seq dataset. The “Simulated Reads” bar indicates the absolute number of simulated miRNA reads present in the dataset.
Figure 6
Line graph depicting the analysis run times of the two programs against the datasets’ total number of reads for the 24 libraries in Dataset_Group_2. All analyses were run using one core of a High-Performance Computer (HPC) with 48 cores at 2.3 GHz and 256 GB of RAM under the CentOS operating system.
