BMC Genomics. 2021 Jun 4;22(1):412.
doi: 10.1186/s12864-020-07289-0.

Long non-coding RNA exploration for mesenchymal stem cell characterisation

Sébastien Riquier et al. BMC Genomics. 2021.

Abstract

Background: The development of RNA sequencing (RNAseq) and the corresponding emergence of public datasets have created new avenues for transcriptional marker discovery. Long non-coding RNAs (lncRNAs) constitute an emerging class of transcripts with potential for high tissue specificity and function. We therefore tested the biomarker potential of lncRNAs in mesenchymal stem cells (MSCs), a complex type of adult multipotent stem cell of diverse tissue origins that is frequently used in the clinic but lacks extensive characterisation.

Results: We developed a dedicated bioinformatics pipeline to build a cell-specific catalogue of unannotated lncRNAs. The pipeline performs ab initio transcript identification and pseudoalignment, and uses new methodologies such as a specific k-mer approach for naive quantification of expression in numerous RNAseq datasets. We next applied it to MSCs, where it highlighted novel lncRNAs with high cell specificity. Furthermore, using original and efficient approaches for functional prediction, we demonstrated that each candidate represents one specific state of MSC biology.
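The "specific k-mer approach" mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences and the choice of k are assumptions made for illustration, reverse complements are ignored for brevity, and the per-million normalisation is one plausible reading of a k-mer-per-million scale.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (uppercased), left to right."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def specific_kmers(candidate, background, k):
    """Return the k-mers of the candidate found in no background sequence."""
    background_kmers = set()
    for seq in background:
        background_kmers.update(kmers(seq, k))
    return {km for km in kmers(candidate, k) if km not in background_kmers}


def quantify(reads, probes, k):
    """Mean count of the probe k-mers in the reads, per million k-mers read.

    Hypothetical normalisation: (total probe hits / number of probes),
    scaled by 1e6 / total number of k-mers seen in the reads.
    """
    counts = Counter()
    total = 0
    for read in reads:
        for km in kmers(read, k):
            total += 1
            if km in probes:
                counts[km] += 1
    if total == 0 or not probes:
        return 0.0
    mean_count = sum(counts.values()) / len(probes)
    return mean_count * 1e6 / total


# Toy data (invented): one candidate transcript, one annotated background
# transcript sharing its 5' end, and two short reads.
candidate = "ACGTTGCAAGGCT"
background = ["ACGTTGCATTTT"]
reads = ["TGCAAGGCT", "TTTTTTTTT"]

probes = specific_kmers(candidate, background, k=5)
print(sorted(probes))
print(quantify(reads, probes, k=5))
```

Because only k-mers absent from the background are kept, reads from the shared 5' region do not contribute to the candidate's signal, which is the point of restricting quantification to specific k-mers.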

Conclusions: We showed that our approach can be employed to harness lncRNAs as cell markers. More specifically, our results suggest different candidates as potential actors in MSC biology and propose promising directions for future experimental investigations.

Keywords: Bioinformatics; Long non-coding RNA; Mesenchymal stem cell; NGS analysis; RNAseq; Transcriptomics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Flowchart representation of the pipeline used in this study. The 4 steps of the flowchart are described. a Ab initio reconstruction of transcripts expressed in MSCs from the SRA dataset and creation of a reference (GTF+fasta) for quantification of Ensembl annotated genes, unannotated intergenic (Mlincs) and unannotated overlapping antisense (Mloanc) transcripts. The results are shown in Fig. 2. b Differential analysis for the selection of MSC markers (restricted candidate set) with Kallisto pseudoalignment and the Sleuth differential test, followed by feature selection by random forest with the Boruta package. Long-read sequencing and evidence of active transcription in MSCs from epigenetic marks completed the selection step (see Figs. 2 and 3). c Validation of the cell expression specificity of the candidates by k-mer quantification in ENCODE RNAseq datasets (see Additional file 8 for the list of data) and by qPCR. The results are presented in Fig. 4. d Functional investigations were performed with in silico prediction methods from the sequence of candidates, followed by k-mer quantification in the FANTOM6 dataset, single-cell RNAseq and selected MSC conditions. K-mer quantification phases are shown by corresponding icons (Figs. 5 and 6)
Fig. 2
Overview of annotated genes and unannotated transcripts enriched in BM-MSCs. a Left panels represent: i/ Ensembl v90 transcript categories and distribution, ii/ distribution of transcripts expressed in MSCs, showing unannotated transcripts obtained by ab initio reconstruction with StringTie vs annotated transcripts (expression >0.1 TPM), iii/ lncRNAs predicted from unannotated reconstructed transcripts, including new lncRNAs in the intergenic (Mlinc) and antisense (Mloanc) RNA categories. b-c-d Distribution of transcript length, exon length and GC percentage, respectively, across the different categories, with the same colours as in panel a: coding transcripts (blue), annotated lincRNA (pink), annotated overlapping antisense lncRNA (purple), novel lincRNA (Mlincs, yellow), novel overlapping antisense RNA (Mloanc, red). e Annotated genes (top panel) and unannotated transcripts (bottom panel) overexpressed in MSC versus non-MSC types (log2FC >0.5 and padj <0.05), shown separately in MA plots. f Total number of transcripts by category. The coloured bar indicates the number of differentially expressed annotated genes (Ensembl v90) and unannotated transcripts (Mlinc and Mloanc). g Global expression in BM-MSCs (with Sleuth normalisation) of the same categories as in f for annotated genes and unannotated transcripts
Fig. 3
Selection of a refined set of the best candidates by random forest (top35), long-read sequencing and epigenetic features. a Expression of the best MSC-specific candidates selected by Boruta machine learning across the MSC group and non-MSC cohorts. Left panel: top35 most relevant annotated genes (non-coding included); Right panel: unannotated intergenic lncRNAs (Mlincs) and their average importance scores determined by the Boruta method, displayed in the upper line plot. b Genomic visualisation of Mlincs 28428 (upper left panel), 64225 (upper right panel), 128022 (lower left panel), and 89912 (lower right panel). Predictions (Mlinc, orange) from short-read alignment of all MSC group files (blue/magenta and BAM visualisation) are compared with unoriented long-read alignments (grey). Additional epigenomic features are shown to reveal active transcription: trimethylation of histone H3 lysine 4 and acetylation of histone H3 lysine 27 in MSCs (H3K4me3 and H3K27ac, green), and DNase sensitivity hotspots of MSCs (MSC DNAse, red)
Fig. 4
High-throughput exploration of selected candidates across a variety of samples by k-mer quantification in RNAseq data and biological validation by RT-qPCR. a List of tissues for the exploration of cell-specific expression (samples with ID numbers are listed in Additional file 8). b Relative expression of Mlinc.28428.2, Mlinc.128022.2, and Mlinc.89912.1 across ENCODE’s ribodepleted RNAseq data, obtained by k-mer quantification and normalised as k-mers per million. c qPCR relative quantification was performed on the 3 selected Mlincs in MSCs of different origins (BM-MSC, Ad-MSC, umbilical cord MSC) and other indicated cell types. Relative quantification (log induction) was calculated by the ddCt method using non-MSC types as calibrator (mean of triplicates). Student's t-tests were performed between triplicates, each test using BM-MSCs as the reference group (ns: P >0.05, *: P ≤0.05, **: P ≤0.01, ***: P ≤0.001, ****: P ≤0.0001)
Fig. 5
Prediction of potential functions of the candidates with k-mer quantification and single-cell RNAseq. For each Mlinc (Mlinc.28428 (a), Mlinc.128022 (b) and Mlinc.89912 (c), respectively), 3 steps of prediction were performed. a Enrichment in the different subcompartments of fibroblasts from the FANTOM6 dataset: free nuclear fraction (Nuc), chromatin (Chr) and cytoplasm (Cyt); b Expression of markers in FANTOM6 data depending on the knock-down (KD) of annotated lncRNAs. Normalised counts of all specific k-mers are averaged by sample (zero values removed) and t-tests are performed between control (in pink) and KD fibroblasts (in turquoise). c General expression of Mlincs within the Ad-MSC population; dimensionality reduction was performed with the UMAP method on batch-corrected counts. Expression of differentially expressed annotated genes between positive (in turquoise) and negative (in pink) cells for Mlinc.28428, Mlinc.128022 and Mlinc.89912, respectively
Fig. 6
Expression of markers in different SRA datasets in cell conditions related to previous findings. a Expression of Mlinc.28428.1 in the context of oxidative, replicative, or KO-driven stress and senescence (PRJNA396193, PRJNA433339). Relevant changes of expression are shown with t-test results (ns: P >0.05, *: P ≤0.05, **: P ≤0.01, ***: P ≤0.001, ****: P ≤0.0001). b Expression of Mlinc.128022 in osteodifferentiation conditions (PRJNA515466) or osteodifferentiation potential (PRJNA379707). Relevant changes of expression are shown with t-test results (ns: P >0.05, *: P ≤0.05, **: P ≤0.01, ***: P ≤0.001, ****: P ≤0.0001). c Expression of Mlinc.89912 in the context of proliferation (PRJNA328824 and PRJNA498109). Relevant changes of expression are shown with t-test results (ns: P >0.05, *: P ≤0.05, **: P ≤0.01, ***: P ≤0.001, ****: P ≤0.0001). The detailed list of datasets is provided in Additional file 16
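The ddCt relative quantification named in the Fig. 4 caption can be sketched as follows. This is a generic illustration of the method, not the authors' analysis: the use of a reference (housekeeping) gene, the assumption of roughly 100% amplification efficiency, the base-2 logarithm for "log induction", the function name and all Ct values are assumptions introduced here.

```python
from statistics import mean


def ddct_log2_induction(ct_target_sample, ct_ref_sample,
                        ct_target_calib, ct_ref_calib):
    """Return log2 relative expression of a target by the ddCt method.

    Each argument is a list of replicate Ct values. The calibrator plays
    the role of the non-MSC cell types; the reference gene is hypothetical.
    """
    # dCt: target Ct normalised to the reference gene, per condition
    dct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    dct_calib = mean(ct_target_calib) - mean(ct_ref_calib)
    # ddCt: sample dCt relative to the calibrator dCt
    ddct = dct_sample - dct_calib
    # Fold change is 2 ** (-ddCt), so its log2 is simply -ddCt
    return -ddct


# Invented triplicate Ct values for one candidate in an MSC sample
# and in a calibrator cell type.
log2_induction = ddct_log2_induction(
    ct_target_sample=[24.0, 24.5, 23.5],
    ct_ref_sample=[18.0, 18.5, 17.5],
    ct_target_calib=[29.0, 29.5, 28.5],
    ct_ref_calib=[18.0, 18.0, 18.0],
)
print(log2_induction, 2 ** log2_induction)
```

A lower Ct means earlier amplification and therefore more template, which is why the sign is flipped when converting ddCt to an induction value.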
