Int J Mol Sci. 2020 Oct 1;21(19):7271.
doi: 10.3390/ijms21197271.

LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion

Shiyao Feng et al. Int J Mol Sci. 2020.

Abstract

Recent studies show that the subcellular location of long non-coding RNAs (lncRNAs) provides significant information about their function. Because experimental data are scarce, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are highly imbalanced. Predicting the subcellular location of lncRNAs is therefore a small-sample, imbalanced, multi-class classification problem. This imbalance causes machine learning models to recognize the small classes poorly, which remains a challenging problem in existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance part of the features, and a binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. This improves the representation ability of the data and mitigates the class imbalance problem. Through comprehensive experiments on different feature combinations and machine learning models, we select the optimal feature and classifier scheme to build lncLocation. LncLocation achieves 87.78% accuracy under 5-fold cross-validation on the benchmark data, which is higher than state-of-the-art tools, and classification performance, especially for small classes, is improved significantly.

Keywords: logarithm-distance of Hexamer; multi-source features; subcellular location; the binomial distribution-based filtering.
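
The abstract names binomial distribution-based filtering but does not spell out the computation. As a rough illustration only, the sketch below uses one common formulation of that idea applied to k-mer counts: for each k-mer and each class, it computes the binomial tail probability P of seeing at least the observed number of occurrences in that class by chance, and scores the k-mer by the confidence level CL = 1 - P. The function names, the toy sequences, the class labels, and the 0.8 threshold are all hypothetical and are not taken from the paper; the authors' exact variant may differ.

```python
from collections import Counter
from math import comb


def kmer_counts(seq, k):
    """Count overlapping k-mers in one sequence (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def binomial_confidence(sequences, labels, k):
    """Score each k-mer by its highest per-class confidence level CL = 1 - P.

    P is the probability that a Binomial(n_total, q) variable is at least
    n_in_class, where n_total is the k-mer's count over all classes and q is
    the share of all k-mer occurrences that belong to the class.
    """
    per_class = {}
    for seq, label in zip(sequences, labels):
        per_class.setdefault(label, Counter()).update(kmer_counts(seq, k))

    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    grand_total = sum(total.values())

    scores = {}
    for kmer, n_total in total.items():
        best = 0.0
        for counts in per_class.values():
            q = sum(counts.values()) / grand_total
            n_in_class = counts[kmer]  # Counter returns 0 for absent k-mers
            tail = sum(
                comb(n_total, m) * q ** m * (1 - q) ** (n_total - m)
                for m in range(n_in_class, n_total + 1)
            )
            best = max(best, 1.0 - tail)
        scores[kmer] = best
    return scores


def select_kmers(scores, threshold):
    """Keep k-mers whose confidence level reaches the (assumed) threshold."""
    return sorted(kmer for kmer, cl in scores.items() if cl >= threshold)


# Toy data, invented for illustration only.
toy_sequences = ["AAAAAA", "CCCCCC", "ACACAC"]
toy_labels = ["nucleus", "cytosol", "nucleus"]
toy_scores = binomial_confidence(toy_sequences, toy_labels, k=2)
toy_selected = select_kmers(toy_scores, threshold=0.8)
```

On this toy input, k-mers that occur only in one class and occur often get the highest scores, so a threshold keeps the class-specific k-mers and drops the rest. In a real pipeline the surviving features would then be passed to a further selection step such as RFE, as the abstract describes.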

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. New fea.Tuple training results on each model.
Figure 2. New fea.Bio training results on each model.
Figure 3. Connecting new fea.Tuple and new fea.Bio training results on each model.
Figure 4. Input and output part of the screenshot of the lncLocation web server.
Figure 5. Pie chart of the distribution ratio of lncRNA in four organelles.
Figure 6. The flowchart of lncLocation. (A) Multi-source feature extraction; (B) Feature learning and model selection.
