Nucleic Acids Res. 2019 Nov 18;47(20):e127.

doi: 10.1093/nar/gkz740.

BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches

Bin Liu¹,², Xin Gao³, Hanyu Zhang³

Affiliations

¹ School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
² Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China.
³ School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China.

PMID: 31504851
PMCID: PMC6847461
DOI: 10.1093/nar/gkz740

Bin Liu et al. Nucleic Acids Res. 2019.

Abstract

BioSeq-Analysis was the first web server to analyze various biological sequences at the sequence level based on machine learning approaches, and many powerful predictors in computational biology have been developed with its assistance. However, BioSeq-Analysis can only be applied to sequence-level analysis tasks, which prevents its application to residue-level analysis tasks. An intelligent tool that can automatically generate various predictors for biological sequence analysis at both the residue level and the sequence level is therefore highly desired. To this end, we present an important updated server, BioSeq-Analysis2.0 (http://bliulab.net/BioSeq-Analysis2.0/), covering a total of 26 features at the residue level and 90 features at the sequence level. Users only need to upload the benchmark dataset, and BioSeq-Analysis2.0 generates the predictors for both residue-level and sequence-level analysis tasks. The corresponding stand-alone tool is also provided and can be downloaded from http://bliulab.net/BioSeq-Analysis2.0/download/. To the best of our knowledge, BioSeq-Analysis2.0 is the first tool for generating predictors for biological sequence analysis tasks at the residue level. Experimental results indicated that the predictors developed by BioSeq-Analysis2.0 can achieve comparable or even better performance than the existing state-of-the-art predictors.
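The difference between the two levels of analysis can be illustrated with a minimal sketch. The function names `kmer_composition` and `residue_windows` are hypothetical and are not part of BioSeq-Analysis2.0, and the two encodings (k-mer frequency and one-hot sliding window) are chosen here only as simple stand-ins for the server's features. The point is the shape of the output: a sequence-level feature gives one vector per sequence, while a residue-level feature gives one vector per residue.

```python
from collections import Counter
from itertools import product

DNA = "ACGT"

def kmer_composition(seq, k=2, alphabet=DNA):
    """Sequence level: one normalized k-mer frequency vector per sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)  # avoid division by zero for short input
    return [counts[km] / total for km in kmers]

def residue_windows(seq, half_width=1, alphabet=DNA, pad="-"):
    """Residue level: one one-hot encoded window per residue.

    Positions beyond the sequence ends are padded and encoded as all zeros.
    """
    padded = pad * half_width + seq + pad * half_width
    vectors = []
    for i in range(len(seq)):
        window = padded[i:i + 2 * half_width + 1]
        vec = []
        for ch in window:
            vec.extend(1 if ch == a else 0 for a in alphabet)
        vectors.append(vec)
    return vectors

seq = "ACGTAC"
seq_vec = kmer_composition(seq)   # a single vector of 4**2 = 16 values
res_vecs = residue_windows(seq)   # six vectors, one per residue, 3 * 4 = 12 values each
print(len(seq_vec))
print(len(res_vecs), len(res_vecs[0]))
```

In a sequence-level task each vector is paired with one label for the whole sequence; in a residue-level task each per-residue vector is paired with its own label.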


Figures

**Figure 1.**
The three main processes of biological sequence analysis tasks at residue level (top part) and sequence level (bottom part) based on machine learning algorithms. The residue-level analysis tasks explore the characteristics of residues, while the sequence-level analysis tasks explore the characteristics of the entire sequences.

**Figure 2.**
The relationship between sequence labelling algorithm and classification algorithm. Compared with the classification algorithm, the sequence labelling algorithm is able to consider the interactions among residues along the sequence in a global fashion.

**Figure 3.**
The pipeline of the web server of BioSeq-Analysis2.0.

**Figure 4.**
A screenshot showing that BioSeq-Analysis2.0 contains three sub-servers, (i) DNA-Analysis2.0, (ii) RNA-Analysis2.0 and (iii) Protein-Analysis2.0, for residue-level analysis (A) and sequence-level analysis (B). For each of the three sub-servers, users can generate their desired predictors via the buttons marked with (iv), (v) and (vi).

**Figure 5.**
A screenshot to show the result page of DNA-Analysis2.0. It contains six panels: (A) the parameter summary, including the input sequence type, selected feature, and the selected machine learning algorithm; (B) the 5-fold cross-validation results, including Acc, MCC, AUC, Sn, and Sp; (C) the generated ROC curve; (D) the trained model with parameters; (E) features in Scikit-learn format; (F) features in Weka format.

**Figure 6.**
An illustration of the ROC curves and the values of AUC of 14 different predictors for the identification of enhancers generated by DNA-Analysis2.0 on the benchmark dataset (7,8) based on SVM (A) and RF (B).

**Figure 7.**
An illustration of the ROC curves and the values of AUC of 12 different predictors for mRNAs (m⁶A) site identification generated by RNA-Analysis2.0 on the subset of the benchmark dataset (6) based on SVM (A) and RF (B).

**Figure 8.**
An illustration of the ROC curves and the values of AUC of 26 different predictors for disordered protein identification generated by Protein-Analysis2.0 on the subset of the benchmark dataset (66) based on CRF (A) and SVM (B).
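The evaluation measures named in the captions above (Acc, MCC, AUC, Sn and Sp) can be computed as in the following sketch. It assumes the standard textbook definitions for binary labels, which this record does not spell out, and it is not code from BioSeq-Analysis2.0. The function names and the toy data are illustrative only. AUC is computed here as the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as one half.

```python
import math

def binary_metrics(y_true, y_pred):
    """Acc, Sn (sensitivity), Sp (specificity) and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": (tp + tn) / len(pairs),
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (made up for illustration): three positives, three negatives.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

In a 5-fold cross-validation, these values would be computed on each held-out fold, or on the pooled held-out predictions, rather than on the training data.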


Cited by

PreMLS: The undersampling technique based on ClusterCentroids to predict multiple lysine sites.
Zuo Y, Fang X, Wan J, He W, Liu X, Zeng X, Deng Z. PLoS Comput Biol. 2024 Oct 22;20(10):e1012544. doi: 10.1371/journal.pcbi.1012544. eCollection 2024 Oct. PMID: 39436947.
Distribution rules of 8-mer spectra and characterization of evolution state in animal genome sequences.
Li X, Li H, Yang Z, Wang L. BMC Genomics. 2024 Sep 12;25(1):855. doi: 10.1186/s12864-024-10786-1. PMID: 39266973.
Advances in the Identification of Circular RNAs and Research Into circRNAs in Human Diseases.
Jiao S, Wu S, Huang S, Liu M, Gao B. Front Genet. 2021 Mar 19;12:665233. doi: 10.3389/fgene.2021.665233. eCollection 2021. PMID: 33815488. Review.
ProSol-multi: Protein solubility prediction via amino acids multi-level correlation and discriminative distribution.
Ghafoor H, Asim MN, Ibrahim MA, Dengel A. Heliyon. 2024 Aug 22;10(17):e36041. doi: 10.1016/j.heliyon.2024.e36041. eCollection 2024 Sep 15. PMID: 39281576.
Detecting Interactive Gene Groups for Single-Cell RNA-Seq Data Based on Co-Expression Network Analysis and Subgraph Learning.
Ye X, Zhang W, Futamura Y, Sakurai T. Cells. 2020 Aug 21;9(9):1938. doi: 10.3390/cells9091938. PMID: 32825786.


References

1. Liu B. BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches. Brief. Bioinform. 2017; doi:10.1093/bib/bbx165.
2. Chen Z., Zhao P., Li F., Marquez-Lago T.T., Leier A., Revote J., Zhu Y., Powell D.R., Akutsu T., Webb G.I. et al. iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief. Bioinform. 2019; doi:10.1093/bib/bbz041.
3. Wei L., Hu J., Li F., Song J., Su R., Zou Q. Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms. Brief. Bioinform. 2018; doi:10.1093/bib/bby107.
4. Bock J.R., Gough D.A. Predicting protein–protein interactions from primary structure. Bioinformatics. 2001; 17:455–460.
5. Ishida T., Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007; 35:W460–W464.
