Nucleic Acids Res. 2019 Nov 18;47(20):e127.
doi: 10.1093/nar/gkz740.

BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches

Bin Liu et al. Nucleic Acids Res. 2019.

Abstract

BioSeq-Analysis was the first web server to analyze various biological sequences at the sequence level based on machine learning approaches, and many powerful predictors in computational biology have been developed with its assistance. However, BioSeq-Analysis can be applied only to sequence-level analysis tasks, not to residue-level ones, and an intelligent tool that can automatically generate predictors for biological sequence analysis at both the residue level and the sequence level is highly desired. We therefore present an updated server, BioSeq-Analysis2.0 (http://bliulab.net/BioSeq-Analysis2.0/), which covers 26 features at the residue level and 90 features at the sequence level. Users only need to upload a benchmark dataset, and BioSeq-Analysis2.0 generates predictors for both residue-level and sequence-level analysis tasks. A corresponding stand-alone tool is also provided and can be downloaded from http://bliulab.net/BioSeq-Analysis2.0/download/. To the best of our knowledge, BioSeq-Analysis2.0 is the first tool for generating predictors for biological sequence analysis tasks at the residue level. Experimental results indicate that predictors developed by BioSeq-Analysis2.0 achieve performance comparable to, or even better than, existing state-of-the-art predictors.
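
The distinction between the two analysis levels can be illustrated with a minimal Python sketch. It is not taken from BioSeq-Analysis2.0: the function names, the symbol-frequency feature, and the one-hot window feature are hypothetical stand-ins chosen only to show that a sequence-level task yields one feature vector per sequence, while a residue-level task yields one feature vector per residue.

```python
from collections import Counter


def sequence_level_features(seq, alphabet="ACGT"):
    """Hypothetical sequence-level feature: one vector for the whole sequence,
    holding the frequency of each alphabet symbol. Assumes a non-empty sequence."""
    counts = Counter(seq)
    return [counts[ch] / len(seq) for ch in alphabet]


def residue_level_features(seq, alphabet="ACGT", flank=1, pad="-"):
    """Hypothetical residue-level feature: one vector per residue, built by
    one-hot encoding a window of `flank` neighbours on each side.
    Positions beyond the sequence ends are padded and encode as all zeros."""
    padded = pad * flank + seq + pad * flank
    vectors = []
    for i in range(len(seq)):
        window = padded[i:i + 2 * flank + 1]
        vec = []
        for ch in window:
            vec.extend(1 if ch == a else 0 for a in alphabet)
        vectors.append(vec)
    return vectors


seq = "ACGTA"
seq_vec = sequence_level_features(seq)   # a single vector of length 4
res_vecs = residue_level_features(seq)   # five vectors, each of length 12
```

In this sketch a sequence-level predictor would receive `seq_vec` together with one label for the sequence, whereas a residue-level predictor would receive each entry of `res_vecs` together with a label for that position.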

Figures

Figure 1.
The three main processes of biological sequence analysis tasks at residue level (top part) and sequence level (bottom part) based on machine learning algorithms. The residue-level analysis tasks explore the characteristics of residues, while the sequence-level analysis tasks explore the characteristics of the entire sequences.
Figure 2.
The relationship between sequence labelling algorithm and classification algorithm. Compared with the classification algorithm, the sequence labelling algorithm is able to consider the interactions among residues along the sequence in a global fashion.
Figure 3.
The pipeline of the BioSeq-Analysis2.0 web server.
Figure 4.
A screenshot showing that BioSeq-Analysis2.0 contains three sub-servers, (i) DNA-Analysis2.0, (ii) RNA-Analysis2.0 and (iii) Protein-Analysis2.0, for residue-level analysis (A) and sequence-level analysis (B). For each of the three sub-servers, users can generate their desired predictors via the buttons marked with (iv), (v) and (vi).
Figure 5.
A screenshot to show the result page of DNA-Analysis2.0. It contains six panels: (A) the parameter summary, including the input sequence type, selected feature, and the selected machine learning algorithm; (B) the 5-fold cross-validation results, including Acc, MCC, AUC, Sn, and Sp; (C) the generated ROC curve; (D) the trained model with parameters; (E) features in Scikit-learn format; (F) features in Weka format.
Figure 6.
An illustration of the ROC curves and the values of AUC of 14 different predictors for the identification of enhancers generated by DNA-Analysis2.0 on the benchmark dataset (7,8) based on SVM (A) and RF (B).
Figure 7.
An illustration of the ROC curves and the values of AUC of 12 different predictors for m6A site identification in mRNAs generated by RNA-Analysis2.0 on the subset of the benchmark dataset (6) based on SVM (A) and RF (B).
Figure 8.
An illustration of the ROC curves and the values of AUC of 26 different predictors for disordered protein identification generated by Protein-Analysis2.0 on the subset of the benchmark dataset (66) based on CRF (A) and SVM (B).
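
The evaluation measures named in the Figure 5 caption (Acc, MCC, Sn and Sp) can be computed from a binary confusion matrix. The Python sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical, and the code is not taken from the BioSeq-Analysis2.0 implementation. AUC is left out because it needs ranked prediction scores rather than confusion-matrix counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.
    Assumes at least one positive and one negative example."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for a single cross-validation fold.
m = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

With these illustrative counts, Sn, Sp and Acc are each 0.8 and MCC is 0.6. In a 5-fold cross-validation, the same calculation would be applied to each held-out fold or to the pooled predictions.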

References

    1. Liu B. BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches. Brief. Bioinform. 2017; doi:10.1093/bib/bbx165.
    2. Chen Z., Zhao P., Li F., Marquez-Lago T.T., Leier A., Revote J., Zhu Y., Powell D.R., Akutsu T., Webb G.I. et al. iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief. Bioinform. 2019; doi:10.1093/bib/bbz041.
    3. Wei L., Hu J., Li F., Song J., Su R., Zou Q. Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms. Brief. Bioinform. 2018; doi:10.1093/bib/bby107.
    4. Bock J.R., Gough D.A. Predicting protein–protein interactions from primary structure. Bioinformatics. 2001; 17:455–460.
    5. Ishida T., Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007; 35:W460–W464.