Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences
- PMID: 33045986
- PMCID: PMC7552357
- DOI: 10.1186/s12864-020-07114-8
Erratum in
- Correction to: Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences. BMC Genomics. 2020 Dec 18;21(1):895. doi: 10.1186/s12864-020-07231-4. PMID: 33339504.
Abstract
Background: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development by promoting cell-to-cell communication. Predicting and classifying CLE genes is challenging because of their low sequence similarity.
Results: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the Euclidean distance of 12 amino acid residues from the CLE motif in a site-weight-dependent manner. In total, 2156 CLE candidates, including 627 novel candidates, were predicted from 69 plant species. The results from our CLE motif-based clustering are consistent with previous reports using the entire pre-propeptide. Characterization of CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and decisive factors of CLE prediction. The approach taken here provides information on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor.
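The site-weighted Euclidean distance mentioned in the Results can be illustrated with a minimal sketch. The abstract does not state how residues are encoded or how the site weights are derived, so the one-hot encoding, the function names, the uniform weights, and the two example 12-residue strings below are all assumptions made for illustration, not the authors' implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MOTIF_LENGTH = 12  # the abstract compares 12 residues of the CLE motif


def one_hot(residue):
    """Encode a residue as a 20-dimensional indicator vector (assumed encoding)."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def weighted_motif_distance(motif_a, motif_b, site_weights):
    """Euclidean distance between two motifs, with one weight per motif site.

    Each site contributes weight * (squared distance between the encoded
    residues), so a mismatch at a heavily weighted site counts for more.
    """
    if not (len(motif_a) == len(motif_b) == len(site_weights) == MOTIF_LENGTH):
        raise ValueError("motifs and weights must all have length 12")
    total = 0.0
    for res_a, res_b, weight in zip(motif_a, motif_b, site_weights):
        vec_a, vec_b = one_hot(res_a), one_hot(res_b)
        total += weight * sum((x - y) ** 2 for x, y in zip(vec_a, vec_b))
    return math.sqrt(total)


# Hypothetical inputs: two illustrative 12-residue strings and uniform weights.
motif_x = "RTVPSGPDPLHH"
motif_y = "RQVPTGSDPLHH"
uniform_weights = [1.0] * MOTIF_LENGTH

distance_xy = weighted_motif_distance(motif_x, motif_y, uniform_weights)
```

A matrix of such pairwise distances could then be passed to any standard clustering routine. Replacing the uniform weights with larger values at strongly conserved sites would make mismatches there dominate the distance, which is one plausible reading of "site-weight-dependent".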
Conclusions: Our new approach is applicable to SSPs and other proteins with short conserved domains, and hence provides a useful tool for gene prediction, classification, and evolutionary analysis.
Keywords: CLE; Euclidean distance; Evolution; Gene clustering; Gene prediction; Machine learning; Peptide hormone.
Conflict of interest statement
The authors declare that they have no competing interests.
Grants and funding
- 2662018PY071/Fundamental Research Funds for the Central Universities
- 2662020YLPY026/Fundamental Research Funds for the Central Universities
- 31370673/National Natural Science Foundation of China
- 31770639/National Natural Science Foundation of China
- 2016YFD0600103/National Key Research and Development Program of China
