2022 Jul 23;23(1):534.
doi: 10.1186/s12864-022-08735-x.

The flax genome reveals orbitide diversity

Ziliang Song et al. BMC Genomics.

Abstract

Background: Ribosomally synthesized cyclic peptides are widely found in plants and exhibit bioactivities that are useful to humans. The growing number of sequenced genomes facilitates the identification of cyclic peptide sequences and their precursor proteins. Previous research has largely focused on the chemical diversity of these peptides across various species, and little attention has been paid to the broader range of potential peptides that have not been chemically identified.

Results: This study explores the genetic diversity of linusorbs, a group of cyclic peptides that occur only in cultivated flax (Linum usitatissimum). Phylogenetic analysis clustered the 5 known linusorb precursor proteins into two clades and one singleton. A preliminary tBLASTn search of the published flax genome, using a whole protein sequence as the query, could retrieve only that protein's homologues within the same clade. This limitation was overcome using a profile-based mining strategy. After genome reannotation, a hidden Markov model (HMM)-based approach identified 58 repeats homologous to the linusorb-embedded repeats in 8 novel proteins, implying that they share common ancestry with the linusorb-embedded repeats. Subsequently, we developed a customized profile composed of a random linusorb-like domain (LLD) flanked by 5 conserved sites and used it for a string search of the proteome, which extracted 281 LLD-containing repeats (LLDRs) in 25 proteins. Comparative analysis of the different repeat categories suggested that the 5 conserved flanking sites among the non-homologous repeats have undergone convergent evolution driven by functional selection.

Conclusions: The profile-based mining approach is suitable for analyzing repetitive sequences. The 25 LLDR proteins identified herein represent the potential diversity of cyclic peptides within the flax genome and lay a foundation for further studies on the functions and evolution of these protein tandem repeats.

Keywords: Diversity; Linum usitatissimum; Mining; Orbitide; Protein tandem repeats.
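
The string search described in the Results (a variable linusorb-like domain flanked by conserved sites, matched against every protein) can be illustrated with a regular expression. This is only a minimal sketch under stated assumptions: the flank residues, their spacing, the core length range, the toy sequences, and the helper names `build_profile` and `find_repeats` are all hypothetical and are not taken from the paper, whose actual profiles are shown in Fig. 6.

```python
import re
from typing import Dict, List, Tuple


def build_profile(n_flank: str, core_min: int, core_max: int, c_flank: str) -> re.Pattern:
    """Compile 'flank + variable core + flank' into a regex (hypothetical helper).

    In the flank strings, 'x' stands for any residue; every other letter is a
    conserved site that must match exactly.
    """
    def flank(spec: str) -> str:
        return "".join("[A-Z]" if ch == "x" else re.escape(ch) for ch in spec)

    # Non-greedy core: the shortest core that satisfies both flanks is reported.
    core = "(?P<core>[A-Z]{%d,%d}?)" % (core_min, core_max)
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    return re.compile("(?=(%s%s%s))" % (flank(n_flank), core, flank(c_flank)))


def find_repeats(
    proteome: Dict[str, str], profile: re.Pattern, min_repeats: int = 2
) -> Dict[str, List[Tuple[int, str]]]:
    """Return proteins with at least `min_repeats` profile matches.

    Each match is reported as (0-based start of the N-terminal flank, core).
    """
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for name, seq in proteome.items():
        found = [(m.start(), m.group("core")) for m in profile.finditer(seq)]
        if len(found) >= min_repeats:
            hits[name] = found
    return hits


# Invented toy sequences; they are not flax proteins.
PROTEOME = {
    "toy_repeat_protein": "MADAGMLAPPFAVIFTDEKSSDSGILGPPFWLFADGKTT",
    "toy_single_hit": "MADAGMLAPPFAVIFTDEKSS",
    "toy_no_hit": "MKTAYIAKQRQISFVK",
}

# Assumed profile: 2 conserved sites upstream and 3 downstream of an 8-9 residue core.
profile = build_profile("DxG", 8, 9, "FxDxK")
hits = find_repeats(PROTEOME, profile)
for name, repeats in hits.items():
    print(name, repeats)
```

Requiring at least two matches per protein is a design choice of this sketch, meant to mimic a focus on tandem repeats; relaxing a conserved site to `x` gives a less specific profile in the spirit of Profiles 2–6.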

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
General biosynthetic pathway of orbitides, abstracted from [3]. The precursor protein mainly comprises a signal sequence (SIG, yellow), a leader peptide (LEA, orange), the core peptide region (CPR, purple) and the recognition sequence (REC, blue). Each CPR is flanked by the N-terminal region (NTR, grey) and C-terminal region (CTR, green). The precursor protein undergoes post-translational modification that cyclizes the CPR into the mature cyclized product. In this case, the CPR is a linusorb B1 (LO-B1) domain in which the N-terminal methionine (M) and C-terminal isoleucine (I) are linked to form the structural formula as displayed
Fig. 2
Multiple sequence alignments of (a) 11 linusorb domains and (b) 5 linusorb precursor proteins. Different regions are shaded in different colors in accordance with the coloring scheme of Fig. 1
Fig. 3
Phylogenetic trees of (a) 11 linusorb (LO) domains and (b) 5 linusorb precursor proteins. The Neighbor-Joining method was used to cluster the sequences aligned by MUSCLE. Numbers in blue above the nodes are bootstrap values from 1000 replications. Only nodes with bootstrap values ≥60 are considered significant and have their bootstrap values displayed. Numbers in black below the nodes are branch lengths, i.e. the genetic distance between two nodes
Fig. 4
Multiple sequence alignment of repeats identified in the 4 precursor proteins and an extended region of G11-514P containing the single linusorb domain with some flanking residues. Repeats are numbered on the left of the alignment and the name of the linusorb domain in each repeat is shown on the right, while repeats containing undetected linusorb-like domains (LLDs) are marked by “?”. Linusorb domain sequences are italicized in the alignment, except LOs E1–E3, the 3 glycine-containing analogues of LOs B1–B3, which are underlined. Consensus sites flanking the linusorb(-like) domains are highlighted in bold. The scale at the top marks the starting position (0) of the linusorb domain, and the flanking sites are numbered with negative values towards the N-terminus and positive values towards the C-terminus. Amino acids are colored by hydropathicity: hydrophobic residues (YVMCLFIW) are black, hydrophilic residues (RKDENQ) are blue and neutral residues (SGHTAP) are green
Fig. 5
Sequence logos of (a) 12 linusorb-embedded repeats of the 5 known linusorb precursor proteins; (b) 58 potential homologues from 8 LLDR proteins retrieved by HMM search (Data S4); (c) 223 non-homologous repeats in 25 LLDR proteins extracted by pattern-matching string search
Fig. 6
Profiles designed for the search of possible LLDs. Different conserved sites were specified: Profile 1 has all 5 sites specified; Profiles 2–6 each leave a different one of the 5 sites random
Fig. 7
(a) Correlation between profile information content and the proportion of protein hits containing strings matching the profile in the predicted proteome and virtual ORF library. (b) Correlation between profile information content and the ratio of matching strings to protein or ORF hits. The regression model of each data series is shown in the legend and the R² value is marked next to the regression line
Fig. 8
Venn diagram displaying 25 LLDR proteins and 4 linusorb precursor proteins identified to contain repetitive motifs matching 6 different profiles. The number of proteins in each field is indicated. Overlapping fields represent proteins shared by more than one profile. The diagram was created by the online tool InteractiVenn
Fig. 9
Venn diagram displaying different categories of repeat motifs. Total numbers of repeat motifs under different categories are shown in the legends, and numbers in the diagram represent the numbers of repeat motifs inside each isolated area
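
The hydropathicity coloring described in the Fig. 4 caption amounts to a lookup from residue to class. The following is a minimal sketch: the three residue groups and their colors come from the caption, while the names `HYDROPATHY_CLASSES`, `RESIDUE_COLOR` and `color_sequence`, and the choice to return `None` for letters outside the three groups, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Residue groups and colors as listed in the Fig. 4 caption.
HYDROPATHY_CLASSES = {
    "hydrophobic": ("YVMCLFIW", "black"),
    "hydrophilic": ("RKDENQ", "blue"),
    "neutral": ("SGHTAP", "green"),
}

# Flatten the classes into a one-letter lookup table.
RESIDUE_COLOR = {
    residue: color
    for residues, color in HYDROPATHY_CLASSES.values()
    for residue in residues
}


def color_sequence(seq: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each residue with its color; unlisted letters map to None (assumption)."""
    return [(aa, RESIDUE_COLOR.get(aa.upper())) for aa in seq]


# Invented three-residue example, one residue from each class.
print(color_sequence("MKG"))
```

Together the three groups cover all 20 standard amino acids, so only gap characters or ambiguity codes fall through to `None`.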

References

    1. Erb M, Kliebenstein DJ. Plant secondary metabolites as defenses, regulators, and primary metabolites: the blurred functional trichotomy. Plant Physiol. 2020;184(1):39–52. doi: 10.1104/pp.20.00433.
    2. Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat Prod Rep. 2013;30:108–160. doi: 10.1039/C2NP20085F.
    3. Shim YY, Song Z, Jadhav PD, Reaney MJT. Orbitides from flaxseed (Linum usitatissimum L.): a comprehensive review. Trends Food Sci Technol. 2019;93:197–211. doi: 10.1016/j.tifs.2019.09.007.
    4. Jing X, Jin K. A gold mine for drug discovery: strategies to develop cyclic peptides into therapies. Med Res Rev. 2020;40:753–810. doi: 10.1002/med.21639.
    5. Craik DJ, Lee MH, Rehm FBH, Tombling B, Doffek B, Peacock H. Ribosomally-synthesised cyclic peptides from plants as drug leads and pharmaceutical scaffolds. Bioorganic Med Chem. 2018;26:2727–2737. doi: 10.1016/j.bmc.2017.08.005.
