Sci Rep. 2022 Jan 18;12(1):936.
doi: 10.1038/s41598-022-04976-8.

Two short low complexity regions (LCRs) are hallmark sequences of the Delta SARS-CoV-2 variant spike protein

Arturo Becerra et al. Sci Rep.

Abstract

Low complexity regions (LCRs) are protein sequences formed by a set of compositionally biased residues. LCRs are extremely abundant in cellular proteins and have also been reported in viruses, where they may contribute to evasion of the host immune system. Analyses of 28,231 SARS-CoV-2 whole proteomes and of 261,051 spike protein sequences revealed the presence of four extremely conserved LCRs in the spike protein of several SARS-CoV-2 variants. Spike LCR-1 is present in the signal peptide of 80.57% of the Delta variant sequences and in other variants of concern and interest, with the exception of Iota, where it is absent. Spike LCR-2 is highly prevalent (79.87%) in Iota. Two distinctive LCRs are present in the Delta spike protein: Delta Spike LCR-3 is present in 99.19% of the analyzed sequences, and Delta Spike LCR-4 in 98.3% of the same set of proteins. These two LCRs are located in the furin cleavage site and the HR1 domain, respectively, and may be considered hallmark traits of the Delta variant. The presence of the medically important point mutations P681R and D950N in these LCRs, combined with the ubiquity of these regions in the highly contagious Delta variant, opens the possibility that they may play a role in its rapid spread.
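
To make the notion of a compositionally biased region concrete, the following is a minimal sketch of one common way to flag low-complexity stretches: Shannon entropy computed over a sliding window. It is an illustration only; the abstract does not state which detection method the authors used, and the function names, window length, and entropy threshold below are assumptions chosen for the example. The input is a toy string, not a spike protein sequence.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged (start, end) spans, 0-based and half-open, covering every
    window whose entropy is at or below the threshold.

    The default window length and threshold are arbitrary illustrative
    values (assumptions), not parameters taken from the paper.
    """
    spans = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            if spans and i <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans


# Toy example: a run of identical residues embedded in a varied sequence.
toy = "ACDEAAAAACDE"
print(low_complexity_windows(toy, window=4, threshold=0.5))
```

A window made of a single repeated residue has entropy 0, while a window of four distinct residues has entropy 2 bits, so lowering the threshold restricts the result to more strongly biased stretches.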

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
LCRs in VOCs (Alpha, Beta, Gamma and Delta variants), VOIs (Epsilon, Eta, Iota, Kappa and Lambda variants) and other SARS-CoV-2 proteomes (Others). (a) LCRs present in ORF1ab, which includes nsp1-4, 3CL protease (3CL), nsp6-11, RdRp polymerase (RdRp), Helicase (Hel) and nsp14-16. (b) LCRs along spike, ORF3a (3a), envelope (E), membrane (M), ORFs 6, 7a, 7b, 8, nucleocapsid (N) and ORF10. The Spike LCR sequences reported here in the Delta, Iota and Kappa variants are represented with red lines. The width of the lines is not proportional to the number of sequences in each variant.
Figure 2
Complexity of the spike proteins of VOIs, VOCs and other SARS-CoV-2. The x axis shows the number of amino acid residues and the y axis shows the complexity level. (a) The complexity level of each variant is shown in a different color: Iota, blue; Delta, dark red; Kappa, green. (b) Complexity of the Delta spike proteins. The complexity level of each subset is shown in a different color: Delta sensu lato, dark red; AY variants, salmon pink; lineage B.1.617.2, bright red.
Figure 3
(a) The SARS-CoV-2 spike protein three-dimensional structure (by cryo-electron microscopy, PDB code: 7BNM). The structure corresponds to a trimer, where each monomer is represented with a different color. The two subunits that make up each monomer (Subunit 1, also known as Head region, and Subunit 2, or the Stalk region) are indicated. (b) Domain organization of the spike protein. The position of each of the LCRs found in this work, together with the mutations present in each variant spike protein are shown. The position of the signal peptide (SP) and Spike LCR-1 are indicated. The green arrow in the Spike LCR-3 box indicates the furin cleavage site. (c) Monomer of the spike protein. Close ups of each of the structural regions corresponding to the different LCRs are shown in colored boxes. The sequences of each LCR are represented, with the mutations indicated with a red letter. Protein structures in panels a and c were rendered using PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.). Panel b was created with BioRender.com. Abbreviations: SP, signal peptide; NTD, N-terminal domain; RBD, receptor-binding domain; RBM, receptor-binding motif; SD1, subdomain 1; SD2, subdomain 2; FP, fusion peptide; HR1, heptad repeat 1; CH, central helix; CD, connector domain; HR2, heptad repeat 2; TM, transmembrane domain.
