Sci Rep. 2020 Mar 25;10(1):5470.
doi: 10.1038/s41598-020-62328-w.

Viruses with different genome types adopt a similar strategy to pack nucleic acids based on positively charged protein domains

Rodrigo D Requião et al. Sci Rep.

Abstract

Capsid proteins often present a positively charged, arginine-rich sequence at their terminal regions, which plays a fundamental role in genome packaging and particle stability in some icosahedral viruses. These sequences show little to no conservation and are structurally dynamic, so they cannot be easily detected by common sequence or structure comparisons. As a result, the occurrence and distribution of positively charged domains across the viral universe are unknown. Based on net-charge calculations of discrete protein segments, we identified proteins containing amino acid stretches with a notably high net charge (Q > +17); these proteins are enriched in icosahedral viruses and show a distinctive bias towards arginine over lysine. We used viral particle structural data to calculate the total electrostatic charge derived from the most positively charged segment of each capsid protein and correlated these values with the genome charge arising from the phosphate of each nucleotide. We obtained a positive correlation (r = 0.91, p-value < 0.0001) for a group of 17 viral families, corresponding to 40% of all families with icosahedral structures described to date. These data indicate that unrelated viruses with diverse genome types adopt a common underlying mechanism for capsid assembly based on arginine-rich arms (R-arms).

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Viral proteins are enriched with positively charged stretches. Protein sequences from the reviewed Swiss-Prot databank (560,659 proteins) were used as input for a program that calculates the net charge of every consecutive 30-amino-acid segment (Q30res). (A) Numerical proportion of protein entries and calculated 30-residue fragments from the four domains of life: viruses (16,866 proteins, 7.2 × 10⁶ fragments), Eukaryota (190,054 proteins, 7.8 × 10⁷ fragments), Bacteria (334,178 proteins, 9.4 × 10⁷ fragments), and Archaea (19,561 proteins, 5.1 × 10⁶ fragments). (B) The upper panel shows the normalized net-charge frequency distribution of protein segments from the four domains of life, and the lower panel shows the observed vs. expected net-charge frequency of each protein group relative to the total Swiss-Prot proteome. The statistical enrichment analysis is shown as a heatmap (inset), where significant p-values are shown in shades from green to red. Grey slots represent p-values > 0.05. (C) Observed vs. expected net-charge frequency plot comparing viruses with the proteomes of individual multicellular organisms: Drosophila melanogaster (3,279 proteins), Arabidopsis thaliana (14,430 proteins), and H. sapiens (20,214 proteins).
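The windowed net-charge calculation behind Q30res can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes a simple charge model in which arginine (R) and lysine (K) count +1, aspartate (D) and glutamate (E) count -1, and every other residue (including histidine and the chain termini) counts 0. The function name `window_net_charges` is hypothetical.

```python
# Assumed charge model (not stated in the captions): R, K = +1; D, E = -1; others 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def window_net_charges(seq: str, window: int = 30) -> list[int]:
    """Hypothetical helper: net charge of every consecutive `window`-residue segment."""
    seq = seq.upper()
    if len(seq) < window:
        return []  # no complete window fits in a short sequence
    charges = [CHARGE.get(aa, 0) for aa in seq]
    current = sum(charges[:window])  # charge of the first window
    result = [current]
    for i in range(window, len(seq)):
        # Slide one residue to the right: add the incoming residue,
        # subtract the residue that leaves the window.
        current += charges[i] - charges[i - window]
        result.append(current)
    return result


# Ten arginines, twenty alanines, five aspartates: six 30-residue windows.
print(window_net_charges("R" * 10 + "A" * 20 + "D" * 5))  # [10, 8, 6, 4, 2, 0]
```

A protein of length L yields L - 29 windows of 30 residues, so the number of fragments grows with both the number and the length of proteins. Updating a running sum instead of re-summing each window keeps the scan linear in sequence length.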
Figure 2
Capsid proteins from icosahedral viruses concentrate most of the positively charged protein segments of the viral proteome. Protein sequences from the reviewed Swiss-Prot databank were used as input for a program that calculates the net charge of every consecutive 30-amino-acid segment (Q30res). Each panel shows the observed vs. expected frequency of fragment net charges for a specific protein functional class relative to the Swiss-Prot proteome. (A) The viral protein dataset was divided into three functional categories: viral polymerase (all kinds of viral polymerases; 1,212 proteins); nucleic acid-binding (viral transcriptional/translational regulators and RNAi suppressors; 3,202 proteins); and viral particle (structural proteins present in viral particles; 1,902 proteins). (B) The viral particle dataset was further divided into three functional categories: viral envelope (mainly glycoproteins; 808 proteins); viral helical capsid (mainly nucleocapsid proteins from helical viruses; 232 proteins); and viral icosahedral capsid (mainly capsid proteins from spherical viruses; 762 proteins). (C) The viral icosahedral capsid and viral nucleic acid-binding datasets were compared with the human nucleic acid-binding dataset (RNA- and DNA-binding proteins with diverse functional roles; 4,073 proteins). The statistical enrichment analysis is shown as a heatmap, where significant p-values are represented in shades from green to red. Grey slots represent p-values > 0.05.
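The captions do not spell out how the observed vs. expected comparison is computed. One plausible reading, sketched below purely as an assumption, divides the normalized net-charge frequency of a functional class by that of the whole Swiss-Prot proteome. The inputs are lists of Q30res values (for example, from a windowed net-charge scan), and the function names are hypothetical. Ratios above 1 would mark charge values over-represented in the class.

```python
from collections import Counter


def charge_frequencies(charges: list[int]) -> dict[int, float]:
    """Hypothetical helper: normalized frequency of each net-charge value."""
    counts = Counter(charges)
    total = len(charges)
    return {q: n / total for q, n in counts.items()}


def observed_vs_expected(subset: list[int], reference: list[int]) -> dict[int, float]:
    """Assumed definition: subset frequency divided by reference frequency.

    Charge values absent from the reference are skipped, because the
    ratio is undefined for them.
    """
    obs = charge_frequencies(subset)
    exp = charge_frequencies(reference)
    return {q: obs[q] / exp[q] for q in sorted(obs) if q in exp}


# Toy example: +5 fragments are twice as common in the subset as in the reference.
print(observed_vs_expected([2, 2, 5, 5], [2, 2, 2, 5]))
```

The p-value heatmaps in the figures come from a separate enrichment test that the captions do not name, so it is not reproduced here.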
Figure 3
The total internal charge of the capsid, calculated from the most positively charged capsid protein segment, correlates with genome packaging capacity. The maximum net charge found in a 30-amino-acid stretch was multiplied by the number of subunits forming the capsid (Total Qmax30res) for 179 viruses from 29 families (see also S4). The total nucleic-acid net charge (Qgenome) was calculated from the number of nucleotide residues in the genome. For multipartite viruses, the longest genome segment was used for the plot. (A) Scatter plot colored by T number. Circles and squares represent eukaryotic viruses and bacteriophages, respectively. The blue circle highlights the cluster formed by bacteriophages from the Podoviridae, Siphoviridae, and Myoviridae families. (B) Eukaryotic viruses and the bacteriophage families Leviviridae and Microviridae were used to calculate a straight-line fit (n = 133). The shaded area indicates families with outliers (ROUT 5%). Pearson correlation results obtained from the 103 inliers are shown in the inset. Data points outlined in red represent viruses that have more Lys than Arg in their positively charged segments (see also Fig. 5A). (C,D) Qmax30res values per protein fragment and Qgenome values, respectively, according to capsid T number. T1* corresponds to the T1 geminate capsids of Geminiviridae (110 subunits) and the dsRNA T1 capsids formed by dimeric subunits (120 subunits). Grey data points in (C,D) correspond to the outliers identified in panel A, as follows: T1, ssDNA Parvoviridae; T1*, dsRNA Totiviridae and Partitiviridae; T3, all (+)RNA outliers (Caliciviridae, Dicistroviridae, Secoviridae, Picornaviridae, and Tymoviridae). Error bars indicate mean ± SD. Tukey's p-values: **** < 0.0001; * = 0.035.
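The two quantities plotted in this figure come down to simple arithmetic. The sketch below assumes that each nucleotide contributes one negative phosphate charge, that a double-stranded genome counts both strands, and that the capsid-wide value is Qmax30res multiplied by the number of subunits. All function names are hypothetical. For example, a T = 3 capsid built from 60T = 180 subunits, each with Qmax30res = +12, would carry a Total Qmax30res of +2,160. The Pearson coefficient is computed directly from its definition; because Qgenome is negative, the sketch correlates capsid charge against its magnitude.

```python
import math


def total_qmax(qmax30: int, n_subunits: int) -> int:
    """Capsid-wide charge: most positive 30-residue window charge x subunit count."""
    return qmax30 * n_subunits


def genome_charge(length_nt: int, double_stranded: bool = False) -> int:
    """Assumed: one negative phosphate per nucleotide; ds genomes count both strands."""
    n_nucleotides = 2 * length_nt if double_stranded else length_nt
    return -n_nucleotides


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient from its textbook definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical viruses: (Qmax30res, subunits, genome length in nt).
viruses = [(10, 60, 1800), (12, 180, 6500), (15, 180, 8000)]
capsid = [total_qmax(q, s) for q, s, _ in viruses]
genome = [abs(genome_charge(nt)) for _, _, nt in viruses]
print(capsid, genome, round(pearson_r(capsid, genome), 3))
```

The toy numbers above are invented for illustration and are not taken from the study's dataset.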
Figure 4
Automatic identification of positively charged domains of variable size. We designed and implemented an algorithm that calculates the net charge and the charge concentration (Qc = Q/frame size) in incremental frame sizes. The positively charged domain was defined as the stretch with the highest Q value and Qc ≥ 0.23. (A) Frequency distribution of domain sizes retrieved by the variable-frame program, using the same 133 viruses analyzed in Fig. 3B as input. The inset shows the Pearson correlation between the Total Qmax30res calculated with the fixed-frame program and the Total Qmaxnres calculated with the variable-frame program. (B) Examples of sequences found by the variable-frame program (marked in red) and the fixed 30-residue frame program (marked in blue).
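The variable-frame search can be sketched as a brute-force scan over all frame sizes and start positions. It keeps the stretch with the highest Q whose charge concentration Qc = Q/frame size is at least 0.23. Several details are assumptions for illustration: the minimum frame size, the tie-breaking rule (shorter, then earlier stretch), and the charge model (R, K = +1; D, E = -1). `best_positive_domain` is a hypothetical name. Prefix sums make each window's charge an O(1) lookup.

```python
from typing import Optional, Tuple

# Assumed charge model: R, K = +1; D, E = -1; all other residues 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def best_positive_domain(
    seq: str, min_size: int = 10, max_size: Optional[int] = None, qc_min: float = 0.23
) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, Q) of the highest-Q stretch with Q/size >= qc_min.

    `end` is exclusive, as in Python slicing. Returns None if no stretch
    qualifies. Ties on Q keep the first hit: the shorter, then earlier stretch.
    """
    seq = seq.upper()
    n = len(seq)
    max_size = n if max_size is None else min(max_size, n)
    prefix = [0]  # prefix[i] = net charge of seq[:i]
    for aa in seq:
        prefix.append(prefix[-1] + CHARGE.get(aa, 0))
    best = None
    for size in range(min_size, max_size + 1):  # incremental frame sizes
        for start in range(n - size + 1):
            q = prefix[start + size] - prefix[start]
            if q / size >= qc_min and (best is None or q > best[2]):
                best = (start, start + size, q)
    return best


seq = "A" * 20 + "R" * 12 + "A" * 20
print(best_positive_domain(seq))  # (20, 32, 12): exactly the arginine run
```

The scan is quadratic in sequence length, which is acceptable for capsid proteins of a few hundred residues.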
Figure 5
Composition and location of positively charged domains from viral capsid proteins. (A) Arginine and lysine residues were counted in fragments with Q30res ≥ 0 from 1,100 capsid proteins of 49 virus families. The log2 R/K ratio per net-charge value is shown as a heatmap, ranging from red (K-enriched) to blue (R-enriched). Groups 1 and 2 (grey box) contain the icosahedral viruses shown in Fig. 3A; group 2 corresponds to families identified as outliers. Group 3 contains bacteriophages and complex multicomponent icosahedral capsids that were not analyzed in Fig. 3. This plot shows that all groups included in the linear fit of Fig. 3A had at least one segment with Q30res ≥ +7 (dashed line). (B) Heatmap of the frequency of fragments with Q30res ≥ +7 according to their position in the primary structure. Protein lengths were normalized and divided into bins of 0.01. (C) The sequences of fragments with Qmax30res ≥ +7 from viral capsid proteins (group 1, panel A) and human nucleic acid-binding proteins were used to determine the amino acid composition of positively charged segments. The panel shows amino acid enrichment relative to the amino acid composition of the total Swiss-Prot proteome.
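The two per-fragment quantities behind panels (A) and (B) can be sketched as a log2 R/K ratio and a position bin on a normalized protein length. The caption does not say how either edge case was handled. The pseudocount that keeps the ratio finite when a fragment lacks R or K is therefore an assumption, as is the use of the fragment's start position for binning. Function names are hypothetical.

```python
import math


def log2_rk(fragment: str, pseudocount: float = 0.5) -> float:
    """log2(R/K) for a fragment; the pseudocount is an assumed smoothing term."""
    f = fragment.upper()
    return math.log2((f.count("R") + pseudocount) / (f.count("K") + pseudocount))


def position_bin(start: int, protein_length: int, bin_width: float = 0.01) -> int:
    """Bin index of a fragment start on a 0-1 normalized protein length."""
    n_bins = round(1 / bin_width)  # 100 bins for a width of 0.01
    # Clamp so a position at the very end still falls in the last bin.
    return min(int(start / protein_length * n_bins), n_bins - 1)


print(log2_rk("RRRRK", pseudocount=0))  # 2.0: four R per K, i.e. R-enriched
print(position_bin(150, 300))  # 50: fragment starts at mid-protein
```

Positive values of `log2_rk` correspond to the blue (R-enriched) end of the heatmap scale and negative values to the red (K-enriched) end.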
Figure 6
The proportion of proteins containing positively charged segments in the Swiss-Prot database. Protein sequences from the reviewed Swiss-Prot databank were used as input for a program that calculates the net charge of every consecutive 30-residue segment (Q30res). Arginine and lysine residues were counted in fragments with Q30res ≥ +7 or ≥ +14. Proteins containing at least one segment with Q30res ≥ +7 or ≥ +14 and R/K ≥ 4 or K/R ≥ 4 were listed according to organism or function.
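The selection rule in this caption can be sketched as a filter over 30-residue windows. This illustration is not the authors' code and rests on stated assumptions: the charge model (R, K = +1; D, E = -1) and the hypothetical labels "R-rich" and "K-rich". Comparing `r >= 4 * k` instead of dividing avoids a zero division when a segment contains no lysine or no arginine. A protein can receive both labels if it carries segments of each kind.

```python
# Assumed charge model: R, K = +1; D, E = -1; all other residues 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def classify_protein(seq: str, q_min: int = 7, ratio: int = 4, window: int = 30) -> set[str]:
    """Hypothetical filter: labels for windows with net charge >= q_min and a strong R/K bias."""
    seq = seq.upper()
    labels = set()
    for i in range(len(seq) - window + 1):
        frag = seq[i:i + window]
        q = sum(CHARGE.get(aa, 0) for aa in frag)
        if q < q_min:
            continue  # only sufficiently positive segments are considered
        r, k = frag.count("R"), frag.count("K")
        # q >= q_min > 0 guarantees r + k > 0, so at most one branch can hold.
        if r >= ratio * k:
            labels.add("R-rich")  # R/K >= ratio
        elif k >= ratio * r:
            labels.add("K-rich")  # K/R >= ratio
    return labels


print(classify_protein("R" * 10 + "A" * 20))  # {'R-rich'}
print(classify_protein("RK" * 5 + "A" * 20))  # set(): no strong bias
```

Passing `q_min=14` reproduces the stricter threshold mentioned in the caption.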
