J Virol. 2017 Feb 28;91(6):e02096-16.

doi: 10.1128/JVI.02096-16. Print 2017 Mar 15.

Domain Organization and Evolution of the Highly Divergent 5' Coding Region of Genomes of Arteriviruses, Including the Novel Possum Nidovirus

Anastasia Gulyaeva¹, Magdalena Dunowska², Erik Hoogendoorn¹, Julia Giles³, Dmitry Samborskiy⁴, Alexander E Gorbalenya⁵ ⁴ ⁶

Affiliations

¹ Department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
² Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand; M.Dunowska@massey.ac.nz, a.e.gorbalenya@lumc.nl.
³ Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North, New Zealand.
⁴ Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia.
⁵ Department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands; M.Dunowska@massey.ac.nz, a.e.gorbalenya@lumc.nl.
⁶ Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia.

PMID: 28053107
PMCID: PMC5331827
DOI: 10.1128/JVI.02096-16

Anastasia Gulyaeva et al. J Virol. 2017.

Abstract

In five experimentally characterized arterivirus species, the 5'-end genome coding region encodes the most divergent nonstructural proteins (nsp's), nsp1 and nsp2, which include papain-like proteases (PLPs) and other poorly characterized domains. These are involved in regulation of transcription, polyprotein processing, and virus-host interaction. Here we present results of a bioinformatics analysis of this region of 14 arterivirus species, including that of the most distantly related virus, wobbly possum disease virus (WPDV), determined by a modified 5' rapid amplification of cDNA ends (RACE) protocol. By combining profile-profile comparisons and phylogeny reconstruction, we identified an association of the four distinct domain layouts of nsp1-nsp2 with major phylogenetic lineages, implicating domain gain, including duplication, and loss in the early nsp1 evolution. Specifically, WPDV encodes highly divergent homologs of PLP1a, PLP1b, PLP1c, and PLP2, with PLP1a lacking the catalytic Cys residue, but does not encode nsp1 Zn finger (ZnF) and "nuclease" domains, which are conserved in other arteriviruses. Unexpectedly, our analysis revealed that the only catalytically active nsp1 PLP of equine arteritis virus (EAV), known as PLP1b, is most similar to PLP1c and thus is likely to be a PLP1b paralog. In all non-WPDV arteriviruses, PLP1b/c and PLP1a show contrasting patterns of conservation, with the N- and C-terminal subdomains, respectively, being enriched with conserved residues, which is indicative of different functional specializations. The least conserved domain of nsp2, the hypervariable region (HVR), has its size varied 5-fold and includes up to four copies of a novel PxPxPR motif that is potentially recognized by SH3 domain-containing proteins. Apparently, only EAV lacks the signal that directs −2 ribosomal frameshifting in the nsp2 coding region.

IMPORTANCE: Arteriviruses comprise a family of mammalian enveloped positive-strand RNA viruses that include some of the most economically important pathogens of swine. Most of our knowledge about this family has been obtained through characterization of viruses from five species: Equine arteritis virus, Simian hemorrhagic fever virus, Lactate dehydrogenase-elevating virus, Porcine reproductive and respiratory syndrome virus 1, and Porcine reproductive and respiratory syndrome virus 2. Here we present the results of comparative genomics analyses of viruses from all 14 known arterivirus species, including the most distantly related virus, WPDV, whose genome sequence was completed in this study. Our analysis focused on the multifunctional 5'-end genome coding region that encodes multidomain nonstructural proteins 1 and 2. Using diverse bioinformatics techniques, we identified many patterns of evolutionary conservation that are specific to members of distinct arterivirus species, both characterized and novel, or their groups. They are likely associated with structural and functional determinants important for virus replication and virus-host interaction.

Keywords: arterivirus; bioinformatics; comparative genomics; duplication; evolution; nidovirus; papain-like proteases; phylogenetic analysis; ribosome frameshifting; wobbly possum disease.

Figures

**FIG 1**
Overview of the target-enriched 5′ RLM RACE protocol. Total RNA extracted from WPD-affected tissues was treated with calf intestine alkaline phosphatase (CIP) to remove free 5′ phosphates from all noncapped nucleic acids, followed by treatment with tobacco acid pyrophosphatase (TAP) to remove the cap structure from full-length mRNA (including capped positive-sense viral RNA), ligation of the RACE adapter to decapped mRNA containing 5′ phosphates, and reverse transcription of the ligated mRNA to cDNA by use of random decamers. These steps were performed according to the manufacturer's instructions (RLM RACE; Invitrogen). The ligated cDNA was then hybridized to biotinylated virus-specific probes. The viral sequences captured on streptavidin-coated magnetic beads were used in the PCR step of the 5′ RLM RACE protocol. The unknown 5′ end was amplified with a selection of virus-specific reverse primers and adapter-specific RACE primers. The target (WPDV) and nontarget (host) nucleic acids are depicted in orange and blue, respectively.

**FIG 2**
Example of the results obtained using modified 5′ RLM RACE as described in the text. (Left) One of the primary PCRs (lane 3) using the RACE.outer/WPD.S5.R primer pair (see Table S1 in the supplemental material) and target-enriched cDNA captured on streptavidin-coated magnetic beads as the template produced three bands, with approximate sizes of 2,500 bp (band 1), 1,500 bp (band 2), and 400 bp (band 3), with no bands in the no-template control (lane 4). Lane 2 represents an unsuccessful 5′ RLM RACE reaction with a different source of starting material. (Right) Nested PCR with the RACE.inner primer and either the WPD.S5.R (lanes 2 to 5) or WPD.S7.R (lanes 6 to 8) reverse primer. DNA extracted from primary band 1 (lanes 2 and 6), band 2 (lanes 3 and 7), band 3 (lanes 4 and 8), or water (lane 5) was used as the template. No bands were visible in the no-template control with the RLM-RACE inner/WPD.S7.R primer pair (not shown in the picture). A DNA ladder (GeneRuler DNA ladder mix; Fermentas) was included in lane 1 of both gels.

**FIG 3**
Phylogeny and nsp1-nsp2 domain organization of arteriviruses. (A) The phylogeny is presented by a posterior sample of phylogenetic trees, reconstructed by BEAST software. The trees are colored blue, red, or green, in descending order of prevalent topology. The genome organization, polyprotein processing scheme, and polyprotein domains used for phylogeny reconstruction (shaded in gray) are detailed in the bottom left corner for PRRSV-2 (accession number NC_001961.1). (B) The domain organization of nsp1-nsp2 is shown for each arterivirus species. Protein domains are represented by colored bars. The bar representing PLP1b of EAV has dark green stripes to emphasize its affinity with PLP1c. Bars representing the PLP1 domains of WPDV have white stripes to show their weak sequence similarity with the PLP1 domains of other arteriviruses. The positions of nsp2 PRF-related motifs are indicated by orange triangles, those of experimentally established cleavage sites by black triangles, and those of PxPxPR motifs by cyan diamonds. (C) Number of genomes sequenced for each of the characterized species (with sampling size and bias).

**FIG 4**
Profile-profile comparisons of nsp1-nsp2 domains of the simian lineage and five other arterivirus lineages. The plots shown are HHalign dot plots, with domains and viruses indicated on the respective axes and alignment paths of the two top-scoring hits drawn with transparent lines. The color of each line indicates the probability of the hit. On the right side of each dot plot, the probability and E value of the top-scoring hit are depicted.

**FIG 5**
Multiple-sequence alignments of selected nsp1-nsp2 domains of arteriviruses. (A) MSA of ZnF domains. Zinc-binding residues are marked with black triangles. (B) MSA of “nuclease” domains. Columns of the MSA that contain PRRSV-2 nsp1b residues whose mutation to alanine led to abolishment of PRRSV-2 nsp1b nuclease activity (42) are marked with black triangles. (C) MSA of PLP2 domains. Catalytic residues are marked with black triangles. MSAs were visualized with the help of Espript 2.1 (82). Secondary structures were derived from PDB entries.

**FIG 6**
Sequence similarity and evolutionary relationships of PLP1b and PLP1c. (A) HHalign comparisons between PLP1b and PLP1c domains of different arteriviruses. For each comparison, a dot plot is shown. On the dot plot, the alignment path of the top-scoring hit is drawn with a transparent line. The color of the line indicates the probability of the hit. Below the dot plot, the probability and E value of the top-scoring hit are given. (B) Posterior sample of phylogenetic trees generated by BEAST, based on MSA of PLP1b and PLP1c. For other designations, see the legend to Fig. 3A.

**FIG 7**
HHalign profile-profile comparisons of nsp1-nsp2 domains of WPDV and non-WPDV arteriviruses. EAV PLP1b was regarded as PLP1c for this figure. For details, see the legend to Fig. 4.

**FIG 8**
Rank distribution of top HHalign hits between PLP1 active site motifs of arteriviruses and WPDV pp1ab. HMM profiles representing cysteine and histidine motifs of PLP1s of all non-WPDV arterivirus species, with the EAV PLP1a cysteine motif excluded, were compared with WPDV pp1ab. The 15 top hits were ranked in descending order of probability (indicated on the y axis). Hits potentially including the catalytic cysteines of WPDV PLP1b and PLP1c are designated Cb and Cc, respectively.

**FIG 9**
Multiple-sequence alignment of arterivirus nsp1 PLPs. The top two secondary structures were derived from PDB entries. All other secondary structures were predicted by Jpred4 (74). Red triangles indicate columns of the PLP1a and PLP1b/PLP1c MSAs that have conservation scores above 0.75 for non-WPDV arteriviruses and were mapped on PDB structures (see Fig. 11). Columns containing the first residues of the PRRSV-2 PLP1a and PLP1b C-terminal subdomains are indicated by ochre bars. Catalytic motifs of nsp1 PLPs are underlined in cyan. The MSAs were visualized with Espript 2.1 (82).

**FIG 10**
Distribution of sequence conservation in the N-terminal region of pp1ab of arteriviruses. (A) MSAs of nsp1 PLP motifs of all non-WPDV arteriviruses are depicted as logos, with the homologous WPDV sequence specified below each logo. PLP motifs, including the catalytic residues Cys (C) and His (H) and putative RNA-binding residues (R), are labeled with domain-specific suffixes. Logos were prepared with the R package RWebLogo 1.0.3 (83). (B) The conservation profile, calculated based on the MSA of sequences from non-WPDV clusters, is shown for each domain of nsp1 and the N-terminal domains of nsp2. Areas above and below the mean conservation lines are shaded in black and gray, respectively. Dotted red lines indicate the mean conservation of the domains after the addition of the WPDV sequence to the MSA. EAV PLP1b was regarded as PLP1c for this figure.

**FIG 11**
Subdomain-specific distribution of residues conserved in PLP1a and PLP1b/c. The structures shown are tertiary structures of PRRSV-2 PLP1a (A) and PLP1b (B) with residues conserved in all non-WPDV arteriviral PLP1a and PLP1b/PLP1c domains, respectively. The N-terminal subdomain, formed by α-helices, is shown in cyan; and the C-terminal subdomain, consisting of antiparallel β-strands, is shown in blue. Conserved residues are shown in yellow (catalytic dyad) and red (all the rest). The following residues were conserved in the PLP1a alignment and mapped on PRRSV-2 (accession number EU624117.1) nsp1a: left subdomain, Gly45, Cys76, and Gly109; and right subdomain, Pro134, Tyr141, His146, Phe152, Ala155, and Pro175. The following residues were conserved in the PLP1b/c alignment and mapped on PRRSV-2 (accession number EU624117.1) nsp1b: left subdomain, Gly88, Cys90, Trp91, Leu94, Ala110, Gly120, Gly123, Tyr125, and Leu126; and right subdomain, Gly143, His159, Leu160, and Gly203. The figure was prepared with PyMOL (85).

**FIG 12**
Conservation of PxPxPR motifs in the HVR of arteriviruses. (A) Rank distribution of the top 30 hits obtained during HHalign comparison between WPDV HVR tandem repeats and individual HVR domain sequences of arteriviruses. The red line depicts the 5% probability threshold. WPDV HVR tandem repeats identified by RADAR are shown in the top right corner. (B) Locations of motifs identified by MEME in the HVR of arterivirus species. Extended PxPxPR motifs are shown in green, and conserved C-terminal motifs corresponding to the nsp2 PRF site are shown in red. (C) MSA of the PxPxPR motif and its derivatives in the HVR of viruses representing arterivirus species. Coordinates in the names of motifs refer to their domain position. Numbers to the right of the MSA show support for the identification of each motif by three methods. The first column shows probability values assigned to hits containing PxPxPR motifs by HHalign in analyses comparing HVR sequences of the respective arteriviruses to the MSA of tandem repeats of the WPDV HVR. The second column shows P values assigned to motifs by MEME. The third column shows matches (+) and mismatches (−) of the PxPxPR pattern.
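
The simplest of the three lines of support listed for Fig. 12C, a match or mismatch against the PxPxPR pattern, can be illustrated with a short script. This is a minimal sketch, not the authors' pipeline: the study relied on HHalign and MEME, whereas the code below only checks the literal pattern (Pro, any residue, Pro, any residue, Pro, Arg). The function name `find_pxpxpr` and the toy sequence are hypothetical and do not come from any arterivirus genome.

```python
import re

# PxPxPR: Pro, any residue, Pro, any residue, Pro, Arg.
# The lookahead lets overlapping occurrences be reported as well.
PXPXPR = re.compile(r"(?=(P.P.PR))")


def find_pxpxpr(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched hexapeptide) for each match."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in PXPXPR.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence used only to exercise the function.
    toy = "MKAPAPLPRGGSPTPEPRAAPPPPPRK"
    for position, hexapeptide in find_pxpxpr(toy):
        print(position, hexapeptide)
```

A strict pattern match of this kind would miss the "derivatives" of the motif mentioned in the legend, which is presumably why profile-based and motif-discovery methods were used alongside it.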

**FIG 13**
Multiple-sequence alignment of the nsp2 C termini of arteriviruses. Columns containing amino acids whose tRNAs are expected to be present in the ribosomal P and A sites prior to −1/−2 frameshifting are marked with orange triangles. The first column of the TM1-CR domains is marked with a black box. Amino acid residues predicted by TMHMM 2.0 (75) to form transmembrane regions are colored blue. The MSA was visualized with Espript 2.1 (82).

**FIG 14**
Arteriviral nsp2 PRF. (A) Schematic representation of the expression of nsp2 moieties (based on LDV; accession number U15146.1). (B) Fragment of the pp1ab alignment corresponding to the site of nsp2 PRF. Columns containing amino acids whose tRNAs are present in the ribosomal P and A sites prior to frameshifting are highlighted with orange triangles. (C) Nucleotide alignment corresponding to the protein alignment presented in panel B. The slippery sequence is shown in orange and the C-rich element in cyan. Deviations from the canonical motifs, i.e., RG_GUU_UUU (R = G or A) and CCCANCUCC, are highlighted in red. For each sequence, the genome coordinate of the first nucleotide in the alignment is specified. If the frameshift site allows complete A-site duplex re-pairing in the −1 or −2 frame, then the length of the corresponding hypothetical protein product is specified. Otherwise, it is marked with a dash. Alignment columns containing the first nucleotides of −1TF and −2TF are highlighted with pink and blue bars, respectively.
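
The two canonical motifs named in the Fig. 14 legend, RG_GUU_UUU (R = G or A) and CCCANCUCC, can be located in a nucleotide sequence with a simple exact-match scan. The sketch below assumes that the underscores only mark codon boundaries and that N stands for any nucleotide. It does not model the spacing between the two elements, which the legend does not state, and it cannot detect the deviations from the canonical motifs that the figure highlights. The function name `scan_prf_motifs` and the toy sequence are hypothetical.

```python
import re

# Canonical slippery sequence RG_GUU_UUU with R = G or A (underscores dropped).
SLIPPERY = re.compile(r"(?=([GA]GGUUUUU))")
# Canonical C-rich element CCCANCUCC, with N taken to mean any nucleotide.
C_RICH = re.compile(r"(?=(CCCA[ACGU]CUCC))")


def scan_prf_motifs(nucleotides: str) -> dict[str, list[int]]:
    """Return 1-based start positions of exact matches to each canonical motif."""
    # Accept DNA or RNA input by converting T to U.
    rna = nucleotides.upper().replace("T", "U")
    return {
        "slippery": [m.start() + 1 for m in SLIPPERY.finditer(rna)],
        "c_rich": [m.start() + 1 for m in C_RICH.finditer(rna)],
    }


if __name__ == "__main__":
    # Hypothetical toy sequence; the distance between motifs here is arbitrary.
    toy = "AUGCAGGGUUUUUGAUCAGAUGCCCAGCUCCAAA"
    print(scan_prf_motifs(toy))
```

An exact-match scan is deliberately conservative. Finding no hit does not show that a genome lacks the frameshift signal, only that it lacks the canonical form of it.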

**FIG 15**
Multiple-sequence alignments of alternative nsp2 C termini. (A) C terminus of nsp2N, translated as a result of −1 PRF. (B) C terminus of nsp2TF, translated as a result of −2 PRF. MSAs were guided by the MSA presented in Fig. 13. For other details, see the legend to Fig. 13.