Review
FEMS Microbiol Rev. 2015 Sep;39(5):764-78.
doi: 10.1093/femsre/fuv031. Epub 2015 Jul 14.

Ebolavirus comparative genomics

Se-Ran Jun et al. FEMS Microbiol Rev. 2015 Sep.

Abstract

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords: Ebola; Ebola virus disease (EVD); Filovirus; comparative genomics; epitope prediction; viral genomes.

Figures

Graphical Abstract Figure.
Variation within Ebola genomes is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L); genomic conservation and epitope prediction, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
Figure 1.
A dendrogram of all viral genomes from RefSeq complemented with, in red, multiple genomes in the family Filoviridae extracted from GenBank. The dendrogram is constructed by the FFP method which is based on K-mers, with K set to 9, in the moderate range of optimal feature length based on analysis of cumulative relative entropy and relative sequence divergence as described previously in Wu et al. (2009).
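The K-mer comparison behind such a dendrogram can be illustrated with a minimal sketch. It assumes that each genome is reduced to a profile of relative 9-mer frequencies and that two profiles are compared with the Jensen-Shannon divergence; the function names `kmer_profile` and `js_divergence` are hypothetical, and the sketch omits refinements such as merging reverse complements, so it is not the authors' pipeline.

```python
from collections import Counter
from math import log2


def kmer_profile(seq, k=9):
    """Return relative frequencies of all overlapping k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two sparse profiles."""
    div = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = (pi + qi) / 2  # mixture frequency of this k-mer
        if pi:
            div += 0.5 * pi * log2(pi / mi)
        if qi:
            div += 0.5 * qi * log2(qi / mi)
    return div
```

Computing `js_divergence` for every pair of genomes would give a distance matrix, which a clustering method such as neighbor joining could then turn into a tree. With log base 2 the divergence lies between 0 (identical profiles) and 1 (no shared K-mers).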
Figure 2.
A maximum likelihood tree based on complete genomes of the three filovirus genera. The three genera Marburgvirus, Cuevavirus and Ebolavirus are separated and genus Ebolavirus is further split into species, indicated on the right. The tree was produced with PhyML (Guindon et al. 2010) by applying the GTR + I + G nucleotide substitution model to a multiple sequence alignment of complete genome sequences generated by MAFFT (Katoh and Standley 2013). The best substitution model was identified by jModelTest (Guindon and Gascuel ; Darriba et al. 2012) among a broad suite of evolutionary models based on BIC. The numeric values represent the number of members within the clades.
Figure 3.
Gene organization of filovirus genomes and pan-core proteome analysis. (A) Gene organization of viruses from genera Marburgvirus, Ebolavirus and Cuevavirus. Conserved regions are indicated by color, with striped color for weak conservation. The furin cleavage site for glycoprotein GP is indicated by the inverted Y-shaped symbol. Gray blocks below the line indicate the position of conserved functional domains (predicted by InterProScan; Jones et al. 2014). (B) Venn diagram summarizing the results of the pan-core analysis based on protein sequence alignments with a cut-off based on the 50–50 rule (see the text). (C) Venn diagram of the protein functional domains resulting from pan-core analysis based on HMM.
Figure 4.
A maximum likelihood tree of 60 Zaire ebolavirus genomes. The tree was produced with the GTR + G model and rooted by a clade of the 1976 outbreaks. All isolates were of human origin, with the exception of two isolates from the 1976 outbreak (AF499101 was mouse adapted, EU224440 was from a guinea pig). The asterisk identifies the DRC 2014 isolate within a clade of 1994–1996 isolates from Gabon. The numbers on the major internal branches represent bootstrap support (%) out of 100 replicates. Abbreviations: DRC, Democratic Republic of Congo; GAB, Gabon; GIN, Guinea; SLE, Sierra Leone.
Figure 5.
Atlas of the genome of ebolavirus KJ660347, showing, from the outer ring inwards, variations within 84 other ebolavirus genomes, structural cruciforms and palindromes (van Noort et al. 2003), the coding sequences, local inverted repeats, palindromic hexamers, simple repeats and AT content. The conservation percentage (%) is defined as the number of genomes with the same letter on a multiple sequence alignment normalized to range from 0 to 100% for each site along the chromosome of Ebola KJ660347.
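The conservation percentage can be sketched in a few lines. The caption does not say which letter the other genomes are compared against, so this sketch assumes it is the reference genome's letter in each alignment column, and that all sequences come from one multiple sequence alignment and therefore have equal length. The function name `conservation_percent` is hypothetical.

```python
def conservation_percent(reference, others):
    """Percent of aligned genomes matching the reference at each column.

    `reference` and every entry of `others` are rows of the same
    multiple sequence alignment (equal length, gaps as '-').
    """
    if not others:
        raise ValueError("need at least one genome to compare")
    if any(len(s) != len(reference) for s in others):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i, ref_base in enumerate(reference):
        matches = sum(1 for s in others if s[i] == ref_base)
        result.append(100.0 * matches / len(others))  # scale to 0-100%
    return result
```

In this sketch a gap in the reference row is treated like any other character; a real analysis would more likely drop those columns so that positions follow the reference genome's coordinates.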
Figure 6.
Multiple sequence alignment of a portion of the ebolavirus glycoprotein (GP) from four species of ebolavirus genomes, showing identities between the Taï Forest genome (gi|208436395) and three others (numbered using the Zaire ebolavirus genome of gi|208436395). Identities between Taï Forest and others are shaded in gray. At each position of the alignment, the genome with the highest identity to Taï Forest GP is shown above the alignment by color and the first letter of the genome type: Z = Zaire, gi|667853009 (green), S = Sudan, gi|165940954 (cyan), and R = Reston, gi|253317719 (yellow). The highest similarity at each position was determined by the largest number of identities in a five-residue window centered at each location, with dashes indicating a tie or an undetermined result. Dashes between blocks of the same letter are colored by the surrounding color.
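The windowed comparison described in this caption can be sketched as follows. The sketch assumes that all sequences are rows of the same alignment, that the window is truncated at the ends of the alignment, and that any tie for the best score is reported as a dash; the function name `closest_by_window` and the single-letter labels are illustrative only.

```python
def closest_by_window(target, candidates, window=5):
    """Label each position with the candidate most similar to the target.

    `candidates` maps a one-letter label (e.g. 'Z', 'S', 'R') to an
    aligned sequence of the same length as `target`. Similarity is the
    number of identities in a window centered on the position; ties
    yield '-'.
    """
    half = window // 2
    labels = []
    for i in range(len(target)):
        lo, hi = max(0, i - half), min(len(target), i + half + 1)
        scores = {
            label: sum(target[j] == seq[j] for j in range(lo, hi))
            for label, seq in candidates.items()
        }
        best = max(scores.values())
        winners = [label for label, s in scores.items() if s == best]
        labels.append(winners[0] if len(winners) == 1 else "-")
    return "".join(labels)
```

For example, `closest_by_window("AAAAAAAAAA", {"Z": "AAAAACCCCC", "S": "CCCCCAAAAA"})` returns `"ZZZZZSSSSS"`: the label switches where the five-residue window begins to contain more identities with the second sequence than with the first.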
Figure 7.
Ebolavirus trees comparison. This is an image plot of branch score distances between alignment-based trees constructed by different features including whole genome alignment (WGA), coding gene sequences (CDS), intergenic (IG) sequences, protein sequences and three alignment-free-based whole genome trees by FFP, NRD and RPD.
Figure 8.
Experimentally verified B-cell epitopes for Ebola GP protein based on selected studies (Wilson et al. ; Shahhosseini et al. ; Takada et al. ; Lee et al. ,; Bale et al. ; Becquart et al. ; Qiu et al. ; Wang et al. 2014), represented by colored bars in the GP schematic. Glycan sites are also indicated. Some mapped data, where the epitope was not well localized within 50 amino acids, were omitted.
Figure 9.
Position of 10 predicted MHC class I (red) and 10 class II (blue) epitopes in six ebolavirus proteins, and the allelic variation detected in the 53 non-redundant proteomes. Sequence variation that destroys a predicted epitope is shown in red, while all variants shown in green were equally strong or only marginally less strong, compared to the sequences shown in black. Gray blocks above the proteins indicate the position of experimentally proven B-cell epitopes, after Becquart et al. (2014).

References

    1. Adu-Gyamfi E, Soni SP, Jee CS, et al. A loop region in the N-terminal domain of Ebola virus VP40 is important in viral assembly, budding, and egress. Viruses. 2014;6:3837–54.
    2. Audet J, Kobinger G. Immune evasion in Ebolavirus infections. Viral Immunol. 2014;28:10–8.
    3. Audet J, Wong G, Wang H, et al. Molecular characterization of the monoclonal antibodies composing ZMAb: a protective cocktail against Ebola virus. Sci Rep. 2014;4:6881.
    4. Bale S, Dias JM, Fusco ML, et al. Structural basis for differential neutralization of Ebolaviruses. Viruses. 2012;4:447–70.
    5. Becquart P, Mahlakoiv T, Nkoghe D, et al. Identification of continuous human B-cell epitopes in the VP35, VP40, nucleoprotein and glycoprotein of Ebola virus. PLoS One. 2014;9:e96360.
