PLoS Negl Trop Dis. 2010 Jun 15;4(6):e716.
doi: 10.1371/journal.pntd.0000716.

New assembly, reannotation and analysis of the Entamoeba histolytica genome reveal new genomic features and protein content information

Hernan A Lorenzi et al. PLoS Negl Trop Dis. 2010.

Abstract

Background: To keep genome information accurate and relevant, original genome annotations need to be updated and evaluated regularly. Manual reannotation is important because it can significantly reduce the propagation of errors and, consequently, the time spent on research built on mistaken annotations. For this reason, five years after the initial submission of the Entamoeba histolytica draft genome, we re-examined the original 23 Mb assembly and the annotation of its predicted genes.

Principal findings: Evaluation of the genomic sequence identified more than one hundred artifactual tandem duplications, which were eliminated by re-assembling the genome. The reannotation combined manual and automated genome analysis. The new 20 Mb assembly contains 1,496 scaffolds and 8,201 predicted genes; 60% of these genes are identical to the initial annotation, and the remaining 40% underwent structural changes. The functional classification of 60% of the genes was modified on the basis of recent sequence comparisons and new experimental data. We assigned putative functions to 3,788 proteins (46% of the predicted proteome) based on the annotation of predicted gene families, and identified 58 protein families of five or more members that share no homology with known proteins and thus could be Entamoeba-specific. Genome analysis also revealed new features, such as segmental duplications of up to 16 kb flanked by inverted repeats and the tight association of some gene families with transposable elements.
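The 46% figure can be reproduced from the two counts given in the findings. The short check below assumes, as the abstract's wording implies, that each of the 8,201 predicted genes contributes one protein to the predicted proteome; the variable names are illustrative only.

```python
# Counts taken from the abstract; one protein per predicted gene is assumed.
predicted_genes = 8201
proteins_with_putative_function = 3788

share = proteins_with_putative_function / predicted_genes
print(f"{share:.1%}")  # prints 46.2%, reported as 46% in the abstract
```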

Significance: This new genome annotation and analysis provide a more refined and accurate blueprint of the pathogen genome and an upgraded reference for studying many important aspects of E. histolytica biology, such as genome evolution and pathogenesis.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Re-mapping strategy used to transfer the old annotation
A) Steps followed to map the full set of OGA gene models (9,846) onto the new E. histolytica assembly, resulting in the NGA (8,201 gene models). B) The mapped OGA gene models fell into several categories: genes that map perfectly to the new assembly (same structure); genes that map to a location but had to be modified (different structure); genes that map to a repeat (discarded); genes encoding proteins shorter than 100 amino acids (discarded if they lacked supporting evidence); genes that fall within tandem duplications (discarded); and other, smaller categories (pseudogenes, truncated genes).
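The categories in panel B amount to a per-gene decision rule. The sketch below shows one way to express it; it is not the authors' pipeline. The record type `OgaGeneMapping`, its fields, the meaning of "evidence", and the order in which the checks are applied are all hypothetical, since the caption lists the categories but not their precedence.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MappingCategory(Enum):
    SAME_STRUCTURE = "same structure"
    DIFFERENT_STRUCTURE = "different structure"
    REPEAT = "discarded: maps to a repeat"
    SHORT_NO_EVIDENCE = "discarded: <100 aa, no evidence"
    TANDEM_DUPLICATION = "discarded: within a tandem duplication"
    OTHER = "other (pseudogene, truncated, ...)"


@dataclass
class OgaGeneMapping:
    """Hypothetical summary of where one OGA gene model lands in the new assembly."""
    gene_id: str
    protein_length_aa: int
    has_evidence: bool           # e.g. EST or homology support (assumed meaning)
    in_repeat: bool
    in_tandem_duplication: bool
    maps_perfectly: bool         # identical gene structure in the new assembly
    maps_to_location: bool       # lands somewhere, possibly needing edits


def categorize(m: OgaGeneMapping) -> MappingCategory:
    # The precedence of these checks is an assumption; the caption does not give one.
    if m.in_tandem_duplication:
        return MappingCategory.TANDEM_DUPLICATION
    if m.in_repeat:
        return MappingCategory.REPEAT
    if m.protein_length_aa < 100 and not m.has_evidence:
        return MappingCategory.SHORT_NO_EVIDENCE
    if m.maps_perfectly:
        return MappingCategory.SAME_STRUCTURE
    if m.maps_to_location:
        return MappingCategory.DIFFERENT_STRUCTURE
    return MappingCategory.OTHER


# Invented example records, not data from the paper. Field order:
# gene_id, length, evidence, repeat, tandem dup, perfect map, mapped location
models = [
    OgaGeneMapping("g1", 350, True, False, False, True, True),
    OgaGeneMapping("g2", 80, False, False, False, True, True),
    OgaGeneMapping("g3", 420, True, False, True, False, True),
    OgaGeneMapping("g4", 510, True, False, False, False, True),
]
tally = Counter(categorize(m) for m in models)
for category, n in tally.items():
    print(category.value, n)
```

Checking tandem duplications and repeats first reflects the idea that such models are discarded regardless of how cleanly they map; a different precedence would change the tally for borderline genes.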
Figure 2. Structural annotation improvement in the new E. histolytica assembly.
Comparison of Pfam HMM search statistics between equivalent genes in the old and new annotations. Blue bars: genes with better statistics/hits in the old annotation than in the new one; orange bars: genes for which the old and new annotations give exactly the same result; yellow bars: genes from the new annotation with better statistics/hits than their counterparts in the old annotation.
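The three bar colours correspond to a pairwise comparison of each old gene with its new counterpart. The sketch below assumes that "better statistics" means a smaller best Pfam E-value and that a gene with no hit ranks below any gene with a hit; the real comparison may also weigh hit counts or bit scores. The helper names and the E-values are hypothetical.

```python
import math
from collections import Counter


def best_evalue(evalues: list[float]) -> float:
    """Most significant (smallest) Pfam E-value for a gene; +inf when there is no hit."""
    return min(evalues, default=math.inf)


def compare_pair(old_hits: list[float], new_hits: list[float]) -> str:
    """Classify one old/new gene pair into the three bar colours of the figure."""
    old_best, new_best = best_evalue(old_hits), best_evalue(new_hits)
    if old_best < new_best:
        return "old better"   # blue
    if new_best < old_best:
        return "new better"   # yellow
    return "same"             # orange (also covers pairs with no hit on either side)


# Invented E-values for four hypothetical gene pairs: (old hits, new hits).
pairs = {
    "geneA": ([1e-30], [1e-45]),
    "geneB": ([1e-12], [1e-12]),
    "geneC": ([], [3e-8]),
    "geneD": ([2e-20], [5e-9]),
}
counts = Counter(compare_pair(old, new) for old, new in pairs.values())
print(counts)
```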
Figure 3. E. histolytica protein families.
A) Size distribution of protein families. B) Functional assignments of singletons (proteins not assigned to families) versus proteins within families. Hypothetical: predicted hypothetical proteins; Non-hypothetical: predicted proteins with functional assignments; Expressed: predicted proteins with EST (expressed sequence tag) support.
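Panel A is a histogram of family sizes, and singletons are proteins with no family. Given a protein-to-family assignment, both can be derived as in the sketch below, which also applies the five-member cutoff used in the abstract. The assignment dictionary and family names are invented for illustration.

```python
from collections import Counter
from typing import Optional


def family_sizes(family_of: dict[str, Optional[str]]) -> Counter:
    """Members per family; proteins mapped to None are singletons."""
    return Counter(fam for fam in family_of.values() if fam is not None)


def size_histogram(sizes: Counter) -> Counter:
    """Number of families of each size (the kind of distribution plotted in panel A)."""
    return Counter(sizes.values())


# Hypothetical protein-to-family assignments (None = not in any family).
family_of = {
    "p1": "famX", "p2": "famX", "p3": "famX", "p4": "famX", "p5": "famX",
    "p6": "famY", "p7": "famY",
    "p8": None, "p9": None,
}
sizes = family_sizes(family_of)
histogram = size_histogram(sizes)
singletons = [p for p, fam in family_of.items() if fam is None]
families_5_plus = [fam for fam, n in sizes.items() if n >= 5]
print(histogram, singletons, families_5_plus)
```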
Figure 4. Entamoeba histolytica segmental genome duplications.
A) D1-type duplications flanked by unique 2.3 kb inverted repeats (IRs); B) D2-type duplications flanked by 1.2 kb IRs derived from EhERE1/EhLINE2; C) D3-type duplications, usually associated with EhLINE1 but lacking IRs; and D) D4-type duplications, located near transposable elements (TEs) and lacking IRs. Inverted red arrows: IRs; purple boxes: open reading frames; blue boxes: repetitive elements. DS identifiers correspond to the GenBank accession numbers of the corresponding scaffolds.
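An inverted repeat is a sequence whose copy on the other side of the duplicated segment reads as its reverse complement. The sketch below shows that test and a rough D1 to D4 assignment that follows the legend above. It is a simplification, not the authors' method: the exact-match IR test is an assumption (genomic IRs tolerate mismatches, so an aligner would be needed on real data), and the helper names and boolean features are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_flank: str, right_flank: str) -> bool:
    """Exact-match test: the right flank reads as the reverse complement of the left.

    Real IRs tolerate mismatches and indels; this is a simplification.
    """
    return bool(left_flank) and right_flank.upper() == reverse_complement(left_flank).upper()


def duplication_type(has_ir: bool, ir_from_ere1_or_line2: bool,
                     near_ehline1: bool, near_other_te: bool) -> str:
    """Rough D1-D4 assignment following the figure legend (hypothetical helper)."""
    if has_ir:
        # D2 IRs derive from EhERE1/EhLINE2; D1 IRs are unique sequence
        return "D2" if ir_from_ere1_or_line2 else "D1"
    if near_ehline1:
        return "D3"
    if near_other_te:
        return "D4"
    return "unclassified"


left = "ATGCCGTTA"
right = reverse_complement(left)
print(right, is_inverted_repeat(left, right))  # prints TAACGGCAT True
```

EhLINE1 is checked before other TEs because the legend singles it out for D3; the IR lengths given in the legend (2.3 kb for D1, 1.2 kb for D2) could serve as an additional check.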
