Genome Res. 2009 Jul;19(7):1202-13.
doi: 10.1101/gr.083386.108. Epub 2009 Apr 10.

The origins of apicomplexan sequence innovation


James Wasmuth et al. Genome Res. 2009 Jul.

Abstract

The Apicomplexa are a group of phylogenetically related parasitic protists that include Plasmodium, Cryptosporidium, and Toxoplasma. Together they impose a major burden on global human health and economies. To meet this challenge, several international consortia have generated vast amounts of sequence data for many of these parasites. Here, we exploit these data to perform a systematic analysis of protein family and domain incidence across the phylum. A total of 87,736 protein sequences were collected from 15 apicomplexan species. These were compared with three protein databases, including the partial genome database PartiGeneDB, which increases the breadth of taxonomic coverage. From these searches we constructed taxonomic profiles that reveal the extent of apicomplexan sequence diversity. Sequences without a significant match outside the phylum were denoted apicomplexan-specialized. These were collated into 9134 discrete protein families and placed in the context of the apicomplexan phylogeny, identifying the putative origin of each family. Most apicomplexan-specialized families were associated with an individual genus or species. Interestingly, many genus-specific innovations were associated with specialized host cell invasion and/or parasite survival processes. In contrast, families reflecting more ancestral relationships were enriched in generalized housekeeping functions such as translation and transcription, which have diverged within the apicomplexan lineage. Protein domain searches revealed 192 domains not previously reported in apicomplexans, together with a number of novel domain combinations. We highlight domains that may be important to parasite survival.
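The taxonomic-profiling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the E-value cutoff, protein IDs, and taxon labels are hypothetical, and in practice the hits would come from BLAST-style searches against the three databases. A sequence is treated as apicomplexan-specialized when none of its significant hits fall outside the phylum.

```python
from dataclasses import dataclass

# Hypothetical significance cutoff: the abstract does not state the
# E-value threshold that was actually used.
EVALUE_CUTOFF = 1e-5


@dataclass(frozen=True)
class Hit:
    query: str          # apicomplexan protein ID (hypothetical IDs below)
    subject_taxon: str  # taxonomic group of the matched database sequence
    evalue: float


def taxonomic_profile(queries, hits, cutoff=EVALUE_CUTOFF):
    """Map each query to the set of taxa in which it has a significant hit."""
    profile = {q: set() for q in queries}
    for h in hits:
        if h.query in profile and h.evalue <= cutoff:
            profile[h.query].add(h.subject_taxon)
    return profile


def apicomplexan_specialized(profile, phylum="Apicomplexa"):
    """Queries whose significant hits (if any) all fall inside `phylum`."""
    return sorted(q for q, taxa in profile.items() if taxa <= {phylum})


queries = ["PF_001", "TG_002", "CP_003"]
hits = [
    Hit("PF_001", "Apicomplexa", 1e-50),
    Hit("PF_001", "Chordata", 1e-3),     # above the cutoff: not significant
    Hit("TG_002", "Ciliophora", 1e-30),  # significant hit outside the phylum
]
profile = taxonomic_profile(queries, hits)
print(apicomplexan_specialized(profile))  # ['CP_003', 'PF_001']
```

A query with no hits at all (here `CP_003`) has an empty profile and therefore also counts as phylum-restricted.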


Figures

Figure 1.
Apicomplexan life cycles. Members of the Apicomplexa share a generalized life cycle, though each species has its own specializations. Plasmodium spp. and Theileria spp. are transmitted by, and undergo sexual recombination in, an arthropod vector: the Anopheles mosquito and the Rhipicephalus tick, respectively. Cryptosporidium is able to autoinfect its host; the oocyst can sporulate and excyst in the same host, maintaining the infection for months to years. The coccidian parasites are represented in this figure by Toxoplasma, which is able to infect the majority of warm-blooded animals. Differentiation of Toxoplasma tachyzoites into gametocytes is triggered only when members of the cat family (Felidae) are infected. The molecular basis for this regulation is not yet known. The intermediate and definitive host spectra for each species considered in this study are given in Table 1.
Figure 2.
Apicomplexan-specialized protein families. The species membership of each protein family was used to determine its putative point of origin within the apicomplexan phylogeny. The number of protein families shared between at least two daughter taxa of a particular clade is circled at each node. The number on each terminal branch is the number of species-specific protein families, where a family contains at least two proteins. The number of singletons (proteins not clustered into a family), the total number of phylum-restricted proteins, and the total number of proteins available are given after each species name. Cg and Pg denote species with a complete or a partial genome, respectively. The tree is a cladogram; construction of the phylogeny is described in the Methods section.
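A minimal sketch of how a family's putative point of origin could be read off a cladogram, assuming the origin is the smallest clade containing every species in the family (their most recent common ancestor). The tree, species labels, and family memberships are hypothetical, and treating a one-protein cluster as a singleton is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical cladogram for illustration only: clades are (label, children)
# tuples and species are plain strings. The study's actual tree is in Figure 2.
TREE = ("Apicomplexa", [
    "C_parvum",
    ("Aconoidasida+Coccidia", [
        ("Aconoidasida", ["P_falciparum", "T_parva"]),
        ("Coccidia", ["T_gondii", "E_tenella"]),
    ]),
])


def leaves(node):
    """Set of species below `node`."""
    if isinstance(node, str):
        return {node}
    _label, children = node
    return set().union(*(leaves(child) for child in children))


def origin(species, node):
    """Smallest clade containing every species in `species` (their MRCA)."""
    if isinstance(node, str):
        return node
    label, children = node
    for child in children:
        if species <= leaves(child):
            return origin(species, child)
    return label


def place_families(families, tree):
    """Count families at their putative point of origin.

    A single-protein cluster is counted as a singleton; a family whose
    members all come from one species lands on that species' terminal branch.
    """
    counts = Counter()
    for members in families.values():
        if len(members) < 2:
            counts["singleton"] += 1
            continue
        species = {sp for _protein, sp in members}
        counts[origin(species, tree)] += 1
    return counts


families = {
    "fam1": [("p1", "P_falciparum"), ("p2", "T_gondii")],                    # -> Aconoidasida+Coccidia
    "fam2": [("p3", "C_parvum"), ("p4", "T_parva"), ("p5", "E_tenella")],  # -> Apicomplexa
    "fam3": [("p6", "P_falciparum"), ("p7", "P_falciparum")],              # -> P_falciparum
    "fam4": [("p8", "T_gondii")],                                          # -> singleton
}
print(place_families(families, TREE))
```

Descending from the root and stopping at the first node whose children no longer contain the whole species set is a simple way to find the most recent common ancestor on a small, fixed tree.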
Figure 3.
Species distribution of sequences in a selection of apicomplexan-specialized protein families. Three sets of protein families are presented: (A) the 20 most abundant families; (B) the 15 most abundant of the 173 families conserved between Cryptosporidium and either or both of the Aconoidasida and the Coccidia; and (C) the 15 most abundant of the 427 families conserved between the Aconoidasida and the Coccidia. Numbers in boxes indicate the number of sequences from each species in each family. A black box border indicates that proteomic data map to the protein family. Colors refer to the clades in Figure 2.
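The per-species counts in each panel amount to ranking families by size and tallying members per species. The sketch below assumes families are stored as lists of (protein, species) pairs; the clade sets and the `conserved_acon_cocc` filter are hypothetical stand-ins for the panel C selection, not the authors' code.

```python
from collections import Counter

# Hypothetical clade membership, for illustration only.
ACONOIDASIDA = {"P_falciparum", "T_parva"}
COCCIDIA = {"T_gondii", "E_tenella"}


def top_families(families, top_n, keep=lambda species: True):
    """Per-species sequence counts for the `top_n` largest families
    whose species set satisfies `keep` (ties broken by family name)."""
    selected = [
        (name, members) for name, members in families.items()
        if keep({sp for _protein, sp in members})
    ]
    selected.sort(key=lambda item: (-len(item[1]), item[0]))
    return {
        name: Counter(sp for _protein, sp in members)
        for name, members in selected[:top_n]
    }


def conserved_acon_cocc(species):
    """Panel C-style filter: at least one Aconoidasida and one Coccidia species."""
    return bool(species & ACONOIDASIDA) and bool(species & COCCIDIA)


families = {
    "fam1": [("p1", "P_falciparum"), ("p2", "P_falciparum"), ("p3", "T_gondii")],
    "fam2": [("p4", "T_parva"), ("p5", "E_tenella")],
    "fam3": [("p6", "C_parvum"), ("p7", "C_parvum"), ("p8", "C_parvum"), ("p9", "C_parvum")],
}
for name, row in top_families(families, 15, conserved_acon_cocc).items():
    print(name, dict(row))
```

Passing no filter reproduces a panel A-style ranking over all families; a panel B-style filter would require a Cryptosporidium species plus at least one Aconoidasida or Coccidia species.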
Figure 4.
Proposed evolution of family 35_3.0. The multiple sequence alignment and phylogenetic reconstruction are described in the Methods section. Gene duplication occurred on two occasions, with evidence of subsequent gene loss in one lineage. The incomplete nature of partial genomes makes gene loss difficult to interpret, but has less effect on the inference of duplication events. Bootstrap values at the nodes are percentages of 1000 replicate multiple sequence alignments. Nodes supported by <50% of replicates are shown as polytomies.
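Collapsing weakly supported nodes into polytomies can be sketched as a bottom-up pass over the tree that splices the children of any internal node with bootstrap support below 50% into its parent. The `Clade` structure, the `newick` helper, and the example tree are hypothetical; a real analysis would more likely do this with a phylogenetics library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clade:
    name: str = ""
    support: Optional[float] = None  # bootstrap %, None for leaves and the root
    children: List["Clade"] = field(default_factory=list)


def collapse_weak(node, threshold=50.0):
    """Return a copy of the tree in which internal nodes supported by fewer
    than `threshold` percent of replicates are dissolved into their parent."""
    new_children = []
    for child in node.children:
        child = collapse_weak(child, threshold)  # process the subtree first
        if child.children and child.support is not None and child.support < threshold:
            new_children.extend(child.children)  # splice grandchildren up: polytomy
        else:
            new_children.append(child)
    return Clade(node.name, node.support, new_children)


def newick(node):
    """Minimal Newick rendering with support values as internal-node labels."""
    if not node.children:
        return node.name
    inner = ",".join(newick(child) for child in node.children)
    label = f"{node.support:g}" if node.support is not None else ""
    return f"({inner}){label}"


tree = Clade(children=[
    Clade(support=95, children=[Clade("A"), Clade("B")]),
    Clade(support=40, children=[
        Clade("C"),
        Clade(support=80, children=[Clade("D"), Clade("E")]),
    ]),
])
print(newick(collapse_weak(tree)) + ";")  # ((A,B)95,C,(D,E)80);
```

Because each subtree is collapsed before its parent is inspected, a chain of consecutive weak nodes is flattened into a single polytomy.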

