Eukaryot Cell. 2012 Feb;11(2):217-28.
doi: 10.1128/EC.05225-11. Epub 2011 Dec 2.

A machine learning approach to identify hydrogenosomal proteins in Trichomonas vaginalis

David Burstein et al. Eukaryot Cell. 2012 Feb.

Abstract

The protozoan parasite Trichomonas vaginalis is the causative agent of trichomoniasis, the most widespread nonviral sexually transmitted disease in humans. It possesses hydrogenosomes, anaerobic mitochondria that generate H₂, CO₂, and acetate from pyruvate while converting ADP to ATP via substrate-level phosphorylation. T. vaginalis hydrogenosomes lack a genome and translation machinery; hence, they import all their proteins from the cytosol. To date, however, only 30 imported proteins have been shown to localize to the organelle. A total of 226 nuclear-encoded proteins inferred from the genome sequence harbor a characteristic short N-terminal presequence, reminiscent of mitochondrial targeting peptides, which is thought to mediate hydrogenosomal targeting. Recent studies suggest, however, that the presequences might be less important than previously thought. We sought to identify new hydrogenosomal proteins within the 59,672 annotated open reading frames (ORFs) of T. vaginalis, independent of the N-terminal targeting signal, using a machine learning approach. Our training set included 57 gene and protein features determined for all 30 known hydrogenosomal proteins and 576 nonhydrogenosomal proteins. Several classifiers were trained on this set to yield an import score for all proteins encoded by T. vaginalis ORFs, predicting the likelihood of hydrogenosomal localization. The machine learning results were tested through immunofluorescence assay and immunodetection in isolated cell fractions of 14 protein predictions using hemagglutinin constructs expressed under the homologous SCSα promoter in transiently transformed T. vaginalis cells. Localization of 6 of the 10 top predicted hydrogenosome-localized proteins was confirmed, and two of these were found to lack an obvious N-terminal targeting signal.

Figures

Fig 1
The machine learning procedure. For a learning set comprising all proteins known to be targeted to the hydrogenosome (positive set) and a set of nontargeted proteins (negative set), 57 different features were calculated. These values are passed to several classifiers, which aim to identify feature combinations that best differentiate between the positive and negative sets. In order to choose the best-performing classifier, 10-fold cross validation is performed. Within each fold, an inner cross validation is done to choose the best-performing features (feature selection). After the best classifier has been chosen, it is trained again over all of the learning set and is used to perform the prediction for each ORF in the T. vaginalis genome. The localization of the top-scoring predictions is experimentally tested. Newly identified hydrogenosomal proteins are added to the positive set, and another phase of learning can be performed.
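The nested cross-validation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the single-feature ranking used for feature selection, the synthetic four-feature data set, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random

def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def split(X, y, test):
    """Return the training rows and labels that are not in the test fold."""
    held = set(test)
    keep = [i for i in range(len(X)) if i not in held]
    return [X[i] for i in keep], [y[i] for i in keep]

def centroid_fit(X, y, feats):
    """Per-class mean of the chosen features (stand-in classifier)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return model

def centroid_predict(model, x, feats):
    """Predict the class whose centroid is closest to x."""
    def dist(center):
        return sum((x[f] - m) ** 2 for f, m in zip(feats, center))
    return min(model, key=lambda label: dist(model[label]))

def cv_accuracy(X, y, feats, k):
    """k-fold cross-validated accuracy using only the features in feats."""
    correct = 0
    for test in k_folds(len(X), k):
        Xtr, ytr = split(X, y, test)
        model = centroid_fit(Xtr, ytr, feats)
        correct += sum(centroid_predict(model, X[i], feats) == y[i] for i in test)
    return correct / len(X)

def select_features(X, y, n_keep, inner_k):
    """Inner CV: rank features by single-feature accuracy, keep the best."""
    ranked = sorted(range(len(X[0])),
                    key=lambda f: cv_accuracy(X, y, [f], inner_k),
                    reverse=True)
    return sorted(ranked[:n_keep])

def nested_cv(X, y, n_keep=2, outer_k=10, inner_k=5):
    """Outer CV estimates performance; features are chosen inside each fold."""
    correct, chosen = 0, []
    for test in k_folds(len(X), outer_k):
        Xtr, ytr = split(X, y, test)
        feats = select_features(Xtr, ytr, n_keep, inner_k)  # training part only
        chosen.append(feats)
        model = centroid_fit(Xtr, ytr, feats)
        correct += sum(centroid_predict(model, X[i], feats) == y[i] for i in test)
    return correct / len(X), chosen

# Synthetic data (assumption): features 0 and 2 are informative, 1 and 3 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[4 * t + rng.gauss(0, 1), rng.gauss(0, 1),
      4 * t + rng.gauss(0, 1), rng.gauss(0, 1)] for t in y]

accuracy, chosen = nested_cv(X, y)
print(accuracy, chosen)
```

Selecting features inside each outer fold, using only that fold's training portion, keeps the held-out samples from influencing the feature choice and so avoids an optimistic performance estimate. The final step in the caption, retraining on the whole learning set, would correspond to calling `select_features` and `centroid_fit` once on all of `X` and `y`.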
Fig 2
A comparison of feature stability score using the MOT+ and MOT− schemes. Using the 10-fold cross-validation approach, the estimation of the classifier performance is repeated 10 times (10 folds; see Materials and Methods for details). In each repeat, a different set of best features may be selected. Feature stability measures the fraction of the cross-validation repeats in which the feature was selected. A feature that was selected repeatedly in all of the 10 folds will receive a score of 1, indicating that the feature was found to be consistently informative for the distinction between positive and negative sets. BBH, best BLAST hits; AA, amino acid.
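The feature stability score defined in this caption is the fraction of cross-validation folds in which a feature was selected. A minimal sketch, assuming the per-fold selections are available as lists; the feature names and the function name are illustrative and do not come from the paper:

```python
from collections import Counter

def feature_stability(selected_per_fold):
    """Return {feature: fraction of folds in which it was selected}."""
    n_folds = len(selected_per_fold)
    # set() guards against a feature being listed twice within one fold.
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return {feature: count / n_folds for feature, count in counts.items()}

# Hypothetical selections from 10 folds (feature names are made up).
folds = [["bbh_score", "aa_freq_K"]] * 7 + [["bbh_score"]] * 3
scores = feature_stability(folds)
print(scores)
```

Here `bbh_score` is selected in all 10 folds and receives a score of 1, while `aa_freq_K` is selected in 7 of 10 folds and receives 0.7.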
Fig 3
Results of the in vivo localization of two novel hydrogenosomal proteins, TVAG_456770 (a paralog of the iron sulfur biosynthesis protein IscA) and TVAG_479680 (2-nitropropane dioxygenase), and of the negative control TVAG_023840 (glucokinase), together with the hydrogenosomal marker ASCT (TVAG_395550). α, anti.
Fig 4
Localization of the mannosyl-transferase encoded by the TVAG_365830 gene. This mannosyl-transferase homologue possesses the same N-terminal sequence (MLRN) as pyruvate:ferredoxin oxidoreductase (PFO), but while PFO is imported into hydrogenosomes (Hyd) and its presequence is cleaved (36), TVAG_365830 is localized to the ER. (A) HA-tagged TVAG_365830. (B) DAPI staining. (C) Merge of the images in panels A and B. (D) Bright-field image. (E) An illustration of the typical arrangement of the ER (arrows) around the nucleus (Nuc) in a transmission electron microscopic image of T. vaginalis. When not attached to host tissue, flagellated T. vaginalis cells are pyriform and about 20 μm in length. A single cell can house several dozen hydrogenosomes, which are often found clustered in proximity to the axostyle (not visible in this section). Other membrane-bound structures include lysosomes (Lys) and vacuolar compartments (V).
Fig 5
A multiple sequence alignment and phylogenetic network of TVAG_479680, a novel hydrogenosomal protein (annotated as 2-nitropropane dioxygenase), with its homologs.
Fig 6
A multiple sequence alignment and phylogenetic network of TVAG_221830, a novel hydrogenosomal protein (containing a Glo-EDI-BRP-like domain), with its homologs.

References

1. Akhmanova A, et al. 1998. A hydrogenosome with a genome. Nature 396:527–528
2. Alsmark UC, Sicheritz-Ponten T, Foster PG, Hirt RP, Embley TM. 2009. Horizontal gene transfer in eukaryotic parasites: a case study of Entamoeba histolytica and Trichomonas vaginalis. Methods Mol. Biol. 532:489–500
3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410
4. Ashburner M, et al. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25:25–29
5. Aurrecoechea C, et al. 2009. GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res. 37:D526–530
