mSystems. 2016 Dec 27;1(6):e00101-16.
doi: 10.1128/mSystems.00101-16. eCollection 2016 Nov-Dec.

From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer

Aaron Weimann et al. mSystems. 2016.

Abstract

The number of sequenced genomes is growing exponentially, profoundly shifting the bottleneck from data generation to genome interpretation. Traits are often used to characterize and distinguish bacteria and are likely a driving factor in microbial community composition, yet little is known about the traits of most microbes. We describe Traitar, the microbial trait analyzer, a fully automated software package for deriving phenotypes from a genome sequence. Traitar provides phenotype classifiers to predict 67 traits related to the use of various substrates as carbon and energy sources, oxygen requirement, morphology, antibiotic susceptibility, proteolysis, and enzymatic activities. Furthermore, it suggests protein families associated with the presence of particular phenotypes. Our method uses L1-regularized L2-loss support vector machines for phenotype assignments based on phyletic patterns of protein families and their evolutionary histories across a diverse set of microbial species. We demonstrate that Traitar reliably assigns phenotypes to bacterial genomes from 572 species of eight phyla, and also to incomplete single-cell genomes and simulated draft genomes. We also showcase its application in metagenomics by verifying and complementing a manual metabolic reconstruction of two novel Clostridiales species based on draft genomes recovered from commercial biogas reactors. Traitar is available at https://github.com/hzi-bifo/traitar.

IMPORTANCE Bacteria are ubiquitous in our ecosystem and have a major impact on human health, e.g., by supporting digestion in the human gut. Bacterial communities can also aid in biotechnological processes such as wastewater treatment or decontamination of polluted soils. Diverse bacteria contribute with their unique capabilities to the functioning of such ecosystems, but lab experiments to investigate those capabilities are labor-intensive. Major advances in sequencing techniques open up the opportunity to study bacteria by their genome sequences. For this purpose, we have developed Traitar, software that predicts traits of bacteria on the basis of their genomes. It is applicable to studies with tens or hundreds of bacterial genomes. Traitar may help researchers in microbiology to pinpoint the traits of interest, reducing the amount of wet lab work required.
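
To make the classifier described in the abstract more concrete, the following is a minimal sketch of an L1-regularized L2-loss (squared hinge) linear support vector machine trained on a phyletic pattern, i.e., a genome-by-protein-family presence/absence matrix. It uses scikit-learn's LinearSVC, which is an assumption of this sketch and not necessarily the library used by Traitar. The data, the helper name fit_trait_classifier, and the value of C are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC


def fit_trait_classifier(X, y, C=0.5):
    """Fit an L1-regularized, squared-hinge-loss linear SVM (hypothetical helper)."""
    # penalty="l1" with loss="squared_hinge" is only supported in the primal (dual=False).
    clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=C, max_iter=10000)
    return clf.fit(X, y)


rng = np.random.default_rng(0)

# Toy phyletic pattern: rows are genomes, columns are protein families (1 = present).
X = rng.integers(0, 2, size=(60, 40))

# Synthetic phenotype labels: positive when made-up families 3 and 7 are both present.
y = (X[:, 3] & X[:, 7]).astype(int)

clf = fit_trait_classifier(X, y)

# The L1 penalty tends to drive many weights to exactly zero; families with a
# nonzero weight are the candidates "associated with" the phenotype.
selected_families = np.flatnonzero(clf.coef_[0])
predicted = clf.predict(X)
```

The sparsity induced by the L1 penalty is what allows a short list of protein families to be reported per phenotype; the phyletic pattern plus gain and loss (phypat+PGL) variant mentioned in the figure captions is not covered by this sketch.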

Keywords: ancestral trait reconstruction; genotype-phenotype inference; metagenomics; microbial traits; phenotypes; phyletic patterns; single-cell genomics; support vector machines.

Figures

FIG 1
Traitar can be used to phenotype microbial community members on the basis of genomes of microbial isolates or genomes recovered from single-cell sequencing or (metagenomic) environmental shotgun sequencing data. Traitar provides classification models based on protein family annotation for a wide variety of phenotypes related to the use of various substrates as sources of carbon and energy for growth, oxygen requirement, morphology, antibiotic susceptibility, and enzymatic activity.
FIG 2
Workflow of Traitar. Input to the software can be genome sequence samples in nucleotide or amino acid FASTA format. Traitar predicts phenotypes on the basis of precomputed classification models and provides graphic and tabular output. In the case of nucleotide sequence input, the protein families that are important for the phenotype predictions are additionally mapped to the predicted protein-coding genes.
FIG 3
Macroaccuracy for each phenotype of the Traitar phypat and phypat+PGL phenotype classifiers determined in nested cross-validation of 234 bacterial species described in GIDEON (see evaluation metrics in Materials and Methods; Table 1; see Table S1 in the supplemental material).
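
As an illustration of the metric named in the FIG 3 caption, the sketch below computes a macroaccuracy for one phenotype. It assumes that macroaccuracy means the mean of the recall of the phenotype-positive class and the recall of the phenotype-negative class; the authoritative definition is in the paper's Materials and Methods, which this page does not reproduce. The function name and the labels are hypothetical.

```python
def macro_accuracy(y_true, y_pred):
    """Mean of the positive-class and negative-class recall (assumed definition)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute macroaccuracy")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Made-up labels for six species: four phenotype-positive, two phenotype-negative.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = macro_accuracy(y_true, y_pred)  # 0.5 * (3/4 + 1/2) = 0.625
```

Averaging the two recalls keeps a classifier that always predicts the majority class from scoring well on an imbalanced phenotype.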
FIG 4
Classification accuracy for each taxon at different ranks of the NCBI taxonomy. For better visualization of names for the internal nodes, the taxon names are displayed on branches leading to the respective taxon node in the tree. The nested cross-validation accuracy obtained with Traitar for 234 bacterial species described in GIDEON was projected onto the NCBI taxonomy down to the family level. Colored circles at the tree nodes depict the performance of the phypat+PGL classifier (left-hand circles) and the phypat classifier (right-hand circles). The size of the circles reflects the number of species per taxon.
FIG 5
Single-cell phenotyping with Traitar. We used 20 genome assemblies with various degrees of completeness from single cells of the “Candidatus Cloacimonetes” phylum and a joint assembly for phenotyping with Traitar. Shown is a heat map of assembly samples versus phenotypes, which is the standard visualization for phenotype predictions in Traitar. The origin of the phenotype’s prediction (Traitar phypat and/or phypat+PGL classifier) determines the color of the heat map entries. The sample labels have their genome completeness estimates as suffixes. The colors of the dendrogram indicate similar phenotype distributions across samples, as determined by a hierarchical clustering with SciPy (http://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html).
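
The hierarchical clustering mentioned in the FIG 5 caption can be sketched with SciPy as follows. The caption only states that SciPy's cluster.hierarchy module was used; the Jaccard distance, the average linkage, the cut into two clusters, and the toy prediction matrix are all assumptions made for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical prediction matrix: rows are samples, columns are phenotypes (1 = predicted).
predictions = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
])

# Pairwise Jaccard distances between the binary phenotype profiles of the samples.
distances = pdist(predictions.astype(bool), metric="jaccard")

# Agglomerative clustering on the condensed distance matrix.
tree = linkage(distances, method="average")

# Cut the dendrogram into at most two flat clusters of samples.
cluster_labels = fcluster(tree, t=2, criterion="maxclust")
```

In this toy matrix the first two samples share most predicted phenotypes and end up in one cluster, while the last two form the other.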
FIG 6
Phenotyping of simulated draft genomes and single-cell genomes. In panel a, we used 20 genome assemblies with various degrees of completeness from single cells of the “Candidatus Cloacimonetes” phylum and a joint assembly for phenotyping with the Traitar phypat and phypat+PGL classifiers. Shown is the performance of the phenotype prediction versus the genome completeness of the single cells with respect to the joint assembly. In panel b, we simulated draft genomes on the basis of an independent test set of 42 microbial (pan)genomes. The coding sequences of these genomes were downsampled (10 replications per sampling point), and the resulting simulated draft genomes were used for phenotyping with the Traitar phypat and phypat+PGL classifiers. We plotted various performance estimates (mean center values and standard deviation error bars are shown) against protein content completeness.
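
The downsampling described for panel b of FIG 6 can be sketched as below. The caption states that coding sequences were downsampled with 10 replications per sampling point; uniform sampling without replacement, the function name simulate_draft_genomes, the completeness levels, and the gene identifiers are assumptions of this sketch.

```python
import random


def simulate_draft_genomes(coding_sequences, completeness_levels, replicates=10, seed=0):
    """Return {completeness level: list of downsampled gene sets} (hypothetical helper)."""
    rng = random.Random(seed)
    drafts = {}
    for level in completeness_levels:
        # Number of coding sequences retained at this completeness level.
        n_keep = round(level * len(coding_sequences))
        # Each replicate is an independent draw without replacement.
        drafts[level] = [rng.sample(coding_sequences, n_keep) for _ in range(replicates)]
    return drafts


# Made-up genome with 100 coding sequences.
genes = [f"gene_{i}" for i in range(100)]
drafts = simulate_draft_genomes(genes, completeness_levels=[0.1, 0.5, 1.0])
```

Each simulated draft genome would then be passed to the phenotype classifiers, and the performance estimates summarized per completeness level as a mean with a standard deviation.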
FIG 7
Phenotype gain and loss dynamics match protein family dynamics. Shown are the phenotype-protein family gain and loss dynamics for families identified as important by Traitar for the L-arabinose phenotype. Signed colored circles along the tree branches depict protein family gains (+) or losses (−). Taxon nodes are colored according to their inferred (ancestral) phenotype state.
