BMC Genomics. 2021 Sep 14;22(1):663.
doi: 10.1186/s12864-021-07982-8.

PathFams: statistical detection of pathogen-associated protein domains


Briallen Lobb et al. BMC Genomics.

Abstract

Background: A substantial fraction of genes identified within bacterial genomes encode proteins of unknown function. Identifying which of these proteins represent potential virulence factors, and mapping their key virulence determinants, is a challenging but important goal.

Results: To facilitate virulence factor discovery, we performed a comprehensive analysis of 17,929 protein domain families within the Pfam database, and scored them based on their overrepresentation in pathogenic versus non-pathogenic species, taxonomic distribution, relative abundance in metagenomic datasets, and other factors.
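As an illustration of what scoring a domain family for overrepresentation in pathogens could look like, the following minimal sketch computes a one-sided Fisher's exact (hypergeometric) p-value from proteome counts. The abstract does not state which statistical test was used, so the choice of test, the function name `enrichment_p_value`, and the example counts are all assumptions made for illustration, not the authors' method.

```python
from math import comb


def enrichment_p_value(path_with, path_total, nonpath_with, nonpath_total):
    """One-sided hypergeometric p-value for pathogen enrichment (illustrative).

    path_with / nonpath_with: proteomes containing the domain in each group.
    path_total / nonpath_total: total proteomes in each group.
    Returns P(X >= path_with) under random assignment of the domain
    to proteomes, i.e. the chance of seeing at least this many
    domain-containing pathogen proteomes by chance alone.
    """
    total = path_total + nonpath_total
    with_domain = path_with + nonpath_with
    denominator = comb(total, path_total)
    upper = min(with_domain, path_total)
    numerator = sum(
        comb(with_domain, k) * comb(total - with_domain, path_total - k)
        for k in range(path_with, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: domain found in 3 of 3 pathogens and 0 of 3 non-pathogens.
p = enrichment_p_value(3, 3, 0, 3)
print(p)
```

In a screen across thousands of families, p-values like this would normally be corrected for multiple testing before applying a significance threshold.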

Conclusions: We identify pathogen-associated domain families, candidate virulence factors in the human gut, and eukaryotic-like mimicry domains with likely roles in virulence. Furthermore, we provide an interactive database called PathFams to allow users to explore pathogen-associated domains as well as identify pathogen-associated domains and domain architectures in user-uploaded sequences of interest. PathFams is freely available at https://pathfams.uwaterloo.ca.

Keywords: Environmental association; Hypothetical proteins; Lineage specificity; Pathogens; Proteins of unknown function; Virulence factors.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Scatterplots of Pfam domain pathogen-association. (a) Pfam domain presence in pathogen versus non-pathogen proteomes, with significant pathogen-associated patterns shown. Only domains present in ≥ 5 pathogens were included. (b) Trends in pathogenesis GO term annotation shown with respect to enrichment in pathogen proteomes and a measure of lineage specificity, the F1 score. The horizontal dotted line is at log10(0.05), showing the pathogen-association threshold. The vertical dotted line is at an F1 score of 30.
Fig. 2
Detected Pfam families with strong environmental associations. (a) Abundance heatmap of Pfam families with significant environmental-specificity scores (padj < 1 × 10⁻¹⁵). The adjusted family size was calculated as the base-10 logarithm of the normalized adjusted family size, scaled across the domain values. The red lines on the right side of the plot denote DUF rows. (b) Selected DUF families with strong environment-specificity scores. Plotted are the per-sample distributions of normalized adjusted family size in three environments: human gut, marine, and soil.
Fig. 3
Screenshot of the domain info page from PathFams for the LcrG Pfam family (PF07216).
Fig. 4
Detection of pathogen-associated domains and domain architectures for four example proteins by the online PathFams resource. Accession IDs are OTO22244.1 (Enterococcus faecium BoNT/En toxin), WP_034687872.1 (Chryseobacterium piperi Cp1 toxin), BAB87738.1 (Clostridium haemolyticum flagellinolysin), and OKB66574.1 (Serratia marcescens hypothetical protein). Sensitive mode with an E-value cut-off of 1 × 10⁻⁷ was used for all sequences except the C. piperi Cp1 toxin. For the C. piperi sequence, an E-value cut-off of 1 × 10⁻³ was required to visualize the more divergent ricin domains.
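The Fig. 1 caption uses an F1 score as a measure of lineage specificity, with a reference line at 30, which suggests a 0 to 100 scale. The sketch below shows one way such a score could be computed from genome counts. The definitions of precision and recall, the percentage scaling, and the function name `lineage_f1` are assumptions for illustration; the abstract and captions do not give the exact formula.

```python
def lineage_f1(in_lineage_with, lineage_total, out_lineage_with):
    """F1-style lineage-specificity score on a 0-100 scale (illustrative).

    in_lineage_with: genomes in the lineage that contain the domain.
    lineage_total: all genomes in the lineage.
    out_lineage_with: genomes outside the lineage that contain the domain.
    """
    with_domain = in_lineage_with + out_lineage_with
    if with_domain == 0 or lineage_total == 0:
        return 0.0
    # Precision: share of domain-containing genomes that fall in the lineage.
    precision = in_lineage_with / with_domain
    # Recall: share of the lineage's genomes that contain the domain.
    recall = in_lineage_with / lineage_total
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical counts: a domain in every genome of one lineage and nowhere else.
print(lineage_f1(10, 10, 0))
```

Under these assumed definitions, a domain confined to a single lineage and present in all of its genomes scores 100, while a domain spread evenly across many lineages scores close to 0.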
