BMC Genomics. 2021 Sep 28;22(Suppl 3):700.
doi: 10.1186/s12864-021-07657-4.

A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences


Stephen Among James et al. BMC Genomics.

Abstract

Background: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information, facilitating knowledge discovery. Over the years, the number of pathogen sequence records has increased greatly, given the relatively small genome sizes of pathogens and their important role as infectious and symbiotic agents. Humans are host to numerous pathogens, such as viruses, many of which are responsible for high mortality and morbidity. Interactions between pathogens and humans over evolutionary history have resulted in the sharing of sequences, with important biological and evolutionary implications.

Results: This study describes a large-scale, systematic bioinformatics approach for the identification and characterization of sequences shared between host and pathogen. The approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers, each matching with 100% identity, made up the Flaviviridae-human share-ome. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~0.12%) and human (~0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several of unclassified genus. The majority (~68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~11%, ~7%, and ~3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights into Flaviviridae-human interactions.
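The core idea of a share-ome, nonamers found with 100% identity in both host and pathogen proteins, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`kmers`, `kmer_index`, `share_ome`) and the toy sequences are hypothetical, and the sketch assumes that a shared nonamer is simply an exact 9-residue substring present in at least one protein from each set.

```python
from typing import Dict, Set, Tuple

K = 9  # nonamer length reported in the abstract


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of overlapping k-mers in one protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_index(proteins: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """Map each non-redundant k-mer to the IDs of the proteins containing it."""
    index: Dict[str, Set[str]] = {}
    for protein_id, seq in proteins.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(protein_id)
    return index


def share_ome(
    host: Dict[str, str], pathogen: Dict[str, str], k: int = K
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return k-mers present in both sets, with the proteins they map to."""
    host_index = kmer_index(host, k)
    pathogen_index = kmer_index(pathogen, k)
    shared = host_index.keys() & pathogen_index.keys()  # exact (100%) matches
    return {kmer: (host_index[kmer], pathogen_index[kmer]) for kmer in shared}


# Toy sequences invented for illustration only (not real proteins).
host = {"H1": "MKTAYIAKQRQISFVK", "H2": "GGGGSLLTEVETPIRN"}
pathogen = {"V1": "AAAYIAKQRQISFAAA"}

shared = share_ome(host, pathogen)
pathogen_repertoire = kmer_index(pathogen)

# Fraction of the pathogen's non-redundant nonamers that are shared.
shared_fraction = len(shared) / len(pathogen_repertoire)
```

Indexing k-mers by protein ID keeps both quantities the abstract reports within reach: the size of the shared set relative to each repertoire, and the number of distinct protein sequences the shared nonamers map to.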

Conclusion: Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as those of viral, bacterial or parasitic origin.

Keywords: Bioinformatics; Cross-reactivity; Crossreactome; Dengue virus; Flaviviridae; Flavivirus; Hepacivirus; Hepatitis C virus; Host-pathogen; Large-scale; Methodology; Pegivirus; Peptide overlap; Peptide sharing; Pestivirus; Share-ome; Shared sequences; West Nile virus; Molecular mimicry.


Conflict of interest statement

The authors declare that they have no competing interests in the research.

Figures

Fig. 1
A schematic workflow for large-scale identification and characterization of host-pathogen shared sequences
Fig. 2
Dot matrix of Flaviviridae-human shared sequences at a window length of three amino acid residues. Multiple direct repeat regions (cyan areas) were identified in the dot plot. Well-defined regions of low complexity are outlined in black, while distinct inverted repeat regions are outlined in dark red, with prominent black dots marking the indirect repeats
Fig. 3
Major Flaviviridae species that shared peptides of length nine (100% identical) with human proteins
Fig. 4
Cellular localization of the human proteins that contained the Flaviviridae-human shared sequences
Fig. 5
Flaviviridae-human share-ome interaction network of 2001 nodes and 215,897 edges. The top 20 human hub genes in this network, with a node degree of 300 and above, are shown in Table 5
Fig. 6
Hepatitis C virus (HCV) genotype 1a-human protein-protein interaction (PPI) network. The HCV proteins are associated with major human hub proteins (TP53, PSMB7, and PSMB8, among others; Table 6). Orange nodes denote viral proteins, with red edges linking to other nodes; blue nodes denote human proteins, with grey edges linking to various nodes; and the yellow node denotes the hub protein with the highest node degree. TP53 is a hub connecting major nodes of the HCV genotype 1a-human PPI network
Fig. 7
Dengue virus (DV) type 2 (strain Jamaica/1409/1983)-human protein-protein interaction (PPI) network. The DV proteins are associated with major human hub proteins (PTBP1, ACTC1 and ACTA2, among others; Table 7). Cyan nodes denote viral proteins, with red edges linking to other nodes; blue nodes denote human proteins, with grey edges linking to various nodes; and the yellow node denotes the hub protein with the highest node degree. PTBP1 is a hub connecting major nodes, including NS2A-alpha, which connects other viral and human nodes to PTBP1 and to other nodes of the DV-human PPI network
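
The network captions above identify hub proteins by node degree, that is, the number of distinct interaction partners a node has. A minimal sketch of that ranking follows, assuming an undirected network given as a list of edge pairs; the helper names (`node_degrees`, `hubs`) and the toy edges are hypothetical and carry no biological meaning, and the study's own network tooling is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct neighbours per node in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hubs(edges: Iterable[Tuple[str, str]], min_degree: int) -> List[Tuple[str, int]]:
    """Return (node, degree) pairs at or above min_degree, highest first."""
    degrees = node_degrees(edges)
    selected = [(n, d) for n, d in degrees.items() if d >= min_degree]
    return sorted(selected, key=lambda item: (-item[1], item[0]))


# Toy edges with placeholder labels; the duplicate ("A", "B") is counted once.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("A", "B")]

toy_hubs = hubs(toy_edges, min_degree=2)
```

With a real edge list, setting `min_degree=300` would mirror the cut-off quoted in the Fig. 5 caption.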
