Genome Biol Evol. 2018 Oct 1;10(10):2777-2784.
doi: 10.1093/gbe/evy209.

MultiTwin: A Software Suite to Analyze Evolution at Multiple Levels of Organization Using Multipartite Graphs

Eduardo Corel et al. Genome Biol Evol.

Abstract

The inclusion of introgressive processes in evolutionary studies leads to a less constrained view of evolution. Network-based methods (such as large-scale similarity networks) make it possible to include in comparative genomics all extrachromosomal carriers (such as viruses, the most abundant biological entities on the planet) together with their cellular hosts. The integration of several levels of biological organization (genes, genomes, communities, environments) enables more comprehensive analyses of gene sharing and improved sequence-based classifications. However, the algorithmic tools for analyzing such networks are usually accessible only to users with advanced programming skills. We present an integrated suite of software tools named MultiTwin, aimed at the construction, structuring, and analysis of multipartite graphs for evolutionary biology. Typically, this kind of graph is useful for the comparative analysis of the gene content of genomes in microbial communities from the environment and for exploring patterns of gene sharing, for example between distantly related cellular genomes, between pangenomes, or between cellular genomes and their mobile genetic elements. We illustrate the use of this tool with an application of the bipartite approach (using gene family-genome graphs) to the analysis of pathogenicity traits in prokaryotes.

Figures

Fig. 1. Outline of the bipartite graph generation and analysis. At the root level, the bipartite graph consists only of disjoint star graphs. Level 1 and level 2 are constructed by two successive runs of factorgraph.py using the maps described in blue. The first factoring is based on the gene family clustering produced by our script familydetector. Different similarity thresholds can be used, resulting in differently structured graphs (assuming a molecular clock, these graphs can be seen as time slices of evolution). The second factoring corresponds to the identification of twins by detect_twins.py. The change of identifiers in the graph is recorded in the trail files, as indicated on the bottom line. At level 3, the operation is a terminal one, since it produces overlapping clusters. The analysis of the resulting components is performed by the description.py script, and is based on the annotations (at the root level) and the specified trail files.
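The factoring step described in the Fig. 1 caption can be illustrated with a small sketch. This is not the code of factorgraph.py; the function name `factor_graph`, the edge-list representation, and the toy identifiers are assumptions chosen for illustration. The idea shown is only that relabeling one side of a bipartite graph through a map (here, gene to gene family) merges nodes and collapses the disjoint star graphs of the root level into a connected gene family-genome graph.

```python
def factor_graph(edges, node_map):
    """Relabel the first endpoint of each edge through node_map.

    Edges that become identical after relabeling are merged, which is
    what turns the root-level star graphs into a shared structure.
    Nodes missing from node_map keep their original identifier.
    """
    return sorted({(node_map.get(u, u), v) for u, v in edges})


# Hypothetical root level: each gene is attached to exactly one genome,
# so the graph is a set of disjoint stars centered on the genomes.
root_edges = [
    ("gene1", "genomeA"),
    ("gene2", "genomeA"),
    ("gene3", "genomeB"),
    ("gene4", "genomeB"),
]

# Hypothetical gene-to-family map, standing in for a clustering result.
gene_to_family = {
    "gene1": "famX",
    "gene2": "famY",
    "gene3": "famX",
    "gene4": "famZ",
}

level1_edges = factor_graph(root_edges, gene_to_family)
```

In this toy case famX ends up linked to both genomes, so the two stars are joined through a shared gene family. The map itself plays the role of a record of identifier changes, loosely analogous to the trail files mentioned in the caption.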
Fig. 2. Overall structure of the bitwin.py program. The green boxes denote the user input files (optional when connected with a dotted line). The oval boxes represent the programs called by the bitwin.py script, and the incoming and outgoing arrows represent the input and output files. The blue boxes represent the output files generated overall by the bitwin.py script.
Fig. 3. Twin nodes in a toy example of a tripartite graph. Twin classes are formed by all the nodes having exactly the same neighborhood. In this example, the nodes forming the graph's three twin classes that contain more than one node are highlighted in the same color. All nodes in black have a different set of neighbors (and thus each form their own twin class). In a multipartite graph, twins can be homogeneous, like twin 1 (in yellow), or heterogeneous, like twins 2 and 3. The detect_twins.py script implements an option to detect only homogeneous twins (possibly even of a given type). In a tripartite graph where nodes of respective types 1, 2, and 3 are gene families, genomes, and environments, it may be interesting to detect patterns like twin 2, where a gene family is found in the strict subset of those genomes that thrive in the same environment. Twin 3 is likely less informative, since the environment is nondiscriminating (core genes are nevertheless detected on the lower layer).
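The definition of twin classes given in the Fig. 3 caption (nodes having exactly the same neighborhood) can be sketched in a few lines. This is a minimal illustration and not the implementation of detect_twins.py; the function name `find_twin_classes` and the toy edge list are assumptions.

```python
from collections import defaultdict


def find_twin_classes(edges):
    """Group nodes of an undirected graph by identical neighbor sets."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Nodes with the same frozenset of neighbors fall into the same class.
    classes = defaultdict(list)
    for node, nbrs in neighbors.items():
        classes[frozenset(nbrs)].append(node)

    return sorted(sorted(members) for members in classes.values())


# Hypothetical gene family-genome graph: fam1 and fam2 occur in exactly
# the same genomes, so they are twins; fam3 occurs only in genomeB.
toy_edges = [
    ("fam1", "genomeA"), ("fam1", "genomeB"),
    ("fam2", "genomeA"), ("fam2", "genomeB"),
    ("fam3", "genomeB"),
]

twin_classes = find_twin_classes(toy_edges)
```

Here fam1 and fam2 form one twin class, while every other node is alone in its class. Restricting the grouping to nodes of a single type would give the homogeneous variant mentioned in the caption.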
Fig. 4. Summary of the bipartite graph analysis of forty prokaryotic genomes. (A) The majority of gene families contained an equal proportion of pathogen and nonpathogen genes. Comparatively few are enriched in either pathogens or nonpathogens, with an extreme drop-off from the peak at 0.5. A subset of gene families are exclusive to pathogens or to nonpathogens, as indicated by peaks at 0 and 1; however, the majority of these are found in only one genome. (B) Most groups of exclusively shared gene families also contain an equal proportion of pathogens and nonpathogens; however, the peak at 0.5 is less extreme in comparison to the surrounding distribution. There is a more gradual decline in the number of exclusively shared gene families from this peak toward the extremities at 0 and 1 than in the distribution at the gene family level. (C) Functional analysis revealed that the group of exclusively shared gene families containing all "core" gene families was predominantly composed of gene families involved in information storage and processing. This contrasts with the groups of exclusively shared gene families containing gene families found in only two species, where informational genes are the least represented COG category. Gene families found in two species are predominantly either associated with poorly characterized COGs or unannotated. (D) An example of a group of exclusively shared gene families consisting of four gene families (bottom nodes) codistributing in two relatively distantly related pathogen genomes (top nodes) from Dickeya zeae (Gamma-proteobacteria) and Capnocytophaga gingivalis (Flavobacteria). Two gene families (purple) contain components of the type IV secretion system, while two (yellow) have no known COG annotations. Their codistribution with components of the type IV secretion system in distantly related taxa suggests that these may play a role in pathogenicity.
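The quantity plotted in panel A of Fig. 4 (the proportion of pathogen genes within each gene family) can be computed along the following lines. This is a sketch under stated assumptions, not the procedure used by the authors: the function name `pathogen_fraction`, the input layout (one genome label per member gene), and the toy data are all hypothetical.

```python
def pathogen_fraction(family_members, pathogen_genomes):
    """Return, for each gene family, the share of its genes that come
    from genomes labeled as pathogens (0.0 = none, 1.0 = all)."""
    fractions = {}
    for family, genomes in family_members.items():
        hits = sum(1 for genome in genomes if genome in pathogen_genomes)
        fractions[family] = hits / len(genomes)
    return fractions


# Hypothetical input: each family lists the source genome of every
# member gene, so a genome may appear more than once.
family_members = {
    "famX": ["genomeA", "genomeB"],
    "famY": ["genomeA"],
    "famZ": ["genomeB", "genomeB"],
}
pathogens = {"genomeA"}

fractions = pathogen_fraction(family_members, pathogens)
```

In this toy data famX sits at 0.5, famY at 1, and famZ at 0, which correspond to the three peak positions discussed in the caption.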
