Comparative Study
Brief Bioinform. 2021 Jul 20;22(4):bbaa290.
doi: 10.1093/bib/bbaa290.

NetCoMi: network construction and comparison for microbiome data in R

Stefanie Peschel et al. Brief Bioinform.

Abstract

Motivation: Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data.

Results: Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing users to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich).
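Whether a single pair of taxa is differentially associated between two groups can be illustrated with a classical test for the difference between two independent correlations. The sketch below uses Fisher's z-transformation; it is an assumed Python illustration of one such test, not NetCoMi's R implementation, and the function name `fisher_z_diff_test` is hypothetical.

```python
import math


def fisher_z_diff_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Test H0: rho1 == rho2 for correlations estimated in two independent groups.

    r1, r2: estimated correlations (strictly between -1 and 1)
    n1, n2: group sample sizes (each > 3)
    Returns the z statistic and the two-sided p-value.
    """
    # Fisher's z-transformation makes each estimate approximately normal
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of z1 - z2, assuming the two groups are independent
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical values for one pair of taxa in two groups
z, p = fisher_z_diff_test(0.6, 120, 0.1, 110)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Applied to every pair of taxa, the resulting p-values would still need a multiple-testing adjustment before edges are drawn in a differential network.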

Availability: R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi.

Contact: Tel:+49 89 3187 43258; stefanie.peschel@mail.de.

Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

Keywords: compositional data; differential association; microbial association estimation; network analysis; network comparison; sample similarity network.


Figures

Figure 1
The proposed workflow for constructing, analysing and comparing microbial association networks, implemented in the R package NetCoMi. The main framework (displayed as continuous lines) requires an n × p read count matrix (n samples, p taxa) as input. The data preparation step includes sample and taxa filtering, zero replacement and normalization (step 1). Associations are calculated and stored in an adjacency matrix (step 2). Alternatively, an association matrix is accepted as input, from which the adjacency matrix is determined. A more detailed chart describing step 2 is given in Figure 2. In step 3, network metrics are calculated, which can be visualized in the network plot (step 4). If two networks are constructed (by passing a binary group vector, two count matrices or two user-defined association matrices to the function), their properties can be compared (step 5). Besides the main workflow, a differential network can be constructed from the association matrix.
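Steps 1 and 2 of this workflow can be sketched for a single count matrix as follows. This is a simplified Python illustration under stated assumptions, not NetCoMi's code: zeros are handled with a pseudocount, normalization is the centered log-ratio (clr) transformation, associations are Pearson correlations of the clr data, and the adjacency matrix keeps absolute correlations at or above a fixed threshold. The function names and the parameters `min_total_reads` and `threshold` are hypothetical.

```python
import numpy as np


def clr(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Centered log-ratio transform of an n x p matrix (samples in rows)."""
    # Zero handling (assumed): add a pseudocount so every entry can be logged
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log value (the log of its geometric mean)
    return logx - logx.mean(axis=1, keepdims=True)


def build_network(counts, min_total_reads=0, threshold=0.3, pseudocount=1.0):
    counts = np.asarray(counts, dtype=float)
    # Step 1a: taxa filtering by total read count across samples
    keep = counts.sum(axis=0) >= min_total_reads
    # Step 1b: zero replacement and normalization
    z = clr(counts[:, keep], pseudocount)
    # Step 2: associations between taxa (columns) -> p x p correlation matrix
    assoc = np.corrcoef(z, rowvar=False)
    # Sparsification (assumed): keep |r| >= threshold as edge weight, no self-loops
    adj = np.where(np.abs(assoc) >= threshold, np.abs(assoc), 0.0)
    np.fill_diagonal(adj, 0.0)
    return keep, assoc, adj


rng = np.random.default_rng(1)
counts = rng.poisson(lam=20, size=(40, 6))  # simulated toy data, not study data
keep, assoc, adj = build_network(counts, min_total_reads=100)
print(assoc.shape, int((adj > 0).sum() // 2), "edges")
```

Swapping the association estimator mainly changes the `assoc` line; filtering and sparsification stay separate steps.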
Figure 2
Approaches for network construction available in NetCoMi, depending on the association measure. For correlations, the soft-thresholding approach from the WGCNA package [52] is implemented in addition to thresholding and statistical testing (marked by blue arrows). Dissimilarity based on topological overlap (also adopted from the WGCNA package) is available as a further dissimilarity transformation besides metric distances; the resulting dissimilarities are used for all network properties based on shortest paths. Whether a network measure is based on similarity or dissimilarity is stated in Table 4. Network construction without a sparsification step leads to dense networks in which all nodes are connected.
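The transformations named in this caption can be sketched in a few lines. The metric distances below are assumed forms (signed: d = sqrt((1 - r)/2); unsigned: d = sqrt(1 - r^2)), similarities are assumed to be s = 1 - d, soft-thresholding follows WGCNA's signed ((1 + r)/2)^beta and unsigned |r|^beta adjacencies, and the topological overlap follows WGCNA's definition. The Python function names are hypothetical; this is not NetCoMi's implementation.

```python
import numpy as np


def signed_dissimilarity(r):
    """Assumed 'signed' metric: r = 1 -> 0, r = 0 -> sqrt(0.5), r = -1 -> 1."""
    r = np.asarray(r, dtype=float)
    return np.sqrt(np.clip(0.5 * (1.0 - r), 0.0, 1.0))


def unsigned_dissimilarity(r):
    """Assumed 'unsigned' metric: strong associations of either sign -> 0."""
    r = np.asarray(r, dtype=float)
    return np.sqrt(np.clip(1.0 - r ** 2, 0.0, 1.0))


def soft_threshold(r, beta=6, signed=True):
    """WGCNA-style soft-thresholding adjacency (no hard cut-off)."""
    r = np.asarray(r, dtype=float)
    base = 0.5 * (1.0 + r) if signed else np.abs(r)
    return base ** beta


def tom_dissimilarity(adj):
    """1 - topological overlap for a weighted adjacency with entries in [0, 1]."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)
    shared = a @ a                 # l_ij = sum_u a_iu * a_uj (shared neighbours)
    k = a.sum(axis=1)              # connectivity of each node
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return 1.0 - tom


r = np.array([[1.0, 0.8, -0.6], [0.8, 1.0, -0.2], [-0.6, -0.2, 1.0]])
sim = 1.0 - signed_dissimilarity(r)   # edge weights as s = 1 - d
print(np.round(sim, 3))
print(np.round(tom_dissimilarity(soft_threshold(r)), 3))
```

With the signed metric, strongly negative associations end up far apart (low similarity), whereas the unsigned metric treats them like strong positive ones.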
Figure 3
Main NetCoMi functions. For each function, its purpose together with its main arguments is shown. The objects returned from the respective functions are colored in orange. The steps (colored in red) correspond to the steps of the overall workflow shown in Figure 1.
Figure 4
Bacterial associations for the combined data set with samples from Ulm and Munich. The SPRING method [55] is used as association measure. The estimated partial correlations are transformed into dissimilarities via the “signed” distance metric and the corresponding (non-negative) similarities are used as edge weights. Green edges correspond to positive estimated associations and red edges to negative ones. Eigenvector centrality is used for defining hubs (nodes with a centrality value above the empirical 95% quantile) and scaling node sizes. Hubs are highlighted by bold text and borders. Node colors represent clusters, which are determined using greedy modularity optimization. A: complete network for the data set with 100 taxa and 1022 samples. Unconnected nodes are removed. B: reduced network, where only the 50 nodes with the highest degree are shown. Centrality measures and clusters are adopted from the complete network.
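The hub definition used here (eigenvector centrality above the empirical 95% quantile) can be sketched as below. Assumptions: the network is given as a symmetric, non-negative weighted adjacency matrix; centrality is taken from the leading eigenvector and scaled so that its maximum is 1; the quantile uses NumPy's default linear interpolation (the same rule as R's default `quantile` type 7). The function names are hypothetical.

```python
import numpy as np


def eigenvector_centrality(adj):
    """Leading-eigenvector centrality of a symmetric non-negative adjacency matrix."""
    a = np.asarray(adj, dtype=float)
    # eigh returns eigenvalues in ascending order; the last column is the leading one
    _, vectors = np.linalg.eigh(a)
    v = np.abs(vectors[:, -1])   # the leading vector has one sign; make it positive
    return v / v.max()


def find_hubs(adj, q=0.95):
    """Indices of nodes whose centrality exceeds the empirical q-quantile."""
    c = eigenvector_centrality(adj)
    return np.flatnonzero(c > np.quantile(c, q)), c


# Toy star network: node 0 is linked to nodes 1-4
star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1.0
hubs, centrality = find_hubs(star)
print(hubs, np.round(centrality, 2))
```

For a network with several components, the leading eigenvector concentrates on one of them, so centralities may need to be computed per component.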
Figure 5
Comparison of bacterial associations in the mattress dust between the study centers Munich and Ulm. The SPRING method [55] is used as association measure. The estimated partial correlations are transformed into dissimilarities via the "signed" distance metric and the corresponding similarities are used as edge weights. Eigenvector centrality is used for defining hubs and scaling node sizes. Node colors represent clusters, which are determined using greedy modularity optimization. Clusters have the same color in both networks if they share at least two taxa. Green edges correspond to positive estimated associations and red edges to negative ones. The layout computed for the Munich network is used in both networks. Nodes that are unconnected in both groups are removed. Taxa names are abbreviated (see Supplementary Table S9 for the original names).
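The rule that clusters share a color when they have at least two taxa in common can be sketched as a one-to-one matching on overlap counts. The greedy strategy (largest overlaps matched first), the tie-breaking and the function name `match_clusters` are assumptions for illustration; the caption does not say how competing matches are resolved.

```python
from collections import Counter


def match_clusters(clusters_a, clusters_b, min_shared=2):
    """Map each cluster of network B to at most one cluster of network A.

    clusters_a, clusters_b: dicts mapping node (taxon) name -> cluster id.
    A pair of clusters is matched if they share at least `min_shared` nodes;
    larger overlaps are matched first.
    """
    common = clusters_a.keys() & clusters_b.keys()
    overlap = Counter((clusters_a[n], clusters_b[n]) for n in common)
    mapping, used_a = {}, set()
    # Largest overlaps first; repr() gives a deterministic tie-break
    for (ca, cb), count in sorted(overlap.items(), key=lambda kv: (-kv[1], repr(kv[0]))):
        if count < min_shared:
            break                  # all remaining pairs overlap even less
        if cb in mapping or ca in used_a:
            continue               # each cluster gets at most one partner
        mapping[cb] = ca
        used_a.add(ca)
    return mapping


net_a = {"t1": 1, "t2": 1, "t3": 1, "t4": 2, "t5": 2}          # toy cluster labels
net_b = {"t1": "x", "t2": "x", "t3": "y", "t4": "y", "t5": "y"}
print(match_clusters(net_a, net_b))
```

A matched cluster in network B would take the color of its partner in network A; unmatched clusters keep colors of their own.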
Figure 6
Comparing dissimilarity networks based on Aitchison's distance [96] (Supplementary Table S3) between mattress dust and nasal swabs for the same set of subjects (nodes). Only samples and taxa with at least 1000 reads each are included, leaving p = 707 genera in the Mattress group, p = 184 genera in the Nose group, and n = 980 samples in both groups. Counts are normalized to fractions and, since zeros must be replaced for the clr transformation, "multiplicative imputation" (Table 3 in the main text) is used for zero handling. The dissimilarity matrix is scaled to [0,1] and sparsified using the k-nearest neighbor method (k = 3 for both networks). Node colors represent clusters identified using hierarchical clustering with average linkage. Clusters have the same color in both networks if they share at least 100 nodes (the minimum cluster size among both groups is 560). Hubs (highlighted by bold borders) are nodes with an eigenvector centrality above the empirical 99% quantile of eigenvector centralities. Edge thickness corresponds to similarity values (calculated as 1 - d from the scaled dissimilarities d). The more similar the bacterial composition of two samples, the closer together they are placed. Node shapes mark whether a sample was collected in Munich or Ulm. Unconnected nodes are removed.
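The construction of a sample dissimilarity network described here can be sketched as follows: counts are converted to fractions, zeros are replaced multiplicatively, Aitchison's distance is computed as the Euclidean distance between clr-transformed samples, distances are scaled to [0,1], and each sample keeps edges to its k nearest neighbors. Assumptions not taken from the caption: the imputation value delta = (1/p)^2, scaling by division by the maximum distance, a non-mutual k-nearest-neighbor rule (an edge is kept if either sample lists the other), and edge weights s = 1 - d. The function names are hypothetical; this is a Python sketch, not NetCoMi's code.

```python
import numpy as np


def multiplicative_replacement(counts, delta=None):
    """Convert counts to fractions and replace zeros multiplicatively."""
    x = np.asarray(counts, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)      # fractions; every row needs reads
    if delta is None:
        delta = (1.0 / x.shape[1]) ** 2       # assumed imputation value
    zeros = x == 0
    n_zeros = zeros.sum(axis=1, keepdims=True)
    # Zeros become delta; non-zero parts shrink so each row still sums to 1
    return np.where(zeros, delta, x * (1.0 - n_zeros * delta))


def aitchison_distance(comp):
    """Pairwise Aitchison distance = Euclidean distance between clr vectors."""
    logx = np.log(comp)
    clr = logx - logx.mean(axis=1, keepdims=True)
    diff = clr[:, None, :] - clr[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_edges(dist, k):
    """Boolean edge matrix: i-j kept if j is among i's k nearest or vice versa."""
    n = dist.shape[0]
    d = dist.copy()
    np.fill_diagonal(d, np.inf)               # a sample is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    keep = np.zeros((n, n), dtype=bool)
    keep[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return keep | keep.T                      # non-mutual (union) rule


def sample_network(counts, k=3):
    d = aitchison_distance(multiplicative_replacement(counts))
    d = d / d.max()                           # scale to [0, 1]; diagonal stays 0
    keep = knn_edges(d, k)
    sim = np.where(keep, 1.0 - d, 0.0)        # similarity as edge weight
    return d, keep, sim


counts = np.array([[10, 0, 5, 3], [8, 2, 0, 4], [0, 9, 1, 1],
                   [5, 5, 5, 5], [12, 1, 0, 0], [3, 0, 7, 2]])  # toy samples x taxa
d, keep, sim = sample_network(counts, k=2)
print(keep.sum(axis=1))                       # node degrees
```

Because of the union rule, every sample has at least k neighbors, so the kNN step alone leaves no sample unconnected.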

References

    1. Janda JM, Abbott SL. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol 2007; 45:2761–4.
    2. Huse SM, Dethlefsen L, Huber JA, et al. Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet 2008; 4:e1000255.
    3. Cho I, Blaser MJ. The human microbiome: at the interface of health and disease. Nat Rev Genet 2012; 13:260–70.
    4. Davidson RM, Epperson LE. Microbiome Sequencing Methods for Studying Human Diseases. In: Methods in Molecular Biology, Vol. 1706. Humana Press Inc., 2018, 77–90.
    5. Davis NM, Proctor DM, Holmes SP, et al. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome 2018; 6:226.
