Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Jul 19:11:e75419.
doi: 10.7554/eLife.75419.

Key features of the genetic architecture and evolution of host-microbe interactions revealed by high-resolution genetic mapping of the mucosa-associated gut microbiome in hybrid mice

Affiliations

Key features of the genetic architecture and evolution of host-microbe interactions revealed by high-resolution genetic mapping of the mucosa-associated gut microbiome in hybrid mice

Shauni Doms et al. Elife. .

Abstract

Determining the forces that shape diversity in host-associated bacterial communities is critical to understanding the evolution and maintenance of metaorganisms. To gain deeper understanding of the role of host genetics in shaping gut microbial traits, we employed a powerful genetic mapping approach using inbred lines derived from the hybrid zone of two incipient house mouse species. Furthermore, we uniquely performed our analysis on microbial traits measured at the gut mucosal interface, which is in more direct contact with host cells and the immune system. Several mucosa-associated bacterial taxa have high heritability estimates, and interestingly, 16S rRNA transcript-based heritability estimates are positively correlated with cospeciation rate estimates. Genome-wide association mapping identifies 428 loci influencing 120 taxa, with narrow genomic intervals pinpointing promising candidate genes and pathways. Importantly, we identified an enrichment of candidate genes associated with several human diseases, including inflammatory bowel disease, and functional categories including innate immunity and G-protein-coupled receptors. These results highlight key features of the genetic architecture of mammalian host-microbe interactions and how they diverge as new species form.

Keywords: GWAS; codiversification; cospeciation; evolutionary biology; genetics; genomics; hybridization; microbiome; mouse; phylosymbiosis.

Plain language summary

The digestive system, particularly the large intestine, hosts many types of bacteria which together form the gut microbiome. The exact makeup of different bacterial species is specific to an individual, but microbiomes are often more similar between related individuals, and more generally, across related species. Whether this is because individuals share similar environments or similar genetic backgrounds remains unclear. These two factors can be disentangled by breeding different animal lineages – which have different genetic backgrounds while belonging to the same species – and then raising the progeny in the same environment. To investigate this question, Doms et al. studied the genes and microbiomes of mice resulting from breeding strains from multiple locations in a natural hybrid zone between different subspecies. The experiments showed that 428 genetic regions affected the makeup of the microbiome, many of which were known to be associated with human diseases. Further analysis revealed 79 genes that were particularly interesting, as they were involved in recognition and communication with bacteria. These results show how the influence of the host genome on microbiome composition becomes more specialized as animals evolve. Overall, the work by Doms et al. helps to pinpoint the genes that impact the microbiome; this knowledge could be helpful to examine how these interactions contribute to the emergence of conditions such as diabetes or inflammatory bowel disease, which are linked to perturbations in gut bacteria.

PubMed Disclaimer

Conflict of interest statement

SD, HF, MR, CC, AK, SI, AF, LT, JB No competing interests declared

Figures

Figure 1.
Figure 1.. Heritability estimates and their relationship to cospeciation rates.
(A–B) Lollipop chart showing the values of the chip heritability (dark red circles), narrow-sense heritability (green squares), PVE by the additive effect (blue triangles), and additive and dominance effect (yellow diamonds) of all significant single nucleotide polymorphisms (SNPs) within one taxon for DNA-based traits (A) and RNA-based traits (B) Only taxa with a non-zero narrow-sense heritability estimate are shown. Only the outline of non-significant heritability estimates (p<0.05, Restricted Likelihood Ratio test) are shown. Taxa without significant hits have no PVE reported. (C–D) Relationship between the narrow-sense heritability estimates for the relative abundance of bacterial taxa and cospeciation rate for the same genus calculated by Groussin et al., 2017. DNA level (C), and RNA level (D). The blue line represents a linear regression fit to the data and the grey area the corresponding confidence interval. p-Values and the correlation coefficient are calculated using the Spearman’s correlation test. The text labels on the y-axis (A–B) and the text labels in panels (C-D) are coloured according to taxonomic level: amplicon sequence variants (ASV) in black, genus in purple, family in light blue, order in red, class in green, and phylum in yellow.
Figure 1—figure supplement 1.
Figure 1—figure supplement 1.. Selection of taxa for mGWAS analysis.
A scatter plot showing the association of average relative abundance of taxa with their prevalence in the G2 mapping population. Taxa retained for analysis are coloured according to the originating core. The size of each dot represents the number of individuals that have a median abundance higher than 0.2% of the taxon. The dashed lines represent the thresholds of the core vertical: median abundance >0.2% and horizontal prevalence of 25%.
Figure 1—figure supplement 2.
Figure 1—figure supplement 2.. Relative abundances of core taxa.
Core orders (A) and genera (B) present in the G2 mapping population. Each vertical line represents one individual. Sub-cross (see Figure 1—figure supplement 6) is indicated at the top.
Figure 1—figure supplement 3.
Figure 1—figure supplement 3.. Correlation of single nucleotide polymorphisms (SNP)-based heritability estimates based on DNA (x-axis) or RNA (y-axis).
The blue line represents a linear regression fit to the data. Red dashed line represents the identity line with a slope of 1.
Figure 1—figure supplement 4.
Figure 1—figure supplement 4.. Relationship between the chip heritability estimates for the relative abundance of bacterial taxa and cospeciation rate for the same genus calculated by Groussin et al., 2017.
DNA level (A) and RNA level (B). The blue line represents a linear regression fit to the data and the grey area the corresponding confidence interval. p-Values and the correlation coefficient are calculated using the Spearman’s correlation test. The text labels in panels are coloured according to taxonomic level: amplicon sequence variants (ASV) in black, genus in purple, family in light blue, order in red, class in green, and phylum in yellow.
Figure 1—figure supplement 5.
Figure 1—figure supplement 5.. Correlation between heritability estimates.
Estimates calculated with lme4QTL and heritability estimates reported in Org et al., 2015 (males) (A) and heritability estimates reported in Turpin et al., 2016 (B). Only taxa present in both studies are shown (n=11 and n=28, respectively). The blue line represents a linear regression fit to the data and the grey area the corresponding confidence interval. p-Values and the correlation coefficient are calculated using the Spearman’s correlation test.
Figure 1—figure supplement 6.
Figure 1—figure supplement 6.. Overview of the intercross design.
G0 mice are from eight partially inbred lines derived from mice wild-caught in four hybrid zone sites. Hybrid index—the percentage of musculus alleles—is reported as the mean for the G0 mice from each line (top), or mean of 40 G2s from each sub-cross (bottom). We performed eight G1 crosses with one line with hybrid index ~50% (purple shades) and one line with hybrid index >50% (green shades); colour on the left side of mouse diagram indicates dam line and right side indicates sire line. Next, G1 mice were crossed in eight combinations such that each G2 mouse had one grandparent from each of the four breeding stocks, indicated by colours of mouse diagram, and representative chromosomes below. Tail colour indicates Y chromosome strain, and oval indicates mitochondrial strain.
Figure 2.
Figure 2.. Heatmap of significant host loci from association mapping of bacterial abundances.
Karotype plot showing the number of significant loci found using a study-wide threshold, where (A) plots the significance intervals and (B) the significant single nucleotide polymorphisms (SNP) markers on the chromosomes. The position of the SNPs on panel (B) has been amplified by 0.5 Mb to visualise it. The position of the genes closest to SNPs with the lowest p-values (Tlr4, and Irak4 and Adamsts20) are indicated.
Figure 2—figure supplement 1.
Figure 2—figure supplement 1.. Manhattan plots for ASV184 (Dorea).
The complete model (A), the additive effect (B) or the dominance effect (C) are shown. Single nucleotide polymorphisms (SNPs) passing the study-wide significance threshold (solid line) are shown in dark blue, while genome-wide significant SNPs (dashed line) are shown in light blue. In panel (A), the closest gene to the SNP is shown for a subset of significant SNPs.
Figure 2—figure supplement 2.
Figure 2—figure supplement 2.. Number of significantly associated loci per bacterial taxon.
Loci with significant additive effects (add.P), dominance effects (dom.P) or effects in full model (P) are indicated.
Figure 3.
Figure 3.. Genetic architecture of significant loci.
(A) Source of the allele with the highest phenotypic value. (B) Histogram of dominance values d/a of significant loci reveals a majority of loci acting recessive or partially recessive. (C) Histogram showing the percentage of variance explained (PVE) by the peak single nucleotide polymorphisms (SNP) for DNA (left) and RNA (right).
Figure 4.
Figure 4.. High confidence protein-protein interaction (PPI) network of genes closest to single nucleotide polymorphisms (SNPs) significantly associated with bacterial abundances.
Network clusters are annotated using STRING’s functional enrichment (Doncheva et al., 2019). Nodes represent proteins and edges their respective interactions. Only edges with an interaction score higher than 0.9 are retained. The width of the edge line expresses the interaction score calculated by STRING. The colour of the nodes describes the expression of the protein in the intestine where yellow is not expressed and purple is highly expressed.
Figure 4—figure supplement 1.
Figure 4—figure supplement 1.. Protein-protein interaction (PPI) network of hub genes of the 'nearest gene' network.
The top ten hub genes of the protein-protein interaction (PPI) network of the closest genes to the host single nucleotide polymorphisms (SNPs) significantly associated with bacterial abundances Figure 4. The nodes are coloured according to hub gene rank from 1 (red) to 10 (yellow). Blue nodes are the first neighbours.
Figure 4—figure supplement 2.
Figure 4—figure supplement 2.. Genes belonging to overrepresented KEGG pathways within the host genes closest to significant single nucleotide polymorphisms (SNPs) from association analysis.
Figure 4—figure supplement 3.
Figure 4—figure supplement 3.. Enriched KEGG pathways.
KEGG pathways enriched among closest genes to significant single nucleotide polymorphisms (SNPs) from association analysis. Node colour indicates false discovery rate (FDR)-adjusted p-value of enrichment and node size indicates number of candidate genes in pathway.
Figure 4—figure supplement 4.
Figure 4—figure supplement 4.. Enriched human diseases among genes closest to significant single nucleotide polymorphisms (SNPs) from association analysis.
Figure 5.
Figure 5.. Significant loci in this study previously found in quantitative trait loci (QTL) studies of the mouse gut microbiome.
The genes present in two repeatedly identified regions are depicted in boxes.
Figure 5—figure supplement 1.
Figure 5—figure supplement 1.. Protein-protein interaction (PPI) network of overlapping differentially expressed proteins according to colonization status.
STRING (Szklarczyk et al., 2019) protein-protein interaction (PPI) network of proteins that are differentially expressed in the intestine (small intestine and colon) of germ-free (GF) mice compared to conventionally raised mice, found in the present study. The colour of the network nodes indicates whether the quantitative trait loci (QTL) hit was found using the DNA abundances (green), RNA abundances (purple) or was found in both (orange). The shape represents if the gene of the protein was the closest gene to the significant single nucleotide polymorphisms (SNP) (rectangle), if the gene was also found in QTLs of other studies (octagon), a combination of both (diamond), or only differentially expressed in GF mice vs conventionally raised mice. The node size expresses the number of taxa where the gene was found in a QTL. The edges represent PPI, where the line thickness indicates the strength of the data support from text mining, experiments, databases, coexpression, gene-fusion, and co-occurrence.
Figure 5—figure supplement 2.
Figure 5—figure supplement 2.. Protein-protein interaction network of hub genes of 'differentially expressed according to colonization status'-network.
Visualisation of the top hub genes calculated with the MCC algorithm and their first neighbours from the protein-protein interaction (PPI) network of genes found in intervals in present study that are also differentially expressed in germ-free versus conventionally raised mice (Figure 5—figure supplement 1). Edges represent the protein-protein associations. The red nodes represent genes with a high degree ( = hub genes), and the yellow nodes with a low degree, while the blue nodes represent their first neighbours. All nodes shown are differentially expressed in GF mice. Hexagon shaped nodes are genes/proteins also found associated with gut microbiome abundances in other mouse quantitative trait loci (QTL) studies, and round nodes are ‘only’ differentially expressed in GF mice. The size of the node is an indication of the number of taxa associated with the gene.
Figure 6.
Figure 6.. Network of host candidate genes influencing bacterial traits using STRING (https://string-db.org).
The nodes represent proteins and are coloured according to a selection of enriched GO terms and pathways: G-protein coupled receptor (GPCR) signalling (red), regulation of the immune system process (blue), response to nutrient levels (light green), fatty acid metabolic process (pink), glucose homeostasis (purple), response to antibiotic (orange), regulation of feeding behaviour (yellow), positive regulation of insulin secretion (dark green), circadian entrainment (brown), and response to vitamin D (turquoise). The colour of the edges represents the interaction type: known interactions from curated databases (turquoise) or experimentally determined (pink); predicted interactions from gene neighbourhood (green), gene fusions (red), gene co-occurrence (blue); other interactions from text-mining (light green), coexpression (black), and protein homology (purple).
Figure 6—figure supplement 1.
Figure 6—figure supplement 1.. Protein protein interaction (PPI) network of 304 filtered candidate genes.
This network is generated in STRING (Szklarczyk et al., 2019) and Cytoscape. Nodes are coloured according to the intestine expression score (source: STRING).
Author response image 1.
Author response image 1.. Visualization of the random effects: mating pair identifier (A) and mating pair identifier nested within the sub-cross ID (B).
Author response image 2.
Author response image 2.. Genomic inflation factor for different P values (additive effect, left; dominance effect, middle; total model, right) calculated by the different models: current model including the mating pair id nested in the sub-cross ID (top), current model (middle), and the current model including PC1-PC3 as fixed effects (bottom).
Dashed line and values indicate the average genomic inflation factor.
Author response image 3.
Author response image 3.. LD score regression intercept for P values calculated by the different models: current model including the mating pair id nested in the sub-cross ID (top), current model (middle), and the current model including PC1-PC3 as fixed effects (bottom).
Dashed line and values indicate the average genomic inflation factor.
Author response image 4.
Author response image 4.. Correlation of heritability estimates calculated using different methods/programs and models to the percentage of phenotypic variance explained by the additive genotypes of all significant SNPs within the trait.
For Gaston, Sommer, Gemma 1PC, and lme4QTL.1PC, a linear mixed model was used with the first PC of the genotypes as a fixed effect and a genomic relatedness matrix (GRM) as a random effect. For Gemma.3PC, the first three PCs of the genotype data were used as fixed effects and a GRM as a random effect. For lme4QTL.MP, the mating pair ID was used as a random effect together with the GRM, while in lme4QTL.fam the mating pair ID was nested in the sub-cross ID as random effects together with a GRM to estimate the additive heritability.
Author response image 5.
Author response image 5.. Heritability estimates calculated with lme4QTL compared to other methods.
The dashed line represents the identity line with a slope of 1. The colored lines are the corresponding linear regression lines.
Author response image 6.
Author response image 6.. Correlation of heritability estimates, calculated with GEMMA using the first principal component as a fixed effect, with co-speciation rates reported in Groussin et al., 2017.
Author response image 7.
Author response image 7.. Correlation of heritability estimates, calculated with GEMMA using the first three principal component as a fixed effect, with co-speciation rates reported in Groussin et al., 2017.
Author response image 8.
Author response image 8.. Correlation of heritability estimates, calculated with sommer using the first principal component as a fixed effect, with co-speciation rates reported in Groussin et al., 2017.

References

    1. Abdi H. The bonferonni and šidák corrections for multiple comparisons. Encyclopedia of Measurement and Statistics, SAGE. 2007;1:e9
    1. Alhasson F, Das S, Seth R, Dattaroy D, Chandrashekaran V, Ryan CN, Chan LS, Testerman T, Burch J, Hofseth LJ, Horner R, Nagarkatti M, Nagarkatti P, Lasley SM, Chatterjee S. Altered gut microbiome in a mouse model of Gulf War Illness causes neuroinflammation and intestinal injury via leaky gut and TLR4 activation. PLOS ONE. 2017;12:e0172914. doi: 10.1371/journal.pone.0172914. - DOI - PMC - PubMed
    1. Amato KR, G Sanders J, Song SJ, Nute M, Metcalf JL, Thompson LR, Morton JT, Amir A, J McKenzie V, Humphrey G, Gogul G, Gaffney J, L Baden A, A O Britton G, P Cuozzo F, Di Fiore A, J Dominy N, L Goldberg T, Gomez A, Kowalewski MM, J Lewis R, Link A, L Sauther M, Tecot S, A White B, E Nelson K, M Stumpf R, Knight R, R Leigh S. Evolutionary trends in host physiology outweigh dietary niche in structuring primate gut microbiomes. The ISME Journal. 2019;13:576–587. doi: 10.1038/s41396-018-0175-0. - DOI - PMC - PubMed
    1. Bäckhed F, Ding H, Wang T, Hooper LV, Koh GY, Nagy A, Semenkovich CF, Gordon JI. The gut microbiota as an environmental factor that regulates fat storage. PNAS. 2004;101:15718–15723. doi: 10.1073/pnas.0407076101. - DOI - PMC - PubMed
    1. Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4:2. doi: 10.1186/1471-2105-4-2. - DOI - PMC - PubMed

Publication types

Substances

Associated data