[Preprint]. 2026 Jan 23:rs.3.rs-7706316.
doi: 10.21203/rs.3.rs-7706316/v1.

Functional divergence of the gut microbiome associated with lifestyle and helminth infection in Indigenous Peninsular Malaysian

Soo Ching Lee et al. Res Sq.

Abstract

Gut microbiome catalogs from Indigenous Southeast Asian populations remain underrepresented. Here, we integrated metagenomic and metatranscriptomic data from Indigenous Orang Asli (OA) in Peninsular Malaysia and urban residents of Kuala Lumpur (KL), together with immune profiling, to investigate gut microbial activity and functions associated with lifestyle and helminth infection. Prevotella showed significantly higher transcriptional activity in OA, whereas Bacteroides was more active in KL, corresponding to distinct immune signatures. Microbial genome-wide association studies (mGWAS) revealed Prevotella copri_A variants were linked to lifestyle and host immunity, while Blautia strain variation was associated with helminth infection. Malaysian metagenome-assembled genomes (MAGs) uncovered 307 novel species, predominantly within Clostridia. Among these, the novel HGM13006 species were enriched with genes for starch and sucrose metabolism, and the novel Ruminococcus_D species in flagellar assembly and chemotaxis. Together, these findings provide function-level insights into gut microbiome variation associated with lifestyle and helminth infection in an indigenous population.

Conflict of interest statement

Additional Declarations: There is NO Competing Interest.

Figures

Extended Data Fig. 1: Overview of computational workflow and data outputs.
Summary of the computational workflow, tools used, and primary data generated starting from genome assembly and binning, MAG quality assessment, integration between Malaysian, UHGG and KIJ databases, and taxonomic classification based on shotgun metagenomic (DNA) data.
Extended Data Fig. 2: Comparison of enrichment pathway, taxonomic and host immune profiles between groups.
Venn diagrams of KEGG gene IDs comparing Malaysian and known MAGs for (a) HGM13006 and (b) Ruminococcus_D. Bar plots displaying the relative abundance of the top taxa at the (c) class, (d) family and (e) order levels, comparing RNA and DNA data. Boxplots showing metagenomic abundance of (f, h) Prevotella and (g, i) Bacteroides between OA and KL groups and by helminth infection status. (j) Radar plot displaying log-transformed median levels of 13 inflammatory cytokines and chemokines significantly different between helminth-infected and uninfected individuals.
Extended Data Fig. 3: Differential activity and abundance of Prevotella and Bacteroides species between OA and KL populations.
(a) Bar plots showing Prevotella and Bacteroides species with significantly different transcriptional activity (RNA; left) and genomic abundance (DNA; right) between OA and KL individuals. Differences were assessed using MaAsLin2, adjusting for age, gender, and helminth infection status. (b) Box plots displaying log-transformed abundance of selected top differentially active and abundant Prevotella and Bacteroides species between groups in both RNA (top) and DNA (bottom) datasets.
Extended Data Fig. 4: Differential activity and abundance of Prevotella and Bacteroides species between helminth-infected and uninfected OA populations.
(a) Bar plots showing Prevotella and Bacteroides species with significantly different transcriptional activity (RNA; left) and genomic abundance (DNA; right) between OA and KL individuals. Differences were assessed using MaAsLin2, adjusting for age, gender, and helminth infection status. (b) Box plots displaying log-transformed abundance of selected top differentially active and abundant Prevotella and Bacteroides species between groups in both RNA (top) and DNA (bottom) datasets.
Extended Data Fig. 5: Cytokines ranked by the number of microbial species with correlated strain-level variations.
Microbial species are categorized based on the direction of correlations of their genetic variants with cytokine levels: species with only variants showing positive correlations (+), species with only variants showing negative correlations (−), or species with variants showing both positive and negative correlations (+ & −) with certain cytokines. Cytokines highlighted in brown are those significantly correlated with Prevotella strain-level variants.
Fig. 1: Gut metagenome-assembled genomes (MAGs) specific to Malaysian.
(a) Overview of the samples collected from the Malaysian population, encompassing stool shotgun metagenomic (DNA) and metatranscriptomic (RNA) sequencing across cross-sectional and longitudinal collections at 21 and 42 days post-deworming. Helminth infection status is shown as positive (without “X”) and negative (marked with “X”). Blood samples were obtained from the same participants, with subsets processed for clinical serum testing and Olink proteomics (cytokines and chemokines). (b) Summary of 5,355 dereplicated stool prokaryotic MAGs from the Malaysian, Unified Human Gastrointestinal Genome (UHGG) and Korean, Indian, Japanese (KIJ) datasets. The Venn diagram shows the shared and unique gut prokaryotic (bacterial and archaeal) MAGs: 307 (5.7%) are specific to Malaysia, 572 (10.7%) are shared among the three databases, 673 (12.6%) are specific to KIJ, and 2,869 (53.6%) are specific to UHGG. (c) Bar plot displaying the taxonomic classification at the class level for the prevalent Malaysian bacterial MAGs. (d) Phylogenetic tree showing the Malaysian bacterial MAGs and known MAGs within class Clostridia, annotated using GTDB taxonomy.
Fig. 2: Malaysian MAGs under class Clostridia and their enrichment pathways.
Phylogenetic trees of the Malaysian and known bacterial MAGs annotated with GTDB taxonomy within the families (a) Oscillospiraceae and (c) Ruminococcaceae. Dot plots show the functional pathways enriched in Malaysian MAGs compared to known MAGs for (b) HGM13006 and (d) Ruminococcus_D. Dot color indicates the MAG origin (Malaysian: brown; known: light green), dot size corresponds to the number of KEGG orthologs annotated to each pathway, and the x-axis represents the statistical significance of pathway enrichment as −log10(adjusted p-value).
Fig. 3: Transcriptional activity of Prevotella & Bacteroides.
(a) Bar plots displaying the relative abundance of the top taxa at the phylum (left) and genus (right) levels, comparing RNA and DNA data. Boxplots showing transcriptional activity of (b) Prevotella and (c) Bacteroides between OA and KL groups. Heatmaps of (d) transcriptional activity and (e) metagenomic abundance for Prevotella and Bacteroides, with samples in columns categorized by village. The first horizontal color bar indicates intestinal helminth infection status, while the second shows the Trichuris infection intensity (ranging from 12 to 119,875 eggs per gram, from light to heavy intensity). Boxplots illustrating the transcriptional activity of the genera (f) Prevotella and (g) Bacteroides, and of (h & i) the Malaysian Prevotella species, compared between helminth infection statuses.
Fig. 4: Association of the Prevotella/Bacteroides ratio with host immune responses.
(a) Horizontal bar plots showing cytokines and chemokines significantly associated with the Prevotella/Bacteroides ratio, identified using a simple linear regression model based on 127 DNA and 68 RNA samples. The left panel displays 21 proteins from RNA samples and the right panel shows 29 proteins from DNA samples. Bar length represents the effect size as indicated by the adjusted R². Representative scatter plots showing Spearman correlations between selected top proteins and log-transformed abundance (log[1+Abundance]) of (b) Prevotella and (c) Bacteroides. Correlation coefficients (R) and corresponding p-values are annotated. (d) Radar plot displaying the log-transformed median levels of 33 inflammatory cytokines and chemokines that were significantly different between the OA (n=66) and KL (n=30) cohorts.
Fig. 5: Strain-level variations in the gut microbiome associated with Orang Asli and helminth infection.
(a) Number of genes harboring significant strain-level variants in the OA population compared to the KL cohort, grouped by taxonomic class (top) and genus (bottom). (b) KEGG modules enriched among genes with significant variants in representative species. Dot size indicates the number of genes per module; vertical lines denote statistical significance after correction for multiple testing. The top five enriched modules are shown for each species. (c) Number of genes with significant strain-level variants associated with helminth infection status, grouped by taxonomic class (top) and genus (bottom).
Fig. 6: Strain-level variations in the gut microbiome correlate with host cytokine and chemokine levels.
(a) Microbial species ranked by the number of cytokines significantly correlated with their strain-level variants. Cytokines are grouped by correlation direction: exclusively positive (+), exclusively negative (−), or both positive and negative for at least one strain-level variant of the species (+ & −). (b) Overlap between cytokines correlated with Prevotella strain-level variants (“mGWAS”) and those correlated with the Prevotella:Bacteroides abundance ratio in metagenomic (“P/B ratio on DNA”) and metatranscriptomic (“P/B ratio on RNA”) data. (c) KEGG modules enriched among genes containing cytokine-correlated variants for representative species and cytokines. Dot size indicates the number of genes per module; vertical lines indicate statistical significance after correction for multiple testing. The top ten enriched modules per species are shown. (d) Example of a Prevotella copri_A variant in the prpC gene significantly correlated with IL-12B levels. Samples were grouped by genotype: only reference allele (“Ref”), only alternative allele (“Alt”), or both alleles (“Ref/Alt”). Q-value was calculated using Spearman correlation and adjusted for multiple testing with the Benjamini–Hochberg method.
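The Fig. 6d caption reports a Q-value obtained by adjusting Spearman correlation p-values with the Benjamini-Hochberg method. As a rough illustration of that adjustment step only, the sketch below computes Benjamini-Hochberg q-values from a list of p-values. It is not the authors' pipeline: the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen purely for demonstration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Hypothetical helper for illustration; not taken from the preprint.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # q-value is capped by the q-values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Made-up p-values, e.g. one per tested variant-cytokine pair.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

A q-value below a chosen threshold (commonly 0.05) would then be reported as significant after correction for multiple testing; the threshold used in the preprint is not stated in this section.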

