Genome Biol. 2024 Apr 11;25(1):93.
doi: 10.1186/s13059-024-03233-7.

Scoary2: rapid association of phenotypic multi-omics data with microbial pan-genomes

Thomas Roder et al. Genome Biol. 2024.

Abstract

Unraveling bacterial gene function drives progress in various areas, such as food production, pharmacology, and ecology. While omics technologies capture high-dimensional phenotypic data, linking them to genomic data is challenging, leaving 40-60% of bacterial genes undescribed. To address this bottleneck, we introduce Scoary2, an ultra-fast microbial genome-wide association studies (mGWAS) software. With its data exploration app and improved performance, Scoary2 is the first tool to enable the study of large phenotypic datasets using mGWAS. As proof of concept, we explore the metabolome of yogurts, each produced with a different Propionibacterium freudenreichii strain, and discover two genes affecting carnitine metabolism.

Keywords: BGWA; Bacteria; Fermented food; GWAS; Genotype-phenotype association; Metabolite; Microbial genome-wide association studies; Omics; Pan-genome; Prokaryote.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Benchmarking of Scoary2’s automatic binarization based on simulated datasets for different effect sizes. Panels on the left show the distributions from which the numeric phenotype was sampled based on the presence or absence of a designated causal gene. Panels on the right indicate the rank of the causal gene in the output of Scoary in relation to the number of genomes in the simulated dataset. The black line indicates the average rank and the grey area indicates the 90% confidence interval based on 20 simulations
Fig. 2
The first page (overview.html) of the Scoary2 data exploration app. A Dendrogram of traits. A cluster of carnitine-related traits is highlighted in yellow; the highest-scoring trait is selected (blue). B Negative logarithms of the p-values calculated by Scoary2: p-values range from high (left) to low (right); f stands for the p-value from Fisher’s test, e for the p-value from the post hoc test, and * for the product of the two values. C Trait names. D Trait search and navigation tool. E Trait metadata. It is updated when the mouse hovers over the traits in the dendrogram. F Plot legend
Fig. 3
The second page (trait.html) of the Scoary2 data exploration app. A Trait name. B Phylogenetic tree of the isolates. C Top row: presence (black)/absence (white) of orthogene. Middle row: binarized trait. Bottom row: continuous trait. D List of best candidate orthogenes with associated p-values. E Coverage matrix: the number in each cell gives the number of genes in the genome that have the annotation. F Pie chart that shows how the orthogene and the trait intersect in the dataset. G Histogram of the continuous values, colored by whether each isolate has the orthogene (g+/g−) and the trait (t+/t−)
Fig. 4
UMAP projections of mass spectrometry datasets. Each symbol represents one yogurt that was made with a different bacterial strain in addition to the starter culture YC-381. A LC-MS dataset: 2348 metabolites. B GC-MS volatiles dataset: 1541 metabolites. C Legend: each (sub-)species has a unique combination of color and symbol. The number in brackets indicates the number of yogurts made using the respective (sub-)species
Fig. 5
Abundance of the metabolites that correlate with the putative carnitine transporter and corresponding gene loci of three yogurts made from starter cultures only and 44 yogurts made with additional Propionibacterium freudenreichii isolates. The figure is divided into two parts, depending on the completeness of the carnitine gene cluster of the isolates: the isolates on a blue background have a complete gene cluster, and the isolates on a red background have an incomplete gene cluster, resulting in varying metabolite compositions. A Heat map of the scaled metabolite abundances. Scale: blue (low) to white (average) to red (high). B Scale factor of each metabolite. C Color bar that indicates whether the mass spectrometry database suggested a match with carnitine in the name (green) or not (grey). The suggested names are shown below. Names highlighted in green were confirmed with standard substances. D Comparison of the associated gene cluster spanning from the MFS transporter (red) to fixX (dark blue). E Annotations of the orthogroups. Genes that belong to the same orthogroup are highlighted in the same color. The caiABC genes are colored in shades of green and the fixABCX genes in shades of blue. The putative carnitine transporter and hydrolase identified using Scoary2 are highlighted in red and violet, respectively
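
The captions for Fig. 2 and Fig. 3 refer to a binarized trait, the intersection of orthogene and trait across isolates, and a p-value from Fisher's test shown as a negative logarithm. The sketch below illustrates that general idea on toy data. It is not Scoary2's code: the median split stands in for Scoary2's automatic binarization (whose method is not described here), the post hoc test is omitted, and all function names and values are hypothetical.

```python
from math import comb, log10
from statistics import median


def binarize(values, threshold=None):
    """Split a numeric trait into True/False (assumption: median split)."""
    cut = median(values) if threshold is None else threshold
    return [v > cut for v in values]


def contingency(gene_present, trait_positive):
    """Count isolates per class: (g+t+, g+t-, g-t+, g-t-)."""
    a = b = c = d = 0
    for g, t in zip(gene_present, trait_positive):
        if g and t:
            a += 1
        elif g:
            b += 1
        elif t:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x isolates in the g+t+ cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Toy data: 8 hypothetical isolates, gene presence and a metabolite abundance.
gene = [True, True, True, True, False, False, False, False]
abundance = [9.1, 8.7, 9.5, 8.9, 1.2, 0.8, 1.5, 1.1]

trait = binarize(abundance)
table = contingency(gene, trait)
p_value = fisher_exact_two_sided(*table)
neg_log_p = -log10(p_value)
print(table, p_value, neg_log_p)
```

With these toy values the gene and the binarized trait coincide perfectly, so the table has counts only in the g+/t+ and g−/t− cells. Real pan-genome data also needs population-structure correction, which this sketch does not attempt.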
