BMC Genomics. 2024 Oct 14;25(1):959.
doi: 10.1186/s12864-024-10876-0.

Metagenomic assemblies tend to break around antibiotic resistance genes

Anna Abramova et al. BMC Genomics. 2024.

Abstract

Background: Assembly of metagenomic samples can provide essential information about the mobility potential and taxonomic origin of antibiotic resistance genes (ARGs) and inform interventions to prevent further spread of resistant bacteria. However, as with other conserved regions such as ribosomal RNA genes and mobile genetic elements, almost identical ARGs typically occur in multiple genomic contexts across different species, which poses a considerable challenge for the assembly process. This usually results in many fragmented contigs of unclear origin, complicating the risk assessment of ARG detections. To systematically investigate the impact of this issue on detection, quantification and contextualization of ARGs, we evaluated the performance of different assembly approaches, including genomic-, metagenomic- and transcriptomic-specialized assemblers. We quantified the recovery and accuracy rates of each tool for ARGs, both from in silico spiked metagenomic samples and from real samples sequenced using long- and short-read sequencing technologies.

Results: The results revealed that none of the investigated tools can accurately capture genomic contexts present in samples of high complexity. The transcriptomic assembler Trinity showed a better performance in terms of reconstructing longer and fewer contigs matching unique genomic contexts, which can be beneficial for deciphering the taxonomic origin of ARGs. The currently commonly used metagenomic assembly tools metaSPAdes and MEGAHIT were able to identify the ARG repertoire but failed to fully recover the diversity of genomic contexts present in a sample. On top of that, in a complex scenario MEGAHIT produced very short contigs, which can lead to considerable underestimation of the resistome in a given sample.

Conclusions: Our study shows that metaSPAdes and Trinity would be the preferable tools in terms of accuracy to recover correct genomic contexts around ARGs in metagenomic samples characterized by uneven coverages. Overall, the inability of assemblers to reconstruct long ARG-containing contigs has impacts on ARG quantification, suggesting that directly mapping reads to an ARG database should be performed as a complementary strategy to get accurate ARG abundance and diversity measures.
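The complementary read-mapping strategy recommended above can be illustrated with a minimal sketch of per-base coverage, the quantity reported in the figure captions. It is not the authors' pipeline. It assumes the reads have already been aligned to an ARG database and summarized as (gene, aligned bases) pairs. The function names, gene identifiers and numbers are hypothetical.

```python
import math
from collections import defaultdict


def per_base_coverage(alignments, gene_lengths):
    """Return gene -> (total aligned bases) / (gene length).

    alignments: iterable of (gene_id, aligned_bases) pairs, one per mapped read.
    gene_lengths: dict of gene_id -> reference length in bp.
    Genes with no mapped reads get a coverage of 0.0.
    """
    totals = defaultdict(int)
    for gene_id, aligned_bases in alignments:
        totals[gene_id] += aligned_bases
    return {gene: totals[gene] / length for gene, length in gene_lengths.items()}


def log10_coverage(coverage):
    """Log-transform coverage; genes with zero coverage map to None."""
    return {
        gene: (math.log10(value) if value > 0 else None)
        for gene, value in coverage.items()
    }


# Toy input (hypothetical): three reads hit geneA, one hits geneB, none hit geneC.
toy_lengths = {"geneA": 1000, "geneB": 500, "geneC": 800}
toy_alignments = [("geneA", 150), ("geneA", 150), ("geneA", 100), ("geneB", 150)]

toy_coverage = per_base_coverage(toy_alignments, toy_lengths)
toy_log_coverage = log10_coverage(toy_coverage)
```

The same function can be applied to reads aligned to assembled contigs, so the two quantification routes can be compared gene by gene.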

Keywords: Antibiotic resistance genes; Genomic context; Metagenomic assembly.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Workflow. A Simulated scenario constituting a real metagenomic dataset spiked with reads generated from a set of plasmids containing ARGs. B Real case scenario, which included long reads used as a reference to quality check the contigs assembled from short-read data generated from the same sample
Fig. 2
ARG recovery by each tool. A Presence/absence of ARGs on the contigs assembled by different tools. B Presence/absence of ARGs after a length cut-off of 300 bp was applied to the results. “Full” denotes contigs containing full-length and correctly assembled ARGs, while “Truncated” comprises contigs containing a partial ARG sequence (minimum 60 bp and 98% identity, with no flanking region on one or both sides)
Fig. 3
Assembler performance at different coverages. A Proportion of full, truncated and misassembled/partial ARG sequences. Note that the retrieved ARGs are not necessarily associated with the correct context. B Length distribution of contigs with ARG hits. Contigs with correct genomic context, only containing full ARGs, are marked with red dots
Fig. 4
Visual representation of assembly results. The example shown is one of the plasmids, AP023079, which contains two ARGs, blaNDM and aph(3’’)-Ib. The plasmid was visualized using FARAO, with light gray representing the backbone plasmid, the other colors representing correctly assembled contigs from samples with different total numbers of reads (“low” in pink, “medium” in teal, “high” in purple and “very high” in blue), and ARGs in red. The corresponding assembly graphs were visualized using Bandage. The figures represent only the part of the whole assembly graph corresponding to the AP023079 plasmid sequence, where blue lines correspond to BLAST hits of the assembled contigs to the plasmid and pink lines to the ARG regions
Fig. 5
Assembly of blaNDM-1 gene. A Alignment of the contigs containing blaNDM-1 gene to a reference plasmid. The top figure depicts a part of the reference plasmid CP055250.1 with 10 kb upstream and downstream of the blaNDM-1 gene (in red). Coding sequences of neighboring genes are shown as grey arrows. Note that the contigs start and/or end within genes encoding transposases or insertion sequences (ISs). B The top panel represents a part of the assembly graph for the “medium” metaSPAdes assembly containing blaNDM-1 gene (in red). Each subsequent graph shows the mapping of the output contig as well as paths corresponding to the original plasmid sequences
Fig. 6
Results from comparison between short and long read data. A ARG quantification by using either assembled contigs or direct mapping of short reads to the ResFinder database, represented as log10(per base coverage). B Number of unique ARGs identified on assembled contigs (Ray, Velvet and TriMetAss are not shown). C Length distribution of contigs assembled from short reads matching the PacBio reference reads, with dots representing individual contigs. D PCoA based on Bray–Curtis dissimilarity between different quantification methods
Fig. 7
ARG quantification using either assembled contigs as a reference or by directly mapping short reads to the ResFinder database. Per base total coverage calculated using FARAO from aligning reads to the contigs, and using direct ARG quantification by mapping reads to the ResFinder database (A) and ResFinder clustered to 90% identity (B)
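The Bray–Curtis dissimilarity named in the Fig. 6 caption can be sketched as follows, to show how two ARG abundance profiles (for example, one contig-based and one from direct read mapping) might be compared. This is a plain-Python illustration of the standard formula, sum of |a_i - b_i| divided by sum of (a_i + b_i). It is not the authors' code, and the profile names and values are hypothetical.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile is a dict of gene_id -> non-negative abundance.
    Genes missing from a profile are treated as abundance 0.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no genes.
    """
    genes = set(profile_a) | set(profile_b)
    numerator = sum(
        abs(profile_a.get(g, 0.0) - profile_b.get(g, 0.0)) for g in genes
    )
    denominator = sum(
        profile_a.get(g, 0.0) + profile_b.get(g, 0.0) for g in genes
    )
    if denominator == 0:
        raise ValueError("both profiles are empty or all-zero")
    return numerator / denominator


# Toy profiles (hypothetical): the contig-based profile misses geneB entirely.
contig_based = {"geneA": 2.0}
read_based = {"geneA": 2.0, "geneB": 2.0}

toy_dissimilarity = bray_curtis(contig_based, read_based)  # 2 / 6
```

A matrix of such pairwise values is the usual input to an ordination such as PCoA, which is not implemented here.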
