Cell Host Microbe. 2020 Jan 8;27(1):140-153.e9.
doi: 10.1016/j.chom.2019.10.022. Epub 2019 Dec 17.

A Bioinformatic Analysis of Integrative Mobile Genetic Elements Highlights Their Role in Bacterial Adaptation

Matthew G Durrant et al. Cell Host Microbe. 2020.

Erratum in

Abstract

Mobile genetic elements (MGEs) contribute to bacterial adaptation and evolution; however, high-throughput, unbiased MGE detection remains challenging. We describe MGEfinder, a bioinformatic toolbox that identifies integrative MGEs and their insertion sites by using short-read sequencing data. MGEfinder identifies the genomic site of each MGE insertion and infers the identity of the inserted sequence. We apply MGEfinder to 12,374 sequenced isolates of 9 prevalent bacterial pathogens, including Mycobacterium tuberculosis, Staphylococcus aureus, and Escherichia coli, and identify thousands of MGEs, including candidate insertion sequences, conjugative transposons, and prophage elements. The MGE repertoire and insertion rates vary across species, and integration sites often cluster near genes related to antibiotic resistance, virulence, and pathogenicity. MGE insertions likely contribute to antibiotic resistance in laboratory experiments and clinical isolates. Additionally, we identified thousands of mobility genes, a subset of which have unknown function, opening avenues for exploration. Future application of MGEfinder to commensal bacteria will further illuminate bacterial adaptation and evolution.

Keywords: adaptation; antibiotic resistance; bacteria; bioinformatics; insertion sequences; mobile genetic elements; pathogen; phage; transposable elements; transposon.

Conflict of interest statement

Declaration of Interests

The authors declare no competing interests.

Figures

Figure 1. An approach to identify a variety of integrative MGEs from short-read sequencing data.
a, MGEfinder workflow schematic. See main text and STAR methods for details of each step. See Figure S2 for MGEfinder performance metrics. TSD: target site duplication. b, The proportion of elements identified by each inference method for the downloaded isolates of nine bacterial pathogens. See STAR methods and Figure S1 for description of each inference technique and how ambiguous insertions were resolved. c, An analysis of the types of elements identified in the MGEfinder workflow when applied to nine bacterial pathogens, with element length on the x-axis (log-scale), and element count on the y-axis. A “unique element” refers to a unique cluster of elements (see STAR methods for details). See Figure S1g for a schematic representation of each type of element. See Figure S3 for identified IS families. IS: insertion sequence; CDS: coding sequence; TIR: terminal inverted repeat.
Figure 2. Bacterial species vary considerably in their overall MGE repertoire and rate of insertion.
a, The number of unique sequence elements identified, binned by species and total unique insertion sites per element. “Total unique insertion sites” refers to the number of unique sites where members of a given element cluster can be found across all isolates for each species. b, Types of MGEs at different levels of mobility. The bins along the x-axis are the same as in panel a. Element categories are indicated by the colors in the legend. c, An accumulation curve of the number of new TE insertions identified as additional isolates are analyzed. See also Figure S4. d, Notched boxplots of the number of rare TE insertions detected across all samples for each species. A rare TE insertion is defined as a TE insertion identified in <1% of all samples. Rare insertion counts are divided by the number of genomic sites in the sequenced isolates with non-zero coverage, and then multiplied by 1 megabase. Notches indicate 1.58 × interquartile range / sqrt(n), a rough 95% confidence interval for comparing medians (McGill et al. 1978). Outliers are excluded from this figure. See also Figure S5.
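The two quantities described for panel d can be illustrated with a minimal Python sketch. It is not taken from MGEfinder; the function names are hypothetical, and the quartiles are computed with the standard library's "inclusive" method, which is an assumption and may differ slightly from the hinges used by the plotting software behind the figure.

```python
import math
import statistics


def notch_interval(values):
    """Return (lower, upper) notch bounds: median +/- 1.58 * IQR / sqrt(n).

    Assumption: quartiles use the 'inclusive' method of statistics.quantiles.
    """
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    half_width = 1.58 * (q3 - q1) / math.sqrt(n)
    return median - half_width, median + half_width


def rare_insertions_per_megabase(rare_count, covered_sites):
    """Scale a rare-insertion count by the number of covered sites, per 1 Mb."""
    return rare_count / covered_sites * 1_000_000
```

Under this sketch, two species whose notch intervals do not overlap would be read as having roughly different medians, which is the comparison the caption describes.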
Figure 3. Many identified MGEs include passenger genes, some of which are largely uncharacterized.
a, Mobility genes identified when analyzing coding sequences found in mobile elements across species. Mobility genes are grouped into those found at >10 or >100 unique insertion sites. See STAR methods for description of each term in the legend. b, Bar plots depict the number of elements falling in the four lowest-confidence element categories: “Contains CDS”, “Contains CDS + TIR”, “No CDS”, and “No CDS + TIR”. In each category, the proportion of elements with a predicted mobility gene (purple), no predicted mobility gene but high BLAST similarity to another element with a mobility gene (blue), and all other elements (grey) are indicated. Percentages on the top of the bars indicate the percentage of elements in each bin falling in the purple or blue categories. c, The number of unique insertion sites identified where a mobile element containing each labeled passenger protein was found. Only elements that contain at least one predicted transposase are included. Ties in the number of unique sites generally indicate that the passenger proteins are on the same element. d, Two examples of mobile elements carrying passenger proteins. (1) a copper resistance mobile element found in E. coli; (2) a mobile element containing a beta-lactamase found in S. aureus. Yellow rectangles: coding sequences with predicted transposase activity; purple rectangles: all other predicted coding sequences.
Figure 4. MGE insertion-enriched sites occur near functionally important genes and pathways.
a, An analysis of MGE insertion-enriched sites found on each species’ chromosome. Insertion-enriched sites were assigned to coding sequences by choosing the coding sequence closest to the center of each insertion-enriched site. All insertion-enriched sites meeting FDR < 0.05 are indicated with the colored points. The 15 most significant insertion-enriched sites associated with well-annotated coding sequences are shown in the text labels. The insertion-enriched site is shown to be upstream (blue), within (red), or downstream of the nearest coding sequence (green). b, Gene Ontology (GO) enrichment analysis of predicted coding sequences near MGE insertion-enriched sites. All coding sequences near significant insertion-enriched sites were tested for enrichment of each GO term using a hypergeometric test; all 44 significantly enriched GO terms are presented.
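The hypergeometric enrichment test mentioned for panel b can be sketched as follows. This is an illustrative implementation, not the authors' code; the function and parameter names are hypothetical. The caption does not say which multiple-testing correction was applied, so the Benjamini-Hochberg adjustment shown here is an assumption included only to show how an FDR threshold could be applied.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the input order.

    Assumption: BH is used for illustration; the source does not name
    the correction method.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Here `population` would be all annotated coding sequences in a genome, `sample` the coding sequences near significant insertion-enriched sites, and `hits` the number of those carrying the GO term being tested.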
Figure 5. An analysis of MGE insertion sites reveals their target-site specificity.
Examples of target-sequence motifs identified for five different MGEs with high target-sequence specificity. “# Targets” refers to the number of unique insertion sites analyzed for each MGE. “% Targets” indicates the percentage of target sites containing the motif. “% BG” indicates the percentage of randomly chosen background sequences containing the motif.
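The “% Targets” and “% BG” columns compare how often a motif occurs in real target sites versus background sequences. A minimal sketch of that comparison is shown below, assuming the motif can be written as a regular expression; the function name and the example sequences in the usage note are hypothetical and do not come from the paper.

```python
import re


def percent_with_motif(sequences, motif_regex):
    """Return the percentage of sequences containing at least one motif match."""
    if not sequences:
        return 0.0
    pattern = re.compile(motif_regex)
    hits = sum(1 for seq in sequences if pattern.search(seq))
    return 100.0 * hits / len(sequences)
```

For example, calling `percent_with_motif` once on the sequences flanking an element's insertion sites and once on randomly chosen genomic windows gives the two percentages; a large gap between them suggests target-site specificity.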
Figure 6. MGE insertions contribute to antibiotic resistance in adaptive laboratory evolution experiments.
a, A schematic representation of the intermediate-step trimethoprim megaplate experiment conducted by Baym et al., showing the MGE insertion count in each sequenced isolate collected from the noted position on the megaplate. See also Figure S6. b, The number of independent mutation events assumed to affect each gene listed in the intermediate-step megaplate experiment conducted by Baym et al. Independent mutations are grouped according to mechanism: black - by MGE insertion; grey - by point mutations / short indels. This visualization includes MGE insertions affecting a gene only once (yeiL, ydjN, gshA, yeaR) and excludes all point mutations/short indels affecting a gene only once. c, An analysis of the results of the chloramphenicol (CHL) and doxycycline (DOX) morbidostat experiment conducted by Toprak et al., supplemented with IS insertion information. The legend in panel b also applies to panel c.
Figure 7. MGE insertions disrupt known antibiotic resistance genes in clinical isolates.
a, An analysis of MGE insertion-enriched sites for two collections of clinical E. coli isolates. The third panel is the same as the E. coli panel shown in Figure 4a (insertion-enriched site analysis on randomly downloaded E. coli isolates from SRA), highlighting the insertion-enriched sites shared with either of the clinical isolate collections. See also Figure S7. b, Unique acrR MGE insertions found in the Hospital collection, the MDR collection, and the randomly downloaded E. coli isolates from the SRA database. c, All unique ompF MGE insertions found in the same three isolate collections.
