Microb Genom. 2021 Sep;7(9):000634.
doi: 10.1099/mgen.0.000634.

Flanker: a tool for comparative genomics of gene flanking regions


William Matlock et al. Microb Genom. 2021 Sep.

Abstract

Analysing the flanking sequences surrounding genes of interest is often highly relevant to understanding the role of mobile genetic elements (MGEs) in horizontal gene transfer, particularly for antimicrobial-resistance genes. Here, we present Flanker, a Python package that performs alignment-free clustering of gene flanking sequences in a consistent format, allowing investigation of MGEs without prior knowledge of their structure. These clusters, known as 'flank patterns' (FPs), are based on Mash distances, allowing for easy comparison of similarity across sequences. Additionally, Flanker can be flexibly parameterized to fine-tune outputs by characterizing upstream and downstream regions separately, and by investigating variable lengths of flanking sequence. We apply Flanker to two recent datasets describing plasmid-associated carriage of important carbapenemase genes (blaOXA-48 and blaKPC-2/3) and show that it successfully identifies distinct clusters of FPs, including both known and previously uncharacterized structural variants. For example, Flanker identified four Tn4401 profiles that could not be sufficiently characterized using TETyper or MobileElementFinder, demonstrating the utility of Flanker for flanking-gene characterization. Similarly, using a large (n=226) European isolate dataset, we confirm findings from a previous smaller study demonstrating an association between Tn1999.2 and blaOXA-48 upregulation, and we identify 17 FPs (compared to the 5 previously identified). More generally, the demonstration in this study that FPs are associated with geographical regions and antibiotic-susceptibility phenotypes suggests that they may be useful as epidemiological markers. Flanker is freely available under an MIT license at https://github.com/wtmatlock/flanker.
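To make the idea of clustering flanks on a Mash distance threshold concrete, the following is a minimal Python sketch. It is not Flanker's code: it computes the Mash distance formula from an exact k-mer Jaccard index (Mash itself estimates the Jaccard index from MinHash sketches), and it assumes that clusters are formed by linking any two flanks whose distance is at or below the threshold. The function names, the small k value and the linking rule are all illustrative assumptions.

```python
import math
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a, b, k=5):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), with j the Jaccard index.

    Assumption: j is computed exactly from full k-mer sets; the Mash tool
    estimates it from MinHash sketches and uses a much larger default k.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2 * j / (1 + j)) / k))


def cluster_by_threshold(seqs, threshold, k=5):
    """Group sequences into clusters, linking pairs with distance <= threshold.

    Assumption: simple single-linkage grouping via union-find; this is an
    illustration and not necessarily how Flanker clusters.
    """
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if mash_distance(seqs[a], seqs[b], k) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy flanks: "a" and "b" are identical, "c" shares no 5-mers.
flanks = {
    "a": "ACGTACGTTGCA" * 3,
    "b": "ACGTACGTTGCA" * 3,
    "c": "GGGGGGCCCCCCTTTTTT",
}
clusters = cluster_by_threshold(flanks, threshold=0.1)
```

A lower threshold splits flanks into more, tighter clusters; a higher one merges structurally similar flanks.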

Keywords: antimicrobial resistance (AMR); bioinformatics; mobile genetic element (MGE); plasmid; whole-genome sequencing.


Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Schematic of Flanker’s modes and parameters. (a) Flanker uses Abricate to annotate the gene of interest in input sequences and outputs associated flanking sequences, optionally clustering (-cl) these on a user-defined Mash distance threshold. It can take linear or circularized sequences. (b) In this example, genes geneA and geneB have been queried (-g geneA geneB), and only the upstream flank is desired (-f upstream). The top single black arrow represents choosing a single window of length 3000 bp (-w 3000), whereas the bottom three black arrows represent stepping in 1000 bp windows from 0 to 3000 bp (-w 0 -wstep 1000 -wstop 3000). The default mode (-m default) extracts flanks for all annotated alleles separately, but the multi-allelic mode (-m mm) extracts flanks for all alleles in parallel. (c) Flanker has a supplementary salami mode (-m sm), which outputs non-contiguous blocks of sequence with a start point, step size and end point (-w 0 -wstep 1000 -wstop 3000), represented by the three black arrows.
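The window options in the caption above can be sketched in a few lines of Python. This is an illustration of the slicing logic only, not Flanker's implementation: the helper names are hypothetical, coordinates are 0-based, the stop value is treated as inclusive and a zero-length first window is skipped so that -w 0 -wstep 1000 -wstop 3000 gives the three windows drawn in the figure. Those endpoint choices are assumptions.

```python
def upstream_flank(seq, gene_start, length, circular=False):
    """Return up to `length` bases immediately upstream of gene_start."""
    if length > len(seq):
        raise ValueError("flank longer than the sequence")
    begin = gene_start - length
    if begin >= 0:
        return seq[begin:gene_start]
    if not circular:
        return seq[:gene_start]  # linear contig: truncate at the edge
    # circular sequence: wrap around the origin
    return seq[begin % len(seq):] + seq[:gene_start]


def window_lengths(w, wstep=None, wstop=None):
    """Flank lengths for a single window (-w) or a stepped series.

    Assumption: wstop is inclusive and a zero-length window is dropped.
    """
    if wstep is None or wstop is None:
        return [w]
    return [n for n in range(w, wstop + 1, wstep) if n > 0]


def salami_blocks(seq, gene_start, w, wstep, wstop):
    """Non-contiguous upstream blocks of size wstep, nearest block first.

    Assumption: linear sequence; blocks running off the contig are clipped.
    """
    blocks = []
    for d in range(w, wstop, wstep):
        end = gene_start - d
        if end <= 0:
            break
        blocks.append(seq[max(0, end - wstep):end])
    return blocks


# Hypothetical toy sequence with the gene of interest starting at index 40.
seq = "A" * 10 + "C" * 10 + "G" * 10 + "T" * 10 + "ATG"
lengths = window_lengths(0, wstep=10, wstop=30)
flanks = [upstream_flank(seq, 40, n) for n in lengths]
blocks = salami_blocks(seq, 40, 0, 10, 30)
```

In this toy case the stepped windows are nested (each longer flank contains the shorter ones), whereas the salami blocks are disjoint slices of the same region.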
Fig. 2.
Flanking regions 5000 bp upstream of blaOXA-48 in plasmids from K. pneumoniae isolates. The Tree panel is a neighbour-joining tree reconstructed from Mash distances between complete sequences of plasmids carrying the blaOXA-48 gene. The second panel indicates the presence/absence of a L/M(pOXA-48)-type plasmid. The Gene Graphical Representation panel schematically represents coding regions in the 5000 bp sequence upstream of the blaOXA-48 gene, which is shown in red. Other genes are coloured according to the FP, which considers the overall pattern of all 100 bp window clusters up to 2200 bp (the approximate upstream limit of Tn1999). The Flankergram panel shows window clusters of all groups over each 100 bp window between 0 and 5000 bp. The dotted line at 2200 bp indicates the approximate point of upstream divergence between several FPs. The MLST panel shows K. pneumoniae multilocus sequence types, with those occurring once labelled ‘other’. FPs are numbered in ascending order according to abundance in the hybrid assemblies. Data used to make this figure came from the Dutch CPE surveillance and EuSCAPE hybrid assembly datasets.
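One way to read the FP definition in this caption is that each isolate's FP is the ordered tuple of its per-window cluster labels up to a chosen cut-off, with identical tuples sharing an FP. The Python sketch below illustrates that reading. The function name and input layout are hypothetical, and the numbering rule (FP 1 is the most common pattern, ties broken by tuple order) is an assumption, since the caption does not spell out the direction of the ordering.

```python
from collections import Counter


def assign_flank_patterns(window_clusters, n_windows):
    """Map each isolate to an FP number from its per-window cluster labels.

    window_clusters: {isolate: [cluster id for window 1, window 2, ...]}
    n_windows: how many windows (from the gene outwards) define the pattern,
               e.g. 22 windows of 100 bp for a 2200 bp cut-off.
    Assumption: FP 1 is the most abundant pattern.
    """
    patterns = {iso: tuple(ids[:n_windows]) for iso, ids in window_clusters.items()}
    counts = Counter(patterns.values())
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    fp_number = {p: i + 1 for i, p in enumerate(ranked)}
    return {iso: fp_number[p] for iso, p in patterns.items()}


# Hypothetical cluster labels for four isolates over four windows.
window_clusters = {
    "iso1": [1, 1, 2, 5],
    "iso2": [1, 1, 2, 9],
    "iso3": [1, 3, 2, 5],
    "iso4": [1, 1, 2, 7],
}
fps_short = assign_flank_patterns(window_clusters, n_windows=3)
fps_full = assign_flank_patterns(window_clusters, n_windows=4)
```

Truncating at the third window groups three isolates into one FP, while using all four windows separates every isolate, which mirrors why the cut-off (2200 bp in this figure) matters.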
Fig. 3.
Flanking regions 7200 bp upstream of blaKPC-2/3 in plasmids from K. pneumoniae isolates. The Tree panel is a neighbour-joining tree reconstructed from Mash distances between complete sequences of plasmids carrying the blaKPC-2/3 gene. The next three panels indicate the presence/absence of FIB(pQ1I)-, FII(pKP91)- and FIB(Kpn3)-type plasmids. The Gene column indicates which blaKPC allele (2 or 3) is present. The Gene Graphical Representation panel schematically represents coding regions in the 7200 bp sequence region upstream of the blaKPC-2/3 gene, which is shown in red. Other genes are coloured according to the FP, which here takes into account the overall pattern of all 100 bp window groups (shown in the Flankergram panel) over the full 7200 bp region upstream of blaKPC-2/3. The Flankergram shows window clusters over each 100 bp window between 0 and 7200 bp. The MLST panel shows K. pneumoniae multilocus sequence types, with those occurring once labelled ‘other’. The final two panels show the Galileo AMR and the TETyper outputs for the eight FPs, respectively. The FPs are numbered in ascending order according to abundance in the hybrid assemblies.

