Multicenter Study
Microb Genom. 2022 Jan;8(1):000747.
doi: 10.1099/mgen.0.000747.

regentrans: a framework and R package for using genomics to study regional pathogen transmission

Sophie Hoffman et al. Microb Genom. 2022 Jan.

Abstract

Increasing evidence of regional pathogen transmission networks highlights the importance of investigating the dissemination of multidrug-resistant organisms (MDROs) across a region to identify where transmission is occurring and how pathogens move across regions. We developed a framework for investigating MDRO regional transmission dynamics using whole-genome sequencing data and created regentrans, an easy-to-use, open source R package that implements these methods (https://github.com/Snitkin-Lab-Umich/regentrans). Using a dataset of over 400 carbapenem-resistant isolates of Klebsiella pneumoniae collected from patients in 21 long-term acute care hospitals over a one-year period, we demonstrate how to use our framework to gain insights into differences in inter- and intra-facility transmission across different facilities and over time. This framework and corresponding R package will allow investigators to better understand the origins and transmission patterns of MDROs, which is the first step in understanding how to stop transmission at the regional level.

Keywords: R package; genomics; pathogen spread; regional transmission; software; whole genome sequencing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
regentrans input data. Required and optional isolate genetic data and metadata for using regentrans. Figure created on BioRender.com.
Fig. 2.
Data-specific method for choosing pairwise genetic distance thresholds. (a) Fraction of intra-facility pairs for various pairwise single nucleotide variant (SNV) distances. (b) Fraction of intra-facility pairs for various pairwise patristic distances grouped into 100 bins. µ/site/yr = mutations per site per year. These plots can help identify drops in intra-facility pair fractions that may indicate a reasonable pairwise distance threshold, assuming that intra-facility transmission is more common than inter-facility transmission. Note that it may be difficult to clearly identify a large drop at any given threshold; therefore, we recommend performing sensitivity analyses with several thresholds. Furthermore, users should consider the trade-off between sensitivity and specificity when deciding what thresholds to choose. This data can be generated using the get_frac_intra() function.
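The quantity plotted in panel (a) can be illustrated with a minimal Python sketch. This is not the regentrans implementation of get_frac_intra(); the function name, the input layout (a dictionary of pairwise SNV distances keyed by isolate pair, plus a dictionary mapping isolates to facilities) and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def frac_intra_by_distance(dist, facility):
    """Fraction of isolate pairs that are intra-facility, per pairwise distance.

    dist: dict mapping frozenset({isolate_a, isolate_b}) -> SNV distance (hypothetical layout)
    facility: dict mapping isolate -> facility label (hypothetical layout)
    """
    counts = defaultdict(lambda: [0, 0])  # distance -> [intra pairs, all pairs]
    for a, b in combinations(sorted(facility), 2):
        d = dist[frozenset((a, b))]
        counts[d][1] += 1
        if facility[a] == facility[b]:
            counts[d][0] += 1
    return {d: intra / total for d, (intra, total) in sorted(counts.items())}


# Toy data (invented): four isolates from two facilities.
facility = {"i1": "A", "i2": "A", "i3": "B", "i4": "B"}
dist = {
    frozenset(("i1", "i2")): 2,
    frozenset(("i1", "i3")): 2,
    frozenset(("i1", "i4")): 15,
    frozenset(("i2", "i3")): 15,
    frozenset(("i2", "i4")): 15,
    frozenset(("i3", "i4")): 2,
}

fractions = frac_intra_by_distance(dist, facility)
for d, frac in fractions.items():
    print(d, round(frac, 3))
```

Plotting these fractions against distance and looking for a drop is the idea described in the caption; repeating the downstream analysis at several candidate thresholds gives the recommended sensitivity analysis.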
Fig. 3.
Pairwise genetic distances between facilities indicate recent intra- and inter-facility transmission. (a) Pairwise single nucleotide variant distances. Grey line is at a pairwise SNV distance of 10, and the inset shows all pairs with a pairwise SNV distance less than or equal to this threshold, which we consider indicative of recent transmission. See Q0 on genetic distance thresholds in the main text for more details on how we chose these thresholds. (b) Pairwise patristic distances. Grey line is at a pairwise patristic distance of 2.56×10⁻⁶ mutations per site per year, and the inset shows all pairs with a pairwise patristic distance less than or equal to this threshold. These plots indicate that transmission is likely to be occurring both within and between facilities, due to small pairwise genetic distances for both intra- and inter-facility pairs. Data generated using the get_pair_types() function.
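As a rough illustration of the pair typing behind this figure, the Python sketch below labels each isolate pair as intra- or inter-facility and keeps only the pairs at or below a distance threshold. It is not the regentrans get_pair_types() function; the function name, input layout and toy data are hypothetical, and the default of 10 simply mirrors the SNV threshold quoted in the caption.

```python
from itertools import combinations


def closely_related_pairs(dist, facility, threshold=10):
    """Return (isolate_a, isolate_b, pair_type) for pairs with distance <= threshold.

    dist: dict mapping frozenset({isolate_a, isolate_b}) -> SNV distance (hypothetical layout)
    facility: dict mapping isolate -> facility label (hypothetical layout)
    """
    pairs = []
    for a, b in combinations(sorted(facility), 2):
        if dist[frozenset((a, b))] <= threshold:
            same = facility[a] == facility[b]
            pairs.append((a, b, "intra-facility" if same else "inter-facility"))
    return pairs


# Toy data (invented): three isolates from two facilities.
toy_facility = {"i1": "A", "i2": "A", "i3": "B"}
toy_dist = {
    frozenset(("i1", "i2")): 3,
    frozenset(("i1", "i3")): 8,
    frozenset(("i2", "i3")): 40,
}

related = closely_related_pairs(toy_dist, toy_facility)
for pair in related:
    print(pair)
```

Seeing both pair types survive the threshold, as in the toy output, is the pattern the caption reads as evidence of transmission within and between facilities.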
Fig. 4.
Clusters of isolates from the same facility suggest intra-facility transmission. (a) Mapping isolate location on the phylogeny provides a visual indication of the extent of clustering by facility. Here we can see clustering of isolates from the same facility in several subclades of the phylogeny. (b) Quantification of the size of phylogenetic clusters from a single facility using get_clusters(). The size of the points in panel b can provide insight into whether large clusters are from an intra-facility outbreak (smaller point size) or sustained intra-facility transmission (larger point size). Points in panel b are jittered for visualisation purposes.
Fig. 5.
Some facility pairs have similar populations, indicating potential transmission between them. Genetic flow (Fsp) was calculated using the get_genetic_flow() function in regentrans. Rows and columns are facilities. Lower Fsp indicates more similar populations and thus more putative transmission. At least two isolates are required from a facility to perform the Fsp calculation as within-facility variation must be calculated, so facilities with only one isolate have been removed. The heatmap is clustered on the basis of Fsp values.
Fig. 6.
Some facilities have many closely related isolates, indicating potential intra- and inter-facility transmission. (a) Number of closely related isolate pairs for different genetic distance metrics and thresholds. Only facilities with at least one pair of closely related isolates are shown. µ/site/yr = mutations per site per year. (b) Network of patient connectedness. Patient nodes are connected by edges if they share an isolate with a pairwise SNV distance ≤6.
Fig. 7.
The pairwise SNV distance distribution does not change over time. (a) Count of pairwise single nucleotide variant (SNV) distances faceted by year. (b) Fraction of intra- vs. inter-facility pairwise SNV distances faceted by year. Trends are similar across both years.
Fig. 8.
Facilities with more patient flow tend to have more similar K. pneumoniae populations. (a) Patient flow and genetic flow (Fsp) are negatively correlated. (b) Patient flow and the minimum pairwise single nucleotide variant (SNV) distance (i.e. the SNV distance between the most closely related isolates) are negatively correlated. (c) Patient flow and number of closely related isolate pairs are positively correlated. Patient flow is the path of maximum patient flow. For this analysis we considered indirect transfers, as long-term acute care hospitals are often not connected by direct transfers but rather by transfers to an intermediate facility such as an acute care hospital. Mean patient flow was calculated as the mean of the two-directional patient flow metrics between two facilities using the get_patient_flow() function. Lines were plotted using ggplot2::geom_smooth() with the ‘lm’ method.
Fig. 9.
Geographically close facilities are often connected by closely related isolate pairs. (a) Facilities are positioned according to their geographic locations, but latitude and longitude are de-identified by horizontal, vertical, and rotational shifts. The smaller single nucleotide variant (SNV) threshold was chosen for visualisation purposes. Size and colour of points indicate sample size; size and colour of edges indicate number of closely related pairs. (b) Physical distance between facilities is positively correlated with genetic flow (Fsp). (c) Physical distance between facilities is not correlated with minimum pairwise SNV distance. (d) Physical distance between facilities is negatively correlated with number of closely related isolate pairs (≤10 SNVs). The larger SNV threshold was chosen to give a wider distribution of number of closely related isolate pairs. Physical distance was calculated as the shortest distance between the points of latitude and longitude for the facility pair. Lines in panels b, c and d were plotted using ggplot2::geom_smooth() with the ‘lm’ method.
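The caption does not say how the shortest distance between two latitude/longitude points was computed. One common choice, assumed here purely for illustration, is the great-circle (haversine) distance on a spherical Earth. The function name, the Earth radius constant and the coordinates below are not from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical facility coordinates (invented for illustration only).
facility_a = (42.28, -83.74)
facility_b = (42.33, -83.05)
print(round(haversine_km(*facility_a, *facility_b), 1), "km")
```

Because the facility coordinates in panel (a) are de-identified by shifts and a rotation, distances like this would need to be computed from the true coordinates before de-identification.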
