[Preprint]. 2023 Sep 16:2023.03.11.532203.
doi: 10.1101/2023.03.11.532203.

Intragenic DNA inversions expand bacterial coding capacity

Rachael B Chanin et al. bioRxiv. 2023.

Update in

  • Intragenic DNA inversions expand bacterial coding capacity.
    Chanin RB, West PT, Wirbel J, Gill MO, Green GZM, Park RM, Enright N, Miklos AM, Hickey AS, Brooks EF, Lum KK, Cristea IM, Bhatt AS. Nature. 2024 Oct;634(8032):234-242. doi: 10.1038/s41586-024-07970-4. Epub 2024 Sep 25. PMID: 39322669

Abstract

Bacterial populations that originate from a single bacterium are not strictly clonal. Often, they contain subgroups with distinct phenotypes. Bacteria can generate heterogeneity through phase variation: a preprogrammed, reversible mechanism that alters gene expression levels across a population. One well-studied type of phase variation involves enzyme-mediated inversion of specific intergenic regions of genomic DNA. Frequently, these DNA inversions flip the orientation of promoters, turning ON or OFF adjacent coding regions within otherwise isogenic populations. Through this mechanism, inversion can affect fitness, survival, or group dynamics. Here, we develop and apply bioinformatic approaches to discover thousands of previously undescribed phase-variable regions in prokaryotes using long-read datasets. We identify 'intragenic invertons', a surprising new class of invertible elements found entirely within genes, in bacteria and archaea. To date, inversions within single genes have not been described. Intragenic invertons allow a gene to encode two or more versions of a protein by flipping a DNA sequence within the coding region, thereby increasing coding capacity without increasing genome size. We experimentally characterize specific intragenic invertons in the gut commensal Bacteroides thetaiotaomicron, presenting a 'roadmap' for investigating this new gene-diversifying phenomenon.
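
To make the idea of an intragenic inversion concrete, the following is a minimal sketch, not the authors' pipeline. It replaces a segment of a coding sequence with its reverse complement and reports the first in-frame stop codon. The toy gene, the segment coordinates, and the function names are all invented for illustration; real invertons are flanked by inverted repeats, which this sketch does not model.

```python
# Toy illustration only: sequence, coordinates, and helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(gene: str, start: int, end: int) -> str:
    """Return the gene with gene[start:end] replaced by its reverse complement."""
    return gene[:start] + reverse_complement(gene[start:end]) + gene[end:]


def first_stop_codon(gene: str):
    """Return the index of the first in-frame (frame 0) stop codon, or None."""
    for i in range(0, len(gene) - 2, 3):
        if gene[i:i + 3] in STOP_CODONS:
            return i
    return None


forward = "ATGGCATTAGCCTAA"                 # ATG GCA TTA GCC TAA (invented)
recoded = invert_segment(forward, 3, 6)     # GCA -> TGC: one codon is recoded
truncated = invert_segment(forward, 6, 9)   # TTA -> TAA: premature stop codon

print(first_stop_codon(forward), first_stop_codon(recoded), first_stop_codon(truncated))
```

In this toy case one inversion changes a codon without altering the gene length (recoding), while the other introduces an earlier in-frame stop. Inverting the same segment a second time restores the original sequence, which mirrors the reversibility described in the abstract.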

Conflict of interest statement

Competing Interests: Authors declare that they have no competing interests.

Figures

Fig. 1. Short-read metagenomic datasets reveal intragenic invertons in Bacteroides thetaiotaomicron (BTh).
(A) An overview of the analysis pipeline for identifying putative invertons in short-read datasets. (B) A heatmap of the inversion proportion of intragenic invertons in BTh. Samples with no intragenic invertons were removed. Rows labeled with a gene name represent intragenic invertons with PCR and Sanger sequencing evidence of inversion. (C-D) Genome diagrams for confirmed intragenic invertons in BTh. Gray bars indicate putative invertons without sequencing support. Red bars indicate invertons with sequencing evidence. (C) Left - Genome diagram of the region surrounding the BT0375 recoding intragenic inverton, and a domain diagram of the BT0375 gene with the location of the inverton IRs indicated. Right - AlphaFold overlay of the BT0375 forward (blue; pLDDT 89.91) and reverse (green; pLDDT 85.24) orientations. The region that is recoded is circled in red. (D) Genome diagram of the region surrounding the BT3786 premature stop codon intragenic inverton, and a domain diagram of the gene. The consequence of the inversion and resulting two predicted ORFs are indicated in the domain diagram.
Fig. 2. Developing and optimizing PhaVa, a long-read based, accurate inverton caller.
(A) Schematic of PhaVa’s workflow. Putative invertons are identified, and long reads are mapped to both a forward (highlighted by the black dashed lines) and a reverse-orientation (highlighted by the gray dashed lines) version of the inverton and surrounding genomic sequence, similar to PhaseFinder. Reads that do not map across the entire inverton and into the flanking sequence on either side, or that have poor mapping characteristics, are removed. See methods for details. (B-C) Optimizing cutoffs for the minimum number of reverse reads, as both a raw number and a percentage of all reads, to reduce false positive inverton calls with simulated reads. Cell color and number represent (B) the false positive rate per simulated readset and (C) the total number of unique false positives across all simulated datasets. (D) False positives in simulated data plotted per species. All measurements were made with a minimum cutoff of three reverse reads while varying the minimum reverse-read percentage cutoff. Dashed line indicates the minimum reverse reads percent cutoff used for isolate and metagenomic datasets.
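
The two-part cutoff described in panels B-D can be sketched as a simple filter. This is an illustrative assumption about how such a filter might look, not PhaVa's actual code: the `ReadCounts` type, the function names, and the example percentage are hypothetical. Only the minimum of three reverse reads comes from the caption; the percentage cutoff is not stated there, so it is left as a required argument.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Reads spanning one putative inverton, by mapped orientation (hypothetical type)."""
    forward: int
    reverse: int


def reverse_fraction(counts: ReadCounts) -> float:
    """Fraction of spanning reads that map to the reverse orientation."""
    total = counts.forward + counts.reverse
    return counts.reverse / total if total else 0.0


def passes_cutoffs(counts: ReadCounts, *, min_reverse_fraction: float,
                   min_reverse_reads: int = 3) -> bool:
    """Keep a call only if it clears both the raw-count and the fraction cutoff.

    min_reverse_reads=3 follows the caption; min_reverse_fraction must be
    supplied by the caller because the caption does not give its value.
    """
    return (counts.reverse >= min_reverse_reads
            and reverse_fraction(counts) >= min_reverse_fraction)


# Example with an arbitrary, illustrative fraction cutoff of 2%.
candidates = {
    "site_a": ReadCounts(forward=97, reverse=3),    # clears both cutoffs
    "site_b": ReadCounts(forward=98, reverse=2),    # too few reverse reads
    "site_c": ReadCounts(forward=997, reverse=3),   # reverse fraction too low
}
kept = [name for name, c in candidates.items()
        if passes_cutoffs(c, min_reverse_fraction=0.02)]
print(kept)
```

Requiring both conditions guards against two different failure modes: a handful of mis-mapped reads at a deeply covered site, and a high apparent fraction driven by very low coverage.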
Fig. 3. PhaVa analysis of isolate long-read sequencing data reveals intragenic inversions are prevalent across the bacterial tree of life.
(A) The total number of invertons found within various bacterial phyla from 29,989 publicly available long-read isolate sequencing datasets. Green bars refer to intergenic invertons. Orange bars refer to intragenic invertons. Blue bars refer to partial intergenic invertons. Asterisks denote phyla within Archaea. Inset corresponds to the portion of the bar graph outlined in dotted lines. (B) The mean number of invertons found per genome within a phylum, of genomes that had at least one inverton. Asterisks denote phyla within Archaea. (C) The distribution of lengths of identified invertons, grouped by inverton type (intergenic, partial intergenic - denoted ‘partial’, and intragenic). Median value is indicated by gray dots. Partial length distribution was found to be significantly different from intergenic (p=0.0) and intragenic (p=4.5e-146) with a t-test. (D) The distribution of inversion rates of identified invertons, defined as the percentage of reads mapped in the reverse orientation. Median value is indicated by gray dots. (E) Pfam clan enrichment across several genera. Dot size and fill color are proportional to the mean log-odds ratio, an effect size measure for the enrichment, and the length of the line indicates the fraction of included genera in which an enrichment score for the specific clan could be calculated.
Fig. 4. Consequences of inversion in thiamine biosynthesis protein (thiC).
(A) Schematic showing the location of the thiC intragenic inverton (red bar). Inverton flipping results in a premature stop codon located between two protein-folding domains in ThiC. Black arrows indicate the binding location of primers used to determine the orientation of the inverton. (B) PCR confirmation of the thiC intragenic inverton in both genomic DNA and reverse transcribed RNA (cDNA). PCR products of the expected size were extracted and confirmed with Sanger sequencing. (C) BTh strains were grown in defined media with the indicated concentrations of thiamine. The maximum optical density reached by each strain was recorded. Each point represents the average of six replicates conducted across two separate experiments. Mean and standard deviation are shown. Locked forward (blue line), locked reverse (gray line), thiC knockout (purple line), and wild-type (black line) are presented. (D) Locked strains were competed against each other in thiamine-containing media. The competitive growth experiment was performed in two different ways with the antibiotic resistance marker cassettes flipped between the two versions. Black bars indicate the locked forward strain marked with erythromycin resistance and locked reverse strain marked with tetracycline resistance. White bars indicate the locked forward strain marked with tetracycline resistance and locked reverse strain marked with erythromycin resistance. The competitive index was determined. Geometric mean and geometric standard deviation are shown for 8–12 replicates across 4–6 independent experiments.
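
The summary statistics named in panel D can be sketched as follows. The caption does not say how the competitive index was calculated, so this assumes one common convention: the output ratio of the two strains divided by their input ratio. The function names and all counts are invented, and the geometric standard deviation here uses the sample standard deviation of the log-transformed values, which is also an assumption.

```python
import math
import statistics


def competitive_index(out_a: float, out_b: float, in_a: float, in_b: float) -> float:
    """Output ratio (A/B) divided by input ratio (A/B); an assumed, common definition."""
    return (out_a / out_b) / (in_a / in_b)


def geometric_mean_sd(values):
    """Geometric mean and geometric SD, computed on natural-log-transformed values."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.mean(logs)), math.exp(statistics.stdev(logs))


# Invented colony counts for three hypothetical replicates:
# (output A, output B, input A, input B)
replicates = [(200, 100, 100, 100), (400, 100, 100, 100), (100, 100, 100, 100)]
indices = [competitive_index(*r) for r in replicates]
gmean, gsd = geometric_mean_sd(indices)
print(indices, gmean, gsd)
```

Working on the log scale treats a two-fold advantage and a two-fold disadvantage symmetrically, which is why ratio-type measurements such as a competitive index are often summarized with geometric rather than arithmetic statistics.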
