[Preprint]. 2024 Sep 25:2024.01.20.576280.
doi: 10.1101/2024.01.20.576280.

Higher-order epistasis within Pol II trigger loop haplotypes


Bingbing Duan et al. bioRxiv.

Abstract

RNA polymerase II (Pol II) has a highly conserved domain, the trigger loop (TL), that controls transcription fidelity and speed. We previously probed pairwise genetic interactions between residues within and surrounding the TL to understand functional interactions between residues and how individual mutants might alter TL function. We identified widespread incompatibility between TLs of different species when placed in the Saccharomyces cerevisiae Pol II context, indicating species-specific interactions between otherwise highly conserved TLs and their surroundings. These interactions represent epistasis between TL residues and the rest of Pol II. We sought to understand why certain TL sequences are incompatible with S. cerevisiae Pol II and to dissect the nature of genetic interactions within multiply substituted TLs as a window on higher-order epistasis in this system. We identified both positive and negative higher-order residue interactions within example TL haplotypes. Intricate higher-order epistasis formed by TL residues was sometimes only apparent from analysis of intermediate genotypes, emphasizing the complexity of epistatic interactions. Furthermore, we distinguished TL substitutions with distinct classes of epistatic patterns, suggesting specific TL residues that potentially influence TL evolution. Our examples of complex residue interactions suggest possible pathways for epistasis to facilitate Pol II evolution.

Keywords: deep mutational scanning; epistasis; haplotypes; trigger loop.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Systematic detection of TL-internal epistasis with natural TL alleles and intermediates.
A. The Pol II TL is the key domain balancing transcription speed and fidelity in the Pol II active site. Left panel: yeast Pol II structure (PDB: 5C4X). Right panel: the open, catalysis-disfavoring (PDB: 5C4X) (BARNES et al. 2015) and closed, catalysis-favoring (PDB: 2E2H) (WANG et al. 2006) conformations of the TL in the active site. B. We selected 632 TL haplotypes representing TL alleles from bacterial, archaeal and the three conserved eukaryotic msRNAPs to detect TL-internal epistasis (Bacterial TLs n = 182, archaeal TLs n = 116, Pol I TLs n = 144, Pol II TLs n = 94, Pol III TLs n = 95). C. The selected TL alleles were synthesized and transformed into yeast Pol II to form chimeric Pol II enzymes. Yeast chimeric Pol II enzymes were phenotyped under selective conditions to detect growth defects, which are represented by fitness (see Methods). D. Seven Pol II TL alleles were selected to construct intermediate haplotypes representing all possible combinations of the substitutions within each of the seven selected alleles. The intermediates were transformed into yeast to measure growth defects as in C. E-F. Analytical scheme of the primary deviation score (E) and the secondary deviation score (F). Details are in Methods.
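The construction in panel D and the scores in panels E-F can be illustrated with a minimal sketch. Because the Methods are not reproduced here, it rests on stated assumptions rather than the authors' exact procedure: fitness values are taken to be on a log scale, so the expected fitness of a haplotype is the sum of its single-substitution fitness values; the primary deviation is taken as observed minus expected fitness; and the secondary deviation of a substitution is taken as the observed fitness of a haplotype minus the sum of the observed fitness of the same haplotype lacking that substitution and the substitution's own single fitness. The function names and the toy fitness values are hypothetical.

```python
from itertools import combinations


def intermediates(substitutions):
    """Yield every non-empty combination of the given substitutions."""
    subs = sorted(substitutions)
    for k in range(1, len(subs) + 1):
        for combo in combinations(subs, k):
            yield frozenset(combo)


def primary_deviation(haplotype, observed, single_fitness):
    """Observed fitness minus the log-additive expectation (assumed definition)."""
    expected = sum(single_fitness[s] for s in haplotype)
    return observed[haplotype] - expected


def secondary_deviation(haplotype, sub, observed, single_fitness):
    """Effect of adding `sub` to the rest of the haplotype, beyond additivity
    (assumed definition)."""
    background = haplotype - {sub}
    expected = observed[background] + single_fitness[sub]
    return observed[haplotype] - expected


# Toy, made-up fitness values for three hypothetical substitutions A, B, C.
single = {"A": -0.5, "B": -1.0, "C": 0.25}

# Start from a purely additive landscape, then perturb the triple combination.
observed = {h: sum(single[s] for s in h) for h in intermediates(single)}
triple = frozenset("ABC")
observed[triple] = -0.25  # pretend the triple is healthier than expected

all_haps = list(intermediates(single))
primary = primary_deviation(triple, observed, single)
secondary_c = secondary_deviation(triple, "C", observed, single)
```

With n substitutions this enumeration yields 2^n - 1 non-empty combinations, for example 255 for an allele with eight substitutions, which is why intermediate genotypes have to be measured systematically rather than sampled by hand.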
Figure 2. Greater TL-internal epistasis is observed with closer evolutionary distance to eukaryotic Pol II.
A-E. Fitness and primary deviation score heatmaps of TL haplotypes from five msRNAP evolutionary groups. The x-axis of each heatmap shows the 31 residue positions of the Pol II TL (1076–1106). The y-axis of each heatmap shows the TL haplotypes belonging to each group, clustered by hierarchical clustering with Euclidean distance. Each row represents one haplotype with several single substitutions. Light grey blocks in each row mark positions where the haplotype residue is the same as the yeast Pol II TL residue, in other words, no substitution at that position. Colored blocks mark residues in the haplotype that differ from the yeast Pol II TL (substitutions). The color of each block represents the growth fitness of the single substitution in the yeast Pol II TL background. Expected fitness of the haplotypes was calculated from the log additive model for individual substitutions. Observed fitness was measured in the screening experiments, and deviation scores were calculated by comparing the observed and expected scores (shown at the right end of each row). Sequence logos were generated from a multiple sequence alignment (MSA) of each of the five groups individually in Weblogo 3.7.12. The labeled numbers of the sequence logo represent yeast Pol II TL residue positions (1076–1106). Bacteria n=465. Archaea n=426. Pol I n=605. Pol II n=405. Pol III n=444. F. Lethal haplotypes can contain three distinct types of lethality: synthetic lethality attributed to negative interactions among substitutions that result in lethality beyond expectation; additive lethality expected from the combination of individual substitutions; and lethality arising directly from the presence of an individually lethal single substitution. The calculated ratio specifically reflects the proportion of lethality due to negative interactions and additive lethality within all types of lethal haplotypes. G. The ratio of viable haplotypes among haplotypes containing lethal substitutions, representing the ratio of positive interactions (suppression) in TL haplotypes of each group. Haplotypes containing lethal substitutions are expected to be lethal based on the additive model. If haplotypes with lethal substitutions are observed to be viable, other substitutions in those haplotypes likely suppress the lethal substitution, implying positive epistasis (suppression). Approximately 20% of examined eukaryotic TL haplotypes containing individual lethal substitutions are viable, whereas only roughly 2% of bacterial and archaeal TL haplotypes are viable, suggesting positive interactions in eukaryotic TLs can buffer negative effects in more closely related TL sequences.
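The lethality categories in panel F and the suppression ratio in panel G can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the lethality cutoff `LETHAL_THRESHOLD` is a hypothetical placeholder (the real criterion is in the Methods), expected fitness is taken as the sum of single-substitution fitness values under the log additive model, and all fitness values below are made up.

```python
LETHAL_THRESHOLD = -6.0  # hypothetical cutoff; the real value is in the Methods


def is_lethal(fitness):
    """Treat any fitness at or below the assumed cutoff as lethal."""
    return fitness <= LETHAL_THRESHOLD


def classify_lethal_haplotype(single_fitnesses):
    """Classify a haplotype already observed to be lethal by the likely
    source of its lethality."""
    if any(is_lethal(f) for f in single_fitnesses):
        return "individual lethal substitution"
    if is_lethal(sum(single_fitnesses)):  # log-additive expectation is lethal
        return "additive lethality"
    return "synthetic lethality"  # lethal beyond the additive expectation


def suppression_ratio(haplotypes):
    """Fraction of haplotypes carrying a lethal single substitution that are
    nonetheless observed to be viable.

    `haplotypes` is a list of (single_fitnesses, observed_fitness) pairs.
    """
    carriers = [obs for singles, obs in haplotypes
                if any(is_lethal(f) for f in singles)]
    if not carriers:
        return float("nan")
    viable = sum(1 for obs in carriers if not is_lethal(obs))
    return viable / len(carriers)


# Toy data: (single-substitution fitness values, observed haplotype fitness).
toy = [
    ([-7.0, -0.5], -1.0),  # lethal single, viable haplotype: suppression
    ([-7.0, 0.5], -7.5),   # lethal single, lethal haplotype
    ([-1.0, -2.0], -8.0),  # no lethal single, so not counted as a carrier
    ([-6.5, -0.2], -6.8),  # lethal single, lethal haplotype
    ([-9.0, 1.0], -7.0),   # lethal single, lethal haplotype
]
ratio = suppression_ratio(toy)
```

In this toy set one of the four carriers is viable, so the ratio is 0.25; the caption's figures of roughly 20% and 2% would be the same quantity computed per evolutionary group.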
Figure 3. Phenotypes fluctuate with changes in substitution composition within haplotype groups.
A. Fitness of constituent single substitutions, expected and observed fitness of the haplotypes, and deviation scores are shown in a heatmap for the seven selected haplotypes. The deviations shown in the heatmap are the primary deviation scores calculated by comparing observed fitness to expected fitness. B-H. Distribution of growth fitness across all intermediate substitution combinations, categorized by number of amino acid substitutions, for each selected TL. Each point on the graph represents a haplotype. The y-axis indicates the fitness of each haplotype. The x-axis indicates the number of amino acid substitutions within the haplotype. For example, a point in B represents a haplotype built from a combination of substitutions in the TL of E. invadens IP1; a point positioned at x=3 and y=−0.5 contains three substitutions and has a fitness value of −0.5.
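The grouping behind panels B-H can be sketched in a few lines. The sketch assumes haplotypes are written as slash-separated substitution names in the style used elsewhere in these captions (for example V1089T/S1096E); the helper names and the fitness values are hypothetical.

```python
import re
from collections import defaultdict

# Wild-type residue, position, substituted residue, e.g. "V1089T".
SUB_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(name):
    """Split a substitution name into (wild-type, position, mutant)."""
    match = SUB_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a substitution name: {name!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def group_by_substitution_count(fitness_by_haplotype):
    """Map number of substitutions (x-axis) to the list of fitness values (y-axis)."""
    groups = defaultdict(list)
    for haplotype, fitness in fitness_by_haplotype.items():
        subs = [parse_substitution(s) for s in haplotype.split("/")]
        groups[len(subs)].append(fitness)
    return dict(groups)


# Made-up fitness values for illustration only.
toy_fitness = {
    "V1089T": -0.2,
    "S1096E": 0.3,
    "V1089T/S1096E": -0.1,
    "F1086H/V1089T/S1091E": -0.5,
}
groups = group_by_substitution_count(toy_fitness)
```

Each key of `groups` corresponds to one x-axis position in a panel, and the associated list holds the fitness values plotted at that position.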
Figure 4. The epistasis landscape provides a comprehensive view of primary and secondary deviation scores, emphasizing substitutions with notable epistatic effects.
A. Fitness of eight Entamoeba invadens IP1 TL single substitutions in the yeast Pol II background, together with the expected and observed fitness and the primary deviation score, shown in the heatmap. B. The epistasis landscape of E. invadens IP1 TL substitutions. The heatmap illustrates the fitness and epistasis of all unique intermediate haplotypes arising from combinations of the eight substitutions. Intermediate haplotypes are grouped by number of substitutions from 1 to 8. The fitness values are displayed in the upper panel and the epistasis, represented by primary and secondary deviation scores, is displayed in the lower panel. The colors of substitution names indicate their phenotype profiles as single mutants. GOF: green. LOF: blue. Unclassified: grey.
Figure 5. Correlations between deviation scores reflect specific residue interactions in E. invadens IP1 TL substitutions.
A. Correlations between secondary deviation scores of all eight substitutions (y-axis) and the primary deviation score (x-axis). Linear regression was applied to each comparison of secondary deviation scores against primary deviation scores to check the correlation. Substitutions with an R² value exceeding 0.5 are annotated on the x-y plot, indicating their substantial impact on primary epistasis of the haplotypes. B-D. Correlations between secondary deviations of the other seven substitutions (y-axis) vs V1089T (B), S1096E (C), S1091E (D) on the x-axes respectively. E. The fitness landscape of intermediate combinations with fitness in the ultra-sick/lethal range. Their observed fitness levels are in the lethal range, as indicated by black blocks in the heatmap, while the expected fitness scores calculated from the additive model are in the viable range (light blue blocks), indicating potential negative interactions. The fitness scores of the constituent single mutants within lethal haplotypes are shown in the heatmap. F1086H, V1089T, S1091E, and K1093T are present in each lethal haplotype while S1096E is absent. Names of substitutions are colored based on their phenotype profiles as single mutants. GOF: green. LOF: blue. No obvious phenotype: grey. F. When S1096E is incorporated, the fitness of all ultra-sick to lethal haplotypes is no longer in the ultra-sick/lethal range. G. Scheme of specific residue interactions within substitutions of the E. invadens IP1 TL.
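The screening step in panel A, regressing each substitution's secondary deviation scores on the primary deviation scores and keeping those with R² above 0.5, can be sketched as below. This is an ordinary least-squares fit written out by hand; the function names, the substitution labels X and Y, and all numbers are hypothetical, and the authors' actual software is not stated in the captions.

```python
def linear_regression_r2(x, y):
    """Return (slope, intercept, R^2) of an ordinary least-squares fit of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


def influential_substitutions(primary, secondary_by_sub, cutoff=0.5):
    """Names of substitutions whose secondary deviations track the primary
    deviation with R^2 above the cutoff."""
    return sorted(
        sub for sub, secondary in secondary_by_sub.items()
        if linear_regression_r2(primary, secondary)[2] > cutoff
    )


# Made-up deviation scores across five hypothetical haplotypes.
primary = [-2.0, -1.0, 0.0, 1.0, 2.0]
secondary_by_sub = {
    "X": [-1.9, -1.1, 0.1, 0.9, 2.0],   # closely follows the primary deviation
    "Y": [0.5, -0.5, 0.5, -0.5, 0.5],   # unrelated to the primary deviation
}
flagged = influential_substitutions(primary, secondary_by_sub)
```

Only the hypothetical substitution X is flagged in this toy example, mirroring how the annotated substitutions in the plot are those whose own epistatic contribution explains most of the haplotype-level deviation.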
Figure 6. Intricate higher-order epistasis observed in substitutions of P. persalinus (Ciliate) TL haplotype.
A. The heatmap displays the fitness of nine single substitutions in the P. persalinus (Ciliate) TL in the yeast Pol II background, along with the epistasis between them represented by the primary deviation score. B. Similar to Fig. 5A, we checked correlations between secondary deviation scores (y-axis) and the primary deviation score (x-axis) to identify substitutions with substantial impact on primary deviation scores. Simple linear regression was applied to each comparison. Substitutions with R² > 0.5 are annotated in the plot. C. The fitness and deviation scores of substitution combinations related to group I are shown in the heatmap. Names of substitutions are colored based on their phenotypic profiles as single mutants. GOF: green. LOF: blue. No obvious phenotype (unclassified): grey. Each line shows the fitness and deviation scores of substitutions in a certain combination. Left, the fitness of individual substitutions, and the expected and observed fitness. Right, the primary deviations calculated by comparing observed and expected fitness and the secondary deviation scores of each constituent substitution. A1076T/A1090S/S1091D/S1096L/I1104L is in the first line. Its observed fitness is lower than expected, resulting in a negative primary deviation score, which represents a negative interaction. The secondary deviation scores of the constituent substitutions are all negative, indicating that each shows a negative interaction when added to the corresponding combination. The following four lines represent the four combinations where V1089S, K1092R, K1093N, and K1102Q are incorporated into A1076T/A1090S/S1091D/S1096L/I1104L respectively. The observed fitness of each of these combinations is higher than that of A1076T/A1090S/S1091D/S1096L/I1104L, and the secondary deviation scores of V1089S, K1092R, K1093N and K1102Q are all positive, implying a positive effect (suppression) on each combination respectively. D. Scheme illustrating the substitution interaction network observed in C. E. Similar to C, the fitness and deviation scores of combinations related to group II are shown in the heatmap. The first row shows the fitness and deviation scores detected within the combination A1076T/A1090S/S1091D/K1092R/K1093N. The following rows display the corresponding fitness and deviation scores when the other four substitutions are incorporated. Notably, the effect of V1089S on A1076T/A1090S/S1091D/K1092R/K1093N cannot be determined because the observed fitness of the combination (V1089S + A1076T/A1090S/S1091D/K1092R/K1093N) is in the ultra-sick/lethal range, and its expected fitness calculated from the log additive model is also in the lethal range due to additivity. In this case, the secondary deviation of V1089S is represented by a black block in the heatmap. Moreover, the effect of A1076T on V1089S/A1090S/S1091D/K1092R/K1093N cannot be determined because the observed fitness falls within the lethal range, and the expected fitness of (A1076T + V1089S/A1090S/S1091D/K1092R/K1093N) is also in the lethal range because the combination contains the lethal component V1089S/A1090S/S1091D/K1092R/K1093N. The secondary deviation score of A1076T therefore cannot be determined either and is indicated by a dark grey block in the heatmap. Similarly, the scores of A1090S and S1091D could not be determined for the same reason as for A1076T and are also marked with dark grey blocks. F. Scheme representing the substitution interaction networks observed in E.
Figure 7. Different classes of epistatic effects.
A. Histogram of mutants’ epistatic effects, represented by their respective maximum likelihood estimate σ² of secondary deviation scores. A higher epistatic effect indicates a greater impact of a certain substitution. B. Medians of secondary deviation scores of substitutions were plotted against their corresponding σ². Substitutions are colored based on their phenotypes. C. Comparison of epistatic effects of mutants in each category. Each scatter plot shows the measured fitness of haplotypes without (x-axis) versus with (y-axis) a substitution incorporated. The colors of the plots represent the mutants’ phenotypes. The colored line marks the simple linear regression of the points, representing the observed epistatic effect of the substitution. R² values of the regressions are labeled in the plots. The black line indicates the additive (non-epistatic) expectation.
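The two summary statistics in panels A and B can be sketched as follows. The caption does not state the underlying model, so this sketch assumes that a substitution's secondary deviation scores are treated as draws from a normal distribution, in which case the maximum likelihood estimate of σ² is the population variance (sum of squared deviations from the mean divided by n, not n - 1). The function name and the scores are hypothetical.

```python
from statistics import median, pvariance


def epistatic_effect(secondary_scores):
    """Return (sigma2_mle, median) for one substitution's secondary deviations.

    Under an assumed normal model, the maximum likelihood estimate of the
    variance is the population variance, which `pvariance` computes.
    """
    if not secondary_scores:
        raise ValueError("need at least one secondary deviation score")
    return pvariance(secondary_scores), median(secondary_scores)


# Made-up secondary deviation scores for two hypothetical substitutions.
variable_sub = [-1.0, 0.0, 1.0]   # effect depends strongly on background
constant_sub = [0.5, 0.5, 0.5]    # same effect in every background

sigma2_variable, median_variable = epistatic_effect(variable_sub)
sigma2_constant, median_constant = epistatic_effect(constant_sub)
```

Read this way, a large σ² marks a substitution whose effect changes with genetic background, while the median indicates whether its interactions lean positive or negative overall.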

References

    1. Bakerlee C. W., Nguyen Ba A. N., Shulgina Y., Rojas Echenique J. I. and Desai M. M., 2022. Idiosyncratic epistasis leads to global fitness-correlated trends. Science 376: 630–635.
    2. Bank C., Hietpas R. T., Jensen J. D. and Bolon D. N., 2015. A systematic survey of an intragenic epistatic landscape. Mol Biol Evol 32: 229–238.
    3. Bar-Nahum G., Epshtein V., Ruckenstein A. E., Rafikov R., Mustaev A. et al., 2005. A ratchet mechanism of transcription elongation and its control. Cell 120: 183–193.
    4. Barnes C. O., Calero M., Malik I., Graham B. W., Spahr H. et al., 2015. Crystal Structure of a Transcribing RNA Polymerase II Complex Reveals a Complete Transcription Bubble. Mol Cell 59: 258–269.
    5. Belogurov G. A. and Artsimovitch I., 2019. The Mechanisms of Substrate Selection, Catalysis, and Translocation by the Elongating RNA Polymerase. J Mol Biol 431: 3975–4006.
