Nucleic Acids Res. 2022 Jan 11;50(1):561-578.
doi: 10.1093/nar/gkab1214.

Rational design and construction of multi-copy biomanufacturing islands in mammalian cells

Raffaele Altamura et al. Nucleic Acids Res. 2022.

Abstract

Cell line development is a critical step in the establishment of a biopharmaceutical manufacturing process. Current protocols rely on random transgene integration and amplification. Due to considerable variability in transgene integration profiles, this workflow results in laborious screening campaigns before stable producers can be identified. Alternative approaches for transgene dosage increase and integration are therefore highly desirable. In this study, we present a novel strategy for the rapid design, construction, and genomic integration of engineered multiple-copy gene constructs consisting of up to 10 gene expression cassettes. Key to this strategy is the diversification, at the sequence level, of the individual gene cassettes without altering their protein products. We show a computational workflow for coding and regulatory sequence diversification and optimization followed by experimental assembly of up to nine gene copies and a sentinel reporter on a contiguous scaffold. Transient transfections in CHO cells indicate that protein expression increases with the gene copy number on the scaffold. Further, we stably integrate these cassettes into a pre-validated genomic locus. Altogether, our findings point to the feasibility of engineering a fully mapped multi-copy recombinant protein 'production island' in a mammalian cell line with greatly reduced screening effort, improved stability, and predictable product titers.

Figures

Figure 1.
A novel strategy for cell line development. Using custom-built software, multiple coding sequence (CDS) variants encoding a protein of interest (P.O.I.) are obtained. These sequences are optimized for protein expression in a mammalian production host of choice, while their DNA sequences are designed to be highly divergent from each other. Together with libraries of regulatory components (insulators, promoters, 3′-UTRs), coding sequence variants are used to construct a number of expression cassettes, all expressing the P.O.I. but each unique in its genetic makeup. Multiple expression cassettes are subsequently assembled in a single step via homologous recombination in yeast. A string of concatenated expression cassettes is integrated, without any trailing vector sequences, into the host chromosome at a pre-validated site via recombinase-mediated cassette exchange, giving rise to a rationally engineered bioproduction island.
Figure 2.
Coding sequence diversification and optimization. (A) Harmonization of genetic divergence. Sets of 20, 25, 30,…, 220 degenerate IFNg sequences were generated with the diversifier algorithm (orange diamonds) or by assigning random synonymous codons (red diamonds; at every codon position, synonymous codons are chosen with equal probability and therefore tend to be used with similar frequencies). For every set of sequences, the longest continuous stretch of homology between any two sequences in the set is plotted against the size of the set. (B) The effect of tuning RSCUmin on CAI. Three sets of 200 IFNg sequences were generated in silico with the diversifier algorithm, with the RSCUmin parameter set to 0.0, 0.5 or 0.8, thus restricting the available codon space to those codons with an RSCU higher than the threshold value (a value of 0.0 means that all codons are accepted). The codon adaptation index (CAI) for every sequence was computed and CAI values in each of the three IFNg sets were binned to construct the histograms shown in the figure. (C) 5′-sequence re-design with mRNA Optimizer. 200 IFNg degenerate sequences were constructed with the diversification algorithm. Sequence regions from −23 to +18 around the ATG start site were fed to the optimizer and the codons following the start site were optimized so as to increase mRNA folding energies. Note that sequences located 5′ of the start site (containing the Kozak region and restriction sites for cloning) are not modified by the optimizer. The distributions of Minimum Folding Energies (MFE) before (not optimized) and after optimization (optimized) are shown as histograms.
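The random-codon baseline in panel A can be illustrated with a minimal sketch. It is not the authors' diversifier algorithm: it only assigns synonymous codons with equal probability and reports the longest run of identical bases shared by any two variants at aligned positions, which is one plausible reading of "longest continuous stretch of homology". The peptide and the function names are hypothetical.

```python
import random
from itertools import combinations

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame DNA string into a protein string."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def random_synonymous(protein, rng):
    """Pick each codon uniformly among the synonymous codons (the baseline)."""
    return "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein)


def longest_shared_run(a, b):
    """Longest run of identical bases at aligned positions of two sequences."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def worst_pair_run(seqs):
    """Longest shared run over every pair of sequences in the set."""
    return max(longest_shared_run(a, b) for a, b in combinations(seqs, 2))


rng = random.Random(0)
protein = "MKTAYLLGSWRDE"  # hypothetical toy peptide, not IFNg
variants = [random_synonymous(protein, rng) for _ in range(20)]
print(worst_pair_run(variants))
```

Repeating the last three lines for growing set sizes gives the kind of curve plotted for the random baseline; a diversifier would aim to keep that value low as the set grows.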
Figure 3.
Structure of the purpose-built software used to obtain multiple gene coding sequence (CDS) variants from the amino acid sequence of a protein-of-interest. The software consists of two interconnected modules for sequence optimization and diversification. Through the sequence optimization module, the user is able to select a mammalian expression host (CHO, human or mouse) and set codon usage thresholds (RSCU_min, RSCU_min_AT), thus boosting the output sequences’ codon adaptation index (CAI). Through the diversification routines, sequence variation between CDSs is maximized and homogenized so as to avoid long stretches of homology between sequences. Optionally, DNA sequence composition around the start site can be optimized using third-party software (mRNA Optimizer) so as to minimize the mRNA folding propensity in this region. Output CDSs thus designed are ready for synthesis and downstream applications.
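How an RSCU threshold narrows the codon space and raises CAI can be sketched as follows. The sketch uses the standard definitions (RSCU as a codon's count divided by the mean count of its synonymous codons; CAI as the geometric mean of each codon's count relative to the most used synonym). The codon counts are hypothetical, not real host usage data, and the function names are not taken from the authors' software.

```python
import math

# Hypothetical codon counts for two amino acids (NOT real CHO usage data).
USAGE = {
    "L": {"CTG": 40, "CTC": 20, "TTG": 12, "CTT": 12, "CTA": 8, "TTA": 8},
    "K": {"AAG": 70, "AAA": 30},
}


def rscu(usage):
    """RSCU = codon count / mean count over its synonymous codons."""
    out = {}
    for codons in usage.values():
        mean = sum(codons.values()) / len(codons)
        for codon, n in codons.items():
            out[codon] = n / mean
    return out


def relative_adaptiveness(usage):
    """w = codon count / count of the most used synonymous codon."""
    out = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, n in codons.items():
            out[codon] = n / top
    return out


def allowed_codons(usage, rscu_min):
    """Keep only codons whose RSCU is higher than the threshold."""
    scores = rscu(usage)
    return {aa: [c for c in codons if scores[c] > rscu_min]
            for aa, codons in usage.items()}


def cai(dna, weights):
    """Geometric mean of w over the codons of an in-frame sequence."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return math.exp(sum(math.log(weights[c]) for c in codons) / len(codons))


weights = relative_adaptiveness(USAGE)
print(allowed_codons(USAGE, 0.8))
print(cai("CTGAAG", weights), cai("CTCAAA", weights))
```

With these toy counts, raising the threshold from 0.0 to 0.8 leaves only the most frequent codons available, so any sequence drawn from the restricted set has a higher CAI, at the cost of fewer options for diversification.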
Figure 4.
Experimental testing of coding and regulatory elements. (A) The relationship between codon adaptation index (CAI) and mCitrine expression. Fluorescence values for libraries c1.x (n = 37) and c2.x (n = 10) are normalized to cOpt (obtained with the ‘one amino acid-one codon’ optimization rule, i.e. only the most frequent codon for every amino acid is used) and are plotted against sequence CAI. The best-fit line is shown. (B) IFNg expression from 21 synthetic coding sequences designed with the gene diversifier software tool. The black bar shows expression from the wild type human IFNg coding sequence. (C) Heat map showing the Hamming distance (h.d.) matrix for the IFNg library, where the color of each square represents the extent of sequence divergence between two sequences in the library. (D) Map showing the wild type human EF1a promoter sequence regions that were preserved across our synthetic promoter library. The length of such regions is indicated at the bottom. TATA = TATA box; Inr = Initiator element; 5′ ss = 5′ splice site; 3′ ss = 3′ splice site. (E) Reporter (mCitrine) expression driven by synthetic promoters p1.hEF1a-p21.hEF1a as well as the wild type hEF1a promoter sequence (black bar). mCitrine expression, evaluated after transient expression in CHO cells, was normalized to a transfection control (mCherry). (F) A library of 3′ UTRs (91 bp in length) was built by sequence randomization around the RbG poly(A) functional elements. The 3′ UTR library was cloned downstream of an mCitrine fluorescent reporter and transfected in CHO cells together with an mCherry reporter plasmid (used as a transfection control). The twenty strongest library members are shown in the bar chart. (The black bar indicates wild type RbG 3′-UTR performance.)
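The Hamming distance matrix behind a heat map like the one in panel C can be computed in a few lines. This is a minimal sketch: the three sequences are hypothetical synonymous encodings of a toy tripeptide, not members of the IFNg library.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


# Hypothetical variants, all encoding Met-Lys-Leu.
library = ["ATGAAACTG", "ATGAAGCTC", "ATGAAATTA"]
for row in distance_matrix(library):
    print(row)
# prints:
# [0, 2, 2]
# [2, 0, 3]
# [2, 3, 0]
```

Because every variant encodes the same protein, the sequences have equal length and can be compared position by position without alignment.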
Figure 5.
Assembly of multiple gene constructs from individual coding and regulatory elements. (A) First Level vector design scheme. Each expression cassette bears an insulator, a promoter, a 3′ UTR and poly(A) signal, and an adapter required for the assembly of multiple cassettes. Cassette components are punctuated by unique restriction enzyme sites, thus allowing for facile component replacement. (B) The homology between insulators (Ins) and Adapters (A) guides the assembly of individual gene cassettes into larger multi-gene constructs via homologous recombination in yeast. (C) Four cassettes, identical in their hEF1a-mCitrine-RbGpA sequences but with unique, compatible overlaps (see cartoons on the left) were transformed with linearized pRG216 shuttle vector into S. cerevisiae. Twelve transformant colonies (3 shown) were analyzed and all were found to contain a single cassette insertion (∼3 kb) between left and right vector homology arms (L.H.A, R.H.A) (see cartoon on the right and agarose gel image). M: 1 kb molecular marker (ThermoFisher). Lanes 1–3: plasmid prepared from E. coli transformants and digested with the double-cutter SbfI. Vector b.b. = vector backbone. (D) 4-Cassette assembly in vector pRG216 using unique coding and regulatory elements. Agarose gel showing a restriction digestion pattern analysis for 10 candidate assemblies. Lanes with asterisk indicate the correct digestion pattern. M: 1 kb molecular marker (ThermoFisher). (E, F) Agarose gels showing restriction digestion screening results allowing for the identification of plasmids that have correctly assembled 4 (panel E) and 10 (panel F) IFNg gene cassettes together with linearized BAC shuttle vector backbone (b.b.) pYES1L. Lanes are numbered and asterisks indicate lanes where the correct restriction patterns are observed. M = molecular marker, 1 kb extend (NEB). (G) Junction PCR analysis indicating the presence of all eight fragment junction amplicons (j1−j8) in a 7-cassette plus backbone assembly. PCR reactions are multiplexed (2 PCR reaction mixes with four PCR reactions per mix). M: 100 bp GeneRuler (Thermo Scientific).
Figure 6.
Impact of gene copy number on protein expression in transient transfections. (A) Cartoon showing single- and multi-copy constructs designed for assessing the gene dosage-protein expression relationship. Constructs harbor a variable number of gene coding sequences (gray boxes), while all carry the same mCherry expression cassette (the mCherry coding sequence is shown in red), required to normalize protein expression levels. (B) The decrease in transfection efficiency as plasmid size increases over the range 12–27 kb is shown. Equimolar amounts of pYES1L-mCherry-mCitrine.1x, 3x, 6x, 9x were transfected into CHO cells and percentages of transfected cells were computed as the number of mCherry positive cells upon flow cytometry analysis. (C) Bar chart showing the impact of increasing mCitrine copy number on mCitrine expression. Multi-copy constructs containing a variable number of genetically unique mCitrine expression cassettes (1, 3, 6 or 9) and an mCherry expression cassette were transfected into CHO cells. mCitrine/mCherry ratios are reported in the bar chart. (D) Flow cytometry plot showing the shift in mCitrine expression when comparing a single expression cassette (cit.1x) against a multiple gene construct harboring nine gene expression units (cit.9x). (E) Expected mCitrine expression values for the four multi-copy constructs (circles) were calculated by adding up expression values from individual mCitrine cassettes. Observed fluorescence values for the pYES1L-mCherry-mCitrine.x series (diamonds), normalized to pYES1L-mCherry-mCitrine.1x, are also shown. The expected (dashed line) and observed (solid line) gene dosage-mCitrine expression trajectories (best-fit lines) are reported. (F) Bar chart showing IFNg production levels as the number of IFNg gene copies increases from one to nine. IFNg secretion levels were normalized by mCherry expression levels.
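The expected-versus-observed comparison in panel E amounts to summing individual cassette values and fitting a line to each trajectory. The sketch below assumes that an n-copy construct carries the first n cassettes of a list and uses ordinary least squares for the best-fit lines; all numbers are hypothetical placeholders, not data from the study.

```python
def expected_totals(cassette_values, copy_numbers):
    """Expected expression of an n-copy construct: sum of its first n cassettes."""
    return [sum(cassette_values[:n]) for n in copy_numbers]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


copies = [1, 3, 6, 9]
# Hypothetical single-cassette values, each normalized to the 1x construct.
single = [1.0, 0.9, 1.1, 0.8, 1.2, 1.0, 0.9, 1.1, 1.0]
# Hypothetical observed values for the 1x, 3x, 6x and 9x constructs.
observed = [1.0, 2.4, 4.1, 5.5]

expected = expected_totals(single, copies)
print("expected slope:", fit_line(copies, expected)[0])
print("observed slope:", fit_line(copies, observed)[0])
```

Comparing the two slopes shows how far the observed gene dosage response departs from simple additivity.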
Figure 7.
Stable integration of multi-copy constructs into the CHO genome. (A) Schematic diagram of the landing pad (LP) and donor plasmid structure (left side of the panel). Two attachment sites (attP and its reverse complement, attP’, represented by the gray rectangles) were placed on our landing pad which was integrated into the host chromosome. Donor plasmids are designed with two donor sites facing each other, attB and attB’ (black rectangles). When acceptor and donor sites react in the presence of Bxb1, recombination takes place, resulting in targeted exchange and the integration of donor material in either the forward or reverse orientation relative to the landing pad on the genome (right side of the panel). In the diagram, landing pad marker (mCherry), sentinel marker (mCerulean) and production genes (mCitrine genes) are shown with the red, blue and yellow block arrows, respectively. Rectangles with curved arrows represent promoters (3′-UTRs and insulators not shown). (B) RMCE experimental design. Flow cytometry diagrams (10 days after transfection of LP lines with donor and recombinase) showing that RMCE requires both donor plasmid DNA and Bxb1 Int (only two LPs are shown here). Conversion of the LP marker (mCherry) to the donor marker (mCerulean) signals a successful genome targeting event. Every RMCE experiment included a control where no Bxb1 recombinase was added, to confirm that marker conversion was specifically attributable to Bxb1-mediated cassette exchange. (C) Bar chart showing average mCitrine fluorescence values of mCerulean(+)/mCherry(-)/mCitrine(+) recombinant populations after targeting of LP11 with the pYES1L-attB/B’-mCerulean-mCitrine1x-9x vector series. Across all recombinant lines, only the largest active subpopulations, which are also characterized by the highest level of mCitrine expression, were considered for this analysis. (D) Flow cytometry histograms showing the shift in mCitrine expression as a function of mCitrine copy number.
