Protein Sci. 2006 Oct;15(10):2356-65.
doi: 10.1110/ps.062082606.

Combinatorial Domain Hunting: An effective approach for the identification of soluble protein domains adaptable to high-throughput applications

Stefanie Reich et al. Protein Sci. 2006 Oct.

Abstract

Exploitation of potential new targets for drug and vaccine development has an absolute requirement for multimilligram quantities of soluble protein. While recombinant expression of full-length proteins is frequently problematic, high-yield soluble expression of functional subconstructs is an effective alternative, so long as appropriate termini can be identified. Bioinformatics localizes domains but does not predict boundaries with sufficient accuracy, so subconstructs are typically found by trial and error. Combinatorial Domain Hunting (CDH) is a technology for discovering soluble, highly expressed constructs of target proteins. CDH combines unbiased, finely sampled gene-fragment libraries with a screening protocol that provides "holistic" readout of solubility and yield for thousands of protein fragments. CDH is free of the "passenger solubilization" and out-of-frame translational start artifacts of fusion-protein systems, and hits are ready for scale-up expression. As a proof of principle, we applied CDH to p85α, successfully identifying soluble and highly expressed constructs encapsulating all the known globular domains, and immediately suitable for downstream applications.

Figures

Figure 1.
Gene fragmentation. (A) Schematic of the CDH gene fragmentation process. PCR with TTP/dUTP mixtures is used to generate copies of the target gene in which uracil is randomly incorporated in place of thymine. The uracil-doped amplified DNA is subjected to a modified base-excision cascade in which uracil-DNA glycosylase excises the uracil bases generating abasic sites, which are cleaved by endonuclease IV, giving a single-strand nick that is converted to a double-strand break and blunt-ended by S1 nuclease. As the reaction cascade is initiated only at uracils, whose distribution along the sequence and among the PCR reaction products is random, the cascade generates a random and unbiased library of gene fragments, whose size distribution is solely dictated by the TTP/dUTP ratio. (B) dUTP-dose dependent fragmentation. SYBR-Safe stained 1% agarose gel of an ∼2.2-kb human p85α PCR-amplified cDNA (right-hand lane), alongside the products of CDH fragmentation reactions using increasing amounts of dUTP (as percent of total TTP+dUTP concentration). The progressive decrease in modal size of the DNA distribution with increasing dUTP concentration is clearly seen.
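The dose dependence described in panel B can be illustrated with a toy simulation. This is a minimal sketch, not the authors' protocol or analysis: it assumes dUTP and TTP are incorporated with equal efficiency, that every uracil on either strand ends up as a double-strand break, and that the target is a random placeholder sequence of 2175 bases (not the real p85α cDNA). The names `fragment` and `mean_fragment_length` are hypothetical.

```python
import random


def fragment(seq, dutp_fraction, rng):
    """Return fragment lengths after cutting at every simulated uracil.

    Assumption: a T on the top strand, or an A (which pairs with a T on
    the bottom strand), becomes a uracil with probability dutp_fraction,
    and every uracil is converted to a double-strand break.
    """
    lengths = []
    start = 0
    for i, base in enumerate(seq):
        if base in "TA" and rng.random() < dutp_fraction:
            if i > start:  # skip the empty fragment when cutting at position 0
                lengths.append(i - start)
            start = i
    lengths.append(len(seq) - start)
    return lengths


def mean_fragment_length(seq, dutp_fraction, copies=200, seed=0):
    """Average fragment length over many independently doped copies."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(copies):
        lengths.extend(fragment(seq, dutp_fraction, rng))
    return sum(lengths) / len(lengths)


# Placeholder target: random bases, NOT the real p85α cDNA sequence.
seq_rng = random.Random(42)
target = "".join(seq_rng.choice("ACGT") for _ in range(2175))

# Mean fragment size should fall as the dUTP fraction rises.
for fraction in (0.01, 0.02, 0.05):
    print(fraction, round(mean_fragment_length(target, fraction)))
```

Under these simplifying assumptions the mean fragment length shrinks as the dUTP fraction increases, which mirrors the progressive decrease in modal size seen on the gel.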
Figure 2.
Fragment library distribution. (A) Fragment size distribution is unbiased. SYBR-Safe stained 1% agarose gel of 144 individual clones, generated by shotgun capture of the fragmentation reaction in the ligase-free cloning vector pCR-Blunt-II TOPO (Invitrogen). Clones were pooled in lots of 12 and miniprepped, and captured DNA inserts were released as EcoRI fragments, with 12 vector-derived bases still attached to each end. The distribution of fragment sizes populates the desired range 0.1–1.0 kb. (B) The fragment position is random. Coverage plot of 63 randomly selected and sequenced clones (black lines) from the p85α fragment library, ordered according to their 5′-end (bottom to top), arrayed against the 2175-bp sequence of human p85α. Apart from clones beginning at the actual 5′-end of the target gene, the start positions of the fragments are evenly distributed across the target gene, which is fully sampled. Although the sample size is far too small for statistical significance, it is fully consistent with random and unbiased fragmentation. (C) As B, but with the data sorted by 3′-end position. (D) Histogram of fragment size frequency (N). Fragment sizes are binned in intervals of 200 bp. Although the sample size is too small for statistical significance, the distribution is consistent with the expected Poisson distribution for a random fragmentation process.
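The coverage plot (B, C) and the 200-bp size histogram (D) can be computed from clone start and end coordinates along the 2175-bp target. The sketch below shows one way to do this; the clone coordinates are made up for illustration and are not data from the paper, and the function names `coverage` and `size_histogram` are hypothetical.

```python
from collections import Counter

GENE_LENGTH = 2175  # bp, length of the human p85α sequence given in the caption
BIN_WIDTH = 200     # bp, bin interval given in the caption


def coverage(clones, gene_length=GENE_LENGTH):
    """Count how many clones cover each base (0-based start, exclusive end)."""
    depth = [0] * gene_length
    for start, end in clones:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


def size_histogram(clones, bin_width=BIN_WIDTH):
    """Count clones per size bin; key k means k*bin_width <= size < (k+1)*bin_width."""
    return Counter((end - start) // bin_width for start, end in clones)


# Hypothetical clone coordinates for illustration only.
clones = [(0, 450), (300, 900), (850, 1500), (1400, 2175), (0, 150)]

depth = coverage(clones)
uncovered = sum(1 for d in depth if d == 0)  # bases sampled by no clone
hist = size_histogram(clones)

print("uncovered bases:", uncovered)
for k in sorted(hist):
    print(f"{k * BIN_WIDTH}-{(k + 1) * BIN_WIDTH - 1} bp: {hist[k]}")
```

Sorting the same coordinate list by start or by end position before plotting would give the orderings used in panels B and C, respectively.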
Figure 3.
Solubility screening. (A) Schematic of CDH screening process. Colonies arrayed on membranes that react with an anti-tag antibody are picked and individually inoculated into small-scale liquid cultures in multiwell dishes, incubated under standard conditions and expression-induced. Gentle nondenaturing lysis releases cytoplasmic proteins that pass through a hydrophobic filter, and over an affinity resin for the attached tag. Eluates from the affinity resin are blotted onto membranes and detected using anti-tag antibody. Tagged protein that is abundant in the cytoplasm, soluble and nonaggregated, and properly folded is substantially enriched by this process and gives rise to strong signals in the dot blot. (B) Principle of “tag-availability.” When a peptide tag is appended to the C terminus of a hypothetical target protein construct that encapsulates a folded globular region (left), the tag (magenta surface) is fully exposed and available for interaction with affinity resins. When the construct is too short (right), the tag becomes embroiled in the core of the protein and is unavailable to affinity resins. Even where a tag is appropriately positioned relative to the domain termini, aggregation and misfolding decrease the availability of the tag, favoring retention of “good” constructs over “bad.”
Figure 4.
p85α “hits.” (A) Dot blots for eight clones that were taken through to preliminary structural assessment by 1H-NMR. Pre-screen blots indicate reactive protein levels prior to any filtration or affinity enrichment that is sensitive to tag-availability. Post-screen blots indicate levels of folded, soluble, nonaggregated protein. A decrease in signal (as in A014-F07) suggests that this construct expresses at high levels, but is not as efficiently released from the cytoplasm as other constructs. Nonetheless, it produces sufficient protein for structural studies. (B) Western blots of SDS-PAGE gel of protein eluted from the final stage of the screen for eight clones taken through to preliminary structural assessment by 1H-NMR. Consistent with the dot blots, all samples show bands that are immunoreactive to anti-tag antibody, and indicate that “hits” are in the expected size range for the experiment. One clone (A010-A05) shows clear evidence of proteolytic breakdown from the genetically predicted protein size. (C) As B but Coomassie-stained to indicate total protein. All clones show good correlation between the immunoreactive bands in the Western blots (B) and the major protein bands in this gel. In most cases, the level of the target band is substantially higher than other protein bands and should readily purify with one or two more steps to a suitable degree for detailed structural analysis by NMR or X-ray crystallography.
Figure 5.
1H-NMR spectra of p85α “hits.” (A) 1H-NMR spectrum of protein expressed by clone A016-E02 (corresponding to the N-SH2 domain). Clear resonances below 0 ppm arise from upfield-shifted methyl groups, which are strongly indicative of globular structure. Tildes (~) represent signals truncated for clarity, and asterisks (*) indicate sharp signals from buffer components. (B) Upfield methyl group region from clone A010-B08, corresponding to the tandem SH3-BCR domain pair. (C) As B, but for clone A014-F07 corresponding to the BCR domain. (D) As B, but for clone A010-A05 corresponding to the C-SH2 domain. (E) As B, but for clone A004-G10 corresponding to a short segment of polypeptide containing the low-sequence complexity linker between the BCR and N-SH2 domains and a fragment of the N-SH2 domain. The absence of upfield-shifted signals with chemical shifts <0.8 ppm indicates a nonglobular piece of protein representing a rare false positive from CDH.
Figure 6.
Coverage of p85α CDH “hits.” Bars indicate the positions of the 14 final “hits” relative to the p85α protein sequence and known domain structure. These 14 clones gave pre- and post-screen dot blots significantly above background, gave immunoreactive bands in Western blots that correlated with strong protein bands in Coomassie-stained gels, and produced good levels of protein in one simple scale-up from the small-scale parallel growth conditions used in the screens. Constructs in blue have been shown by NMR to encode folded globular protein; those in magenta also give NMR spectra consistent with folded globular structures, but display some proteolysis in gels, suggesting that they contain poorly ordered but nonaggregating termini attached to a folded core. Constructs in gray have not been further characterized. Nearly all of the constructs shown (except the one unfolded construct, red) would be suitable for structural studies of p85α component domains and/or screening assays for small-molecule ligands.
