BMC Genomics. 2011 Jun 15;12 Suppl 1(Suppl 1):S5.
doi: 10.1186/1471-2164-12-S1-S5.

Filtering "genic" open reading frames from genomic DNA samples for advanced annotation

Sara D'Angelo et al. BMC Genomics.

Abstract

Background: Experimental gene annotation requires DNA encoding open reading frames (ORFs) derived from real genes (termed "genic") in the correct frame. When genes are correctly assigned, genic DNA for functional annotation can be isolated by PCR. However, not all genes are correctly assigned, and even when they are, gene products are often incorrectly folded when expressed in heterologous hosts. This problem can sometimes be overcome by expressing protein fragments encoding domains, rather than full-length proteins. One possible method to isolate DNA encoding such domains would be to "filter" complex DNA (cDNA libraries, genomic and metagenomic DNA) for gene fragments that confer a selectable phenotype relying on correct folding; the set of all such domains present in a complex DNA sample is termed the "domainome".

Results: In this paper we discuss the preparation of diverse genic ORF libraries from randomly fragmented genomic DNA, using β-lactamase to filter for open reading frames. By cloning DNA fragments between leader sequences and the mature β-lactamase gene, colonies can be selected for resistance to ampicillin, conferred by correct folding of the lactamase gene product. Our experiments demonstrate that the majority of surviving colonies contain genic open reading frames, suggesting that β-lactamase is acting as a selectable folding reporter. Furthermore, different leaders (Sec, TAT and SRP), which normally translocate different protein classes, filter different genic fragment subsets, indicating that their use increases the fraction of the "domainome" that is accessible.
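
The reading-frame requirement behind this selection can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes that a fragment can only link the leader to β-lactamase in frame if its length is a multiple of three and it contains no in-frame stop codon under the standard genetic code, and that the cloning site adds no extra bases. The function name `is_open_in_frame` is hypothetical. The check is necessary but not sufficient: the actual selection also depends on correct folding, which this sketch does not model.

```python
# Standard genetic code stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_in_frame(fragment: str) -> bool:
    """Return True if the fragment could be read through in frame 0.

    Assumption (not from the paper): the insert must have a length that is
    a multiple of three and contain no in-frame stop codon for the
    downstream reporter to be translated.
    """
    seq = fragment.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False  # would shift the downstream reporter out of frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    for frag in ("ATGGCCAAA", "ATGTAAGCC", "ATGGC"):
        print(frag, is_open_in_frame(frag))
```

A fragment that passes this check is only a candidate ORF; whether it survives on ampicillin is determined experimentally.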

Conclusions: ORF libraries obtained with the filtering method described here, combined with screening methods such as phage display and protein-protein interaction studies, or with protein structure determination projects, can lead to the identification and structural determination of functional genic ORFs. Moreover, ORF libraries are a useful tool for moving towards high-throughput functional annotation of newly sequenced genomes.

Figures

Figure 1
ORF filtering vector. The features of the filtering vectors are shown in panel A. Blunt-ended random gDNA fragments are cloned between a leader sequence (Sec, SRP, or TAT) and the mature β-lactamase gene. C-terminal SV5 and His tags are used for detection and purification, respectively. Panel B shows the effect of ampicillin selective pressure on ORFs and non-ORFs.
Figure 2
Filtering of Clostridium thermocellum gDNA libraries. Survival rates of the three fragmented genomic DNA libraries (Sec, SRP, TAT) plated on increasing ampicillin concentrations are shown. Data are normalized to the total number of clones growing on chloramphenicol (Amp 0) plates, with no filtering pressure, and expressed as percentages.
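
The normalization described in this caption can be sketched as follows. This is an illustrative Python example, not the authors' analysis code: the function name `survival_percentages` is hypothetical, and the colony counts and ampicillin concentrations in the usage section are invented placeholders, not data from the paper.

```python
def survival_percentages(colony_counts):
    """Express colony counts as a percentage of the Amp 0 baseline.

    colony_counts maps an ampicillin concentration to the number of clones
    counted on that plate; key 0 is the chloramphenicol-only (Amp 0) plate.
    """
    baseline = colony_counts[0]
    if baseline <= 0:
        raise ValueError("Amp 0 plate must have at least one colony")
    return {amp: 100.0 * count / baseline
            for amp, count in colony_counts.items()}


if __name__ == "__main__":
    # Invented counts for illustration only.
    example = {0: 2000, 25: 500, 100: 100}
    for amp, pct in survival_percentages(example).items():
        print(f"Amp {amp}: {pct:.1f}% survival")
```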
Figure 3
β-lactamase assay on non-filtered and filtered libraries. The chart shows the mean activity of 45-48 clones for each ampicillin concentration across 3 replicate plates. Data collected at the 6 h incubation time point are shown for the Sec, SRP, and TAT libraries.
Figure 4
454 sequencing analysis of filtered libraries. Sequencing data for the 3 libraries (Sec, SRP, TAT) are shown in the table. The Venn diagram shows the number of different genes shared between the libraries.
Figure 5
Sequence distribution along the C. thermocellum genome. Panel A shows the distribution frequencies of 454 sequences along the C. thermocellum genome in 40,000-nucleotide windows. Panel B shows the distribution of the Sec-filtered library compared with the distribution of the perfect match (pm) subset of sequences from the same library. Panel C shows the 454 data analysis for perfect match sequences in the Sec library.
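
The windowed distribution in panel A can be approximated with a short Python sketch. It assumes each mapped read is represented by a 0-based start coordinate on the genome and is assigned to the 40,000-nucleotide window containing that coordinate. The function names and the coordinates in the usage section are hypothetical; the paper does not state how reads spanning a window boundary were handled.

```python
from collections import Counter

WINDOW_SIZE = 40_000  # window width in nucleotides, as in the caption


def window_counts(start_positions, window=WINDOW_SIZE):
    """Count reads per window, keyed by window index (start // window)."""
    return Counter(pos // window for pos in start_positions)


def window_frequencies(start_positions, window=WINDOW_SIZE):
    """Return the fraction of all reads that fall in each window."""
    counts = window_counts(start_positions, window)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {index: n / total for index, n in counts.items()}


if __name__ == "__main__":
    # Invented read start coordinates for illustration only.
    reads = [0, 39_999, 40_000, 120_001]
    for index, freq in sorted(window_frequencies(reads).items()):
        start = index * WINDOW_SIZE
        print(f"{start}-{start + WINDOW_SIZE - 1}: {freq:.2f}")
```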
Figure 6
Distribution of filtered fragments on single genes. The distribution and variable length of Sec-filtered clones mapping to the Cthe_2819 gene are shown. The mapping reads were obtained from the raw 454 sequencing dataset.
