Genome Res. 2009 May;19(5):886-96.
doi: 10.1101/gr.089391.108.

Proteomic discovery of previously unannotated, rapidly evolving seminal fluid genes in Drosophila

Geoffrey D Findlay et al. Genome Res. 2009 May.

Abstract

As genomic sequences become easier to acquire, shotgun proteomics will play an increasingly important role in genome annotation. With proteomics, researchers can confirm and revise existing genome annotations and discover completely new genes. Proteomic-based de novo gene discovery should be especially useful for sets of genes with characteristics that make them difficult to predict with gene-finding algorithms. Here, we report the proteomic discovery of 19 previously unannotated genes encoding seminal fluid proteins (Sfps) that are transferred from males to females during mating in Drosophila. Using bioinformatics, we detected putative orthologs of these genes, as well as 19 others detected by the same method in a previous study, across several related species. Gene expression analysis revealed that nearly all predicted orthologs are transcribed and that most are expressed in a male-specific or male-biased manner. We suggest several reasons why these genes escaped computational prediction. Like annotated Sfps, many of these new proteins show a pattern of adaptive evolution, consistent with their potential role in influencing male sperm competitive ability. However, in contrast to annotated Sfps, these new genes are shorter, have a higher rate of nonsynonymous substitution, and have a markedly lower GC content in coding regions. Our data demonstrate the utility of applying proteomic gene discovery methods to a specific biological process and provide a more complete picture of the molecules that are critical to reproductive success in Drosophila.

Figures

Figure 1.
Expression analysis for new Sfps and their bioinformatically identified orthologs. RT-PCR was used to assay for expression of two Sfp genes, Sfp53E (top) and Sfp26Ad (middle), as well as a housekeeping control gene, RpL32 (bottom), which is expressed ubiquitously. Sfp53E is expressed exclusively in males of all three species assayed; female D. melanogaster and D. simulans show amplification of a larger than expected product, which could represent unspliced gene product. Sfp26Ad expression levels appeared variable between males of each species, and the pattern of expression is male-specific in D. simulans and D. yakuba and male-biased in D. melanogaster. Note that the D. yakuba Sfp53E and the D. melanogaster Sfp26Ad products are larger than the products from their orthologs in other species. This difference was caused by the use of distinct primer binding sites in each species; comparisons to the expected product sizes confirmed that proper splicing occurred in each of these two cases. Each image contains two ladder lanes with a 100-bp ladder; the smallest band in each lane is 100 bp. For additional RT-PCR data, see Supplemental Figure S2 and Supplemental Table S7.
Figure 2.
Whole-gene, pairwise estimates of dN and dS values for 27 Sfps discovered in D. melanogaster and compared with orthologs in D. simulans. The solid line indicates ω = 1; the dashed line indicates ω = 0.5. All genes were also tested across additional species for specific sites evolving under positive selection (Supplemental Table S8).
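The reference lines in Figure 2 can be read as thresholds on the ratio ω = dN/dS. The following is a minimal sketch of that arithmetic only; it is not the estimation procedure used in the study, and the function names `omega` and `figure2_region` are hypothetical helpers introduced for illustration. It assumes dN and dS have already been estimated elsewhere.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Return the ratio dN/dS for already-estimated dN and dS values.

    When dS is zero the ratio is undefined; by convention here it is
    reported as infinity if dN > 0 and NaN if dN is also zero.
    """
    if dn < 0 or ds < 0:
        raise ValueError("dN and dS must be non-negative")
    if ds == 0:
        return math.inf if dn > 0 else math.nan
    return dn / ds


def figure2_region(dn: float, ds: float) -> str:
    """Place a gene relative to the omega = 1 and omega = 0.5 reference lines."""
    w = omega(dn, ds)
    if math.isnan(w):
        return "undefined"
    if w > 1:
        return "above solid line (omega > 1)"
    if w > 0.5:
        return "between lines (0.5 < omega <= 1)"
    return "at or below dashed line (omega <= 0.5)"
```

A whole-gene ω above 1 is a conservative indicator, which is presumably why the caption also points to site-specific tests across additional species.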
Figure 3.
Structural model of the predicted C-type lectin SFP24F. The D. melanogaster SFP24F protein structure was predicted by PHYRE by threading the sequence onto a C-type lectin from mouse (PDB ID no. 2ox9). Sites indicated by space-filled molecules were predicted to have evolved under positive selection (dark blue sites, codeml model M8 BEB P > 0.95; light blue sites, P > 0.90). As oriented in the figure, the carbohydrate recognition domain is located at the bottom of the protein.
Figure 4.
Phylogenetic analysis of SFP38D and its tandem duplicates. Phylogenetic tree of coding DNA sequences for SFP38D, its orthologs, and its paralogs from D. melanogaster (CG17472 and CG31680) and additional species. Branch color indicates the estimated ω rate for each branch. Red color indicates branches that are predicted to have experienced positive selection. Values of ω for red branches are, from top to bottom: 1.33, ∞, 2.02, ∞. Numbers under each node indicate percentage of bootstrap support for the phylogeny based on 1000 replicates. Sequences for SFP38D and its orthologs were obtained from BLAST and BLAT searches; other sequences represent gene models (numbered as indicated for each species) available from FlyBase. Note that though branches are colored so as to indicate which range of ω values they fell into, the model estimated a precise value for each branch.
Figure 5.
Comparisons of annotated and unannotated transferred Sfps in D. melanogaster. (A) Mean values for annotated and unannotated genes for whole-gene, pairwise estimates of dN. Unannotated genes had a significantly higher rate of nonsynonymous substitution. Error bars, 1 SEM. (B) GC content for annotated and unannotated genes. G+Cc indicates overall GC content in coding regions; G+C2, GC content in second codon positions; and G+C3, GC content in third codon positions. For both panels, gray bars indicate annotated Sfps, and black bars indicate unannotated Sfps.
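The three GC measures named in panel B (G+Cc, G+C2, G+C3) can be illustrated with a short sketch. This is an assumption-laden example, not the authors' pipeline: the function names `gc_fraction` and `codon_gc_content` are hypothetical, the input is assumed to be an in-frame coding sequence containing only A, C, G, and T, and the stop codon is counted like any other codon.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def codon_gc_content(cds: str) -> dict:
    """GC content of a coding sequence overall and by codon position.

    Assumes cds is in frame, so its length is a multiple of three.
    """
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    cds = cds.upper()
    return {
        "G+Cc": gc_fraction(cds),        # all coding positions
        "G+C2": gc_fraction(cds[1::3]),  # second codon positions
        "G+C3": gc_fraction(cds[2::3]),  # third codon positions
    }
```

For the toy sequence `ATGGCCTAA` (codons ATG, GCC, TAA), this gives G+Cc = 4/9, G+C2 = 1/3, and G+C3 = 2/3.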
