PLoS Genet. 2011 Jan 27;7(1):e1001284.
doi: 10.1371/journal.pgen.1001284.

Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes

Todd J Treangen et al. PLoS Genet. 2011.

Abstract

Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly because they play a stronger or longer-lasting adaptive role. Duplicated genes (paralogs), on the other hand, are more highly expressed and, when persistent, evolve more slowly. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein-protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Histogram of the normalized size of gene families.
For each family, we compute the number of genes in the family and subtract the number of genomes containing at least one member of the family.
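The normalization described in this caption can be illustrated with a minimal sketch. The input layout (a list of `(genome_id, gene_id)` pairs per family) and the function name are assumptions made for illustration; they are not taken from the paper.

```python
def normalized_family_size(members):
    """Return (number of genes) - (number of genomes with >= 1 member).

    `members` is a hypothetical representation of one gene family:
    a list of (genome_id, gene_id) pairs.
    """
    n_genes = len(members)
    # Count each genome once, however many family members it carries.
    n_genomes = len({genome_id for genome_id, _ in members})
    return n_genes - n_genomes


# Toy family: genomeA carries two copies, genomeB and genomeC one each.
family = [
    ("genomeA", "g1"),
    ("genomeA", "g2"),
    ("genomeB", "g7"),
    ("genomeC", "g3"),
]
print(normalized_family_size(family))  # 4 genes - 3 genomes = 1
```

Under this definition, a family with exactly one copy in every genome that carries it scores 0, and each additional copy adds 1.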
Figure 2. Relative contribution of horizontal gene transfer in protein family expansions.
Figure 3. Abundance of IS and prophages and increased inference of IGD events when included in analysis.
The bar plot (left y-axis) shows the percentage of gene family expansions of IS and phage origin. The line plot (right y-axis) indicates the increase of the number of expansions assigned to duplications when the co-localization criterion is ignored and IS and prophages are included in the dataset.
Figure 4. Gene expression differs according to gene origin.
Paralogs are more highly expressed than xenologs, as measured by the codon adaptation index. Xenologs, however, are more highly expressed than genes with neither paralogs nor xenologs.
Figure 5. Evolutionary rates differ between paralogs and xenologs.
Non-synonymous (dN) and synonymous (dS) substitution rates in paralogs (blue; dashed linear fit) and xenologs (red; solid linear fit) in all clades computed using Codeml from PAML (model = 1, fix_omega = 0).
Figure 6. Protein family construction pipeline.
Starting with a databank of proteins, we first performed all pairwise similarity searches using BLASTP. The hits were filtered by match length (70% of the length of the query) and bitscore (30% of the maximal bitscore, calculated by aligning a protein against itself). To build the gene families, we ran MCL blastline and then removed all singletons, IS, and phages. To build the core genome, we used OrthoMCL along with a synteny filter based on M-GCAT clusters. Finally, using presence/absence and phylogenetic information, we obtained the protein families with expansions.
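The hit-filtering step of this pipeline can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the `Hit` record, the function name, and the reading of both percentages as minimum thresholds (match length at least 70% of the query length, bitscore at least 30% of the query's self-alignment bitscore) are all assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLASTP hit (field names are illustrative)."""
    query: str
    subject: str
    align_len: int    # length of the match, in residues
    bitscore: float


def keep_hit(hit, query_len, self_bitscore,
             min_len_frac=0.70, min_score_frac=0.30):
    """Assumed reading of the caption: keep a hit only if it passes both cutoffs."""
    long_enough = hit.align_len >= min_len_frac * query_len
    strong_enough = hit.bitscore >= min_score_frac * self_bitscore
    return long_enough and strong_enough


# Toy data: a 300-residue query whose self-alignment scores 400 bits.
hits = [
    Hit("p1", "p2", 280, 150.0),  # passes both cutoffs
    Hit("p1", "p3", 100, 300.0),  # match too short
    Hit("p1", "p4", 290, 50.0),   # bitscore too low
]
kept = [h.subject for h in hits if keep_hit(h, query_len=300, self_bitscore=400.0)]
print(kept)  # ['p2']
```

Scaling the bitscore cutoff by the self-alignment score makes the threshold relative to each protein, so short and long proteins are held to a comparable standard.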
Figure 7. Cumulative distribution function plot of protein similarity.
Colored lines correspond to CDF plots of the similarity between orthologous proteins of the core genome for the comparison of E. coli K12 W3110 with genomes of increasing phylogenetic distances. The gray line corresponds to the similarity between homologous genes in the E. coli K12 W3110 genome.
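For readers unfamiliar with CDF plots, the quantity on the y-axis can be sketched as an empirical cumulative distribution: the fraction of protein pairs whose similarity is at or below a given value. The similarity values and function name below are made up for illustration and are not data from the paper.

```python
from bisect import bisect_right


def cdf_at(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


# Hypothetical percent-similarity values for four ortholog pairs.
similarities = sorted([98.0, 85.5, 99.1, 92.3])
print(cdf_at(similarities, 92.3))  # 2 of 4 values are <= 92.3, so 0.5
```

A curve that rises only near high similarity values indicates closely related sequences, while a curve that rises early indicates many divergent pairs.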
