Genome Biol Evol. 2014 Jun 19;6(7):1790-805.
doi: 10.1093/gbe/evu131.

Extensive local gene duplication and functional divergence among paralogs in Atlantic salmon

Ian A Warren et al. Genome Biol Evol.

Abstract

Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse, although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish paralogs originating from local gene duplication events from those originating from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of the extreme phenotypic plasticity required for an anadromous life-cycle.

Keywords: Atlantic salmon; gene duplication; genome evolution; transcriptome; whole-genome duplication.

Figures

Fig. 1.
A flow diagram for the method to identify paralog gene pairs and determine whether they originated from a WGD or a LGD event.
Fig. 2.
Database comparisons. Each assembly was compared with two databases. (A) Using BLASTx, sequences were compared with the CEG database (Parra et al. 2007), which represents a core set of 248 genes expected to be present in all vertebrates at low paralog number. The proportion of CEG sequences retrieved is given on the y axis, at a range of E-value thresholds (x axis). (B) Using BLASTn, sequences were compared with 9,057 full-length Salmo salar genes (Leong et al. 2010). Alignment length, as a proportion of the full-length sequence, is given on the x axis. The proportion of query sequences above the length on the x axis is given on the y axis. The data are from analyses performed using an E-value threshold of e−20. Tests were carried out at a range of thresholds and the trend was very similar (see supplementary fig. S2.1, Supplementary Material online).
Fig. 3.
Genes (isogroups) expressed within and among the three tissues, detected using a reciprocal BLASTn search (E-value threshold = e−5). Only the longest isotig from each isogroup was used. Percentage of isogroups in each assembly is shown. B, brain; O, olfactory epithelium; P, pituitary.
Fig. 4.
Congruence of genes expressed in each transcriptome assembly with existing genomic resources. (A) Salmo salar Unigene database (Pontius and Schuler 2003), (B) GenBank mRNA database for salmonids (Benson et al. 2006), and (C) NCBI protein database for salmonids (Benson et al. 2006). The percentage of isogroups with significant BLAST hits is given on the y axis, and the E-value threshold is given on the x axis.
Fig. 5.
Similarity between paralog sequences in Salmo salar expressed in the transcriptomes presented here (minimum percent ID = 80%, minimum alignment length = 300 bp). (A, B) All 2,394 putative paralog pairs. (C, D) Paralog pairs where both sequences were assigned to the same chromosome (LGDs). (E, F) Paralog pairs where the sequences were assigned to different chromosomes (WGDs). Both percent ID within paralog pairs (A, C, E) and synonymous substitution (Ks) rate (B, D, F) are given. For ease of presentation, Ks values greater than 1.5 are not shown (see supplementary figs. S5.1–S5.8, Supplementary Material online). The analyses were repeated at a range of thresholds (supplementary figs. S5.1–S5.8, Supplementary Material online).
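The rule stated in this caption (a pair passes if it meets the identity and length thresholds, and is then labeled by whether both sequences sit on the same chromosome) can be illustrated with a minimal Python sketch. The record layout, field names, and function name are hypothetical and are not taken from the authors' workflow; only the two thresholds and the same/different chromosome rule come from the caption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the Fig. 5 caption.
MIN_PERCENT_ID = 80.0   # minimum percent identity
MIN_ALIGN_LEN = 300     # minimum alignment length in bp


@dataclass
class ParalogPair:
    """Hypothetical record for one putative paralog pair."""
    percent_id: float   # percent identity of the pairwise alignment
    align_len: int      # alignment length in bp
    chrom_a: str        # chromosome assigned to the first sequence
    chrom_b: str        # chromosome assigned to the second sequence


def classify_pair(pair: ParalogPair) -> Optional[str]:
    """Return 'LGD', 'WGD', or None if the pair fails the thresholds."""
    if pair.percent_id < MIN_PERCENT_ID or pair.align_len < MIN_ALIGN_LEN:
        return None
    # Same chromosome: local gene duplication; otherwise whole-genome duplication.
    return "LGD" if pair.chrom_a == pair.chrom_b else "WGD"


pairs = [
    ParalogPair(92.5, 450, "ssa01", "ssa01"),
    ParalogPair(85.0, 320, "ssa02", "ssa05"),
    ParalogPair(78.0, 600, "ssa03", "ssa03"),  # below the identity threshold
]
labels = [classify_pair(p) for p in pairs]
```

The chromosome names in the usage lines are made-up placeholders. Pairs with no chromosome assignment would need separate handling, which this sketch does not attempt.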
Fig. 6.
Chromosomal locations of putative paralog pairs compared with previously identified pairs of chromosomes with regions of homology (Phillips et al. 2009; Lien et al. 2011). Chromosome numbers are given on the x axis, separated by an underscore.
Fig. 7.
Ka/Ks estimations within paralog pairs. Ka/Ks estimations above 2.0 (n = 17) are not shown (maximum = 8.32). See supplementary figure S5.9, Supplementary Material online.
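As a small illustration of the quantity plotted here, the following Python sketch computes the Ka/Ks ratio (nonsynonymous over synonymous substitution rate) per pair and separates values above the 2.0 display cutoff mentioned in the caption. The function names and the input values are hypothetical; how the authors estimated Ka and Ks is not described in this excerpt, so the sketch assumes the two rates are already available.

```python
from typing import List, Optional, Tuple

DISPLAY_CUTOFF = 2.0  # ratios above this value are omitted from the plot (Fig. 7 caption)


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero or negative and the ratio is undefined."""
    if ks <= 0:
        return None
    return ka / ks


def split_for_display(ratios: List[float]) -> Tuple[List[float], List[float]]:
    """Split ratios into those shown (<= cutoff) and those omitted (> cutoff)."""
    shown = [r for r in ratios if r <= DISPLAY_CUTOFF]
    omitted = [r for r in ratios if r > DISPLAY_CUTOFF]
    return shown, omitted


# Made-up (Ka, Ks) estimates, for illustration only.
estimates = [(0.02, 0.10), (0.30, 0.10), (0.05, 0.0)]
ratios = [r for r in (ka_ks_ratio(ka, ks) for ka, ks in estimates) if r is not None]
shown, omitted = split_for_display(ratios)
```

By the usual convention, a ratio well below 1 is read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as a possible sign of positive selection.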
Fig. 8.
Analyses of functional changes in paralog pairs. (A) BLAST hits for the two control data sets (C1: “same genes” and C2: “different genes”) and the real data set of paralog pairs (identified in WGD and LGD events in S. salar), when compared against an NCBI salmonid protein database. The proportion of pairs is given on the y axis. In the majority of sequence pairs, neither sequence had a positive BLASTx hit and they are not shown here (see fig. 4C). This analysis was repeated with the salmonid mRNA from GenBank, the S. salar Unigene database, and the UniProt protein database, and the same results were obtained (supplementary fig. S6.1, Supplementary Material online). (B) The absolute difference in coding potential for each sequence pair across the three data sets, represented by the absolute difference in PORTRAIT scores within each pair (Arrial et al. 2009).

References

    1. Alexandrou MA, Swartz BA, Matzke NJ, Oakley TH. Genome duplication and multiple evolutionary origins of complex migratory behavior in Salmonidae. Mol Phylogenet Evol. 2013;69:514–523.
    2. Allendorf FW, Thorgaard GH. Tetraploidy and the evolution of salmonid fishes. In: Turner BJ, editor. Evolutionary genetics of fishes. New York: Plenum Press; 1984. pp. 55–93.
    3. Arrial RT, Togawa RC, Brigido MDM. Screening non-coding RNAs in transcriptomes from neglected species using PORTRAIT: case study of the pathogenic fungus Paracoccidioides brasiliensis. BMC Bioinformatics. 2009;10:239.
    4. Bailey GS, Poulter RT, Stockwell PA. Gene duplication in tetraploid fish: model for gene silencing at unlinked duplicated loci. Proc Natl Acad Sci U S A. 1978;75:5575–5579.
    5. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2006;34:D16–D20.