Genome Res. 2017 Jul;27(7):1263-1272.
doi: 10.1101/gr.216226.116. Epub 2017 Apr 11.

Increased taxon sampling reveals thousands of hidden orthologs in flatworms

José M Martín-Durán et al. Genome Res. 2017 Jul.

Abstract

Gains and losses shape the gene complement of animal lineages and are a fundamental aspect of genomic evolution. Acquiring a comprehensive view of the evolution of gene repertoires is hampered by the intrinsic limitations of common sequence similarity searches and available databases. Thus, a subset of the gene complement of an organism consists of hidden orthologs, i.e., those with no apparent homology to sequenced animal lineages (mistakenly considered new genes) but actually representing rapidly evolving orthologs or undetected paralogs. Here, we describe Leapfrog, a simple automated BLAST pipeline that leverages increased taxon sampling to overcome long evolutionary distances and identify putative hidden orthologs in large transcriptomic databases by transitive homology. As a case study, we used 35 transcriptomes of 29 flatworm lineages to recover 3427 putative hidden orthologs, some unidentified by OrthoFinder and HaMStR, two common orthogroup inference algorithms. Unexpectedly, we do not observe a correlation between the number of putative hidden orthologs in a lineage and its "average" evolutionary rate. Hidden orthologs do not show unusual sequence composition biases that might account for systematic errors in sequence similarity searches. Instead, gene duplication with divergence of one paralog and weak positive selection appear to underlie hidden orthology in Platyhelminthes. By using Leapfrog, we identify key centrosome-related genes and homeodomain classes previously reported as absent in free-living flatworms, e.g., planarians. Altogether, our findings demonstrate that hidden orthologs comprise a significant proportion of the gene repertoire in flatworms, qualifying the impact of gene losses and gains in gene complement evolution.

Figures

Figure 1.
Hidden orthologs and the Leapfrog pipeline. (A) Taxonomically restricted genes (TRGs) are genes with no clear orthology relationship (dashed line and question mark) to other known genes (e.g., orthology group of red dots). Improved sensitivity in the detection methods and/or improved taxon sampling can help uncover hidden orthology relationships, thus referring to these former TRGs as hidden orthologs. (B) The Leapfrog pipeline performs a series of reciprocal BLAST searches between an initial well-annotated data set (e.g., human RefSeq), and a target and a “bridge” transcriptome. First, Leapfrog performs BLAST against the human RefSeq and the target (1) and the “bridge” transcriptome (2) and identifies reciprocal best-hit orthologs between the “bridge” and the human RefSeq proteins (3). These annotated genes of the “bridge” are then used to find orthologs in the target transcriptomes by reciprocal best BLAST hits (4 and 5). If these two pairs of reciprocal best BLAST hit searches are consistent between them, the gene in the target transcriptome is deemed a hidden ortholog. Colored shapes within green boxes represent different sequences of each data set.
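The transitive logic of panel B can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Leapfrog implementation: the function names are hypothetical, each input dictionary is assumed to map a query sequence to its single best BLAST hit (already filtered by whatever E-value cutoff is in use), and "hidden" is taken to mean that no direct reciprocal best hit links the reference and the target.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return {a: b} for pairs where a's best hit is b and b's best hit is a."""
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


def hidden_orthologs(ref_to_target, target_to_ref,
                     ref_to_bridge, bridge_to_ref,
                     bridge_to_target, target_to_bridge):
    """Hypothetical sketch: find target genes linked to the reference only via a bridge.

    Returns {target_gene: (bridge_gene, reference_gene)}.
    """
    # Orthologs that a direct reference <-> target comparison already recovers.
    direct = reciprocal_best_hits(ref_to_target, target_to_ref)
    # Reference <-> bridge reciprocal best hits (step 3 in the caption).
    ref_bridge = reciprocal_best_hits(ref_to_bridge, bridge_to_ref)
    # Bridge <-> target reciprocal best hits (steps 4 and 5 in the caption).
    bridge_target = reciprocal_best_hits(bridge_to_target, target_to_bridge)

    directly_found_targets = set(direct.values())
    hidden = {}
    for ref_gene, bridge_gene in ref_bridge.items():
        target_gene = bridge_target.get(bridge_gene)
        if target_gene is None:
            continue  # the bridge gene has no reciprocal best hit in the target
        if ref_gene in direct or target_gene in directly_found_targets:
            continue  # assumption: directly detectable pairs are not "hidden"
        hidden[target_gene] = (bridge_gene, ref_gene)
    return hidden


# Toy identifiers (H = reference, B = bridge, T = target); all hypothetical.
example = hidden_orthologs(
    ref_to_target={"H2": "T2"},
    target_to_ref={"T2": "H2"},
    ref_to_bridge={"H1": "B1", "H2": "B2"},
    bridge_to_ref={"B1": "H1", "B2": "H2"},
    bridge_to_target={"B1": "T1", "B2": "T2"},
    target_to_bridge={"T1": "B1", "T2": "B2"},
)
print(example)  # {'T1': ('B1', 'H1')}
```

In the toy data, T2 is found directly and is therefore skipped, whereas T1 reaches H1 only through the bridge sequence B1.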
Figure 2.
The Leapfrog pipeline recovers hundreds of hidden orthologs in Platyhelminthes. (A) Distribution of hidden orthologs according to their identification in one or more of the analyzed transcriptomes. Most of the hidden orthologs are unique to each lineage. (B) Distribution of species-specific hidden orthologs in each studied species. (C) Amino acid alignment of a fragment of the centrosomal protein SDCCAG8 of H. sapiens, P. vittatus, and S. mediterranea, and pairwise comparison of conserved residues. Positions that differ between the human and the hidden ortholog products are conserved between P. vittatus and one or the other sequences. Black dots indicate residues conserved among the three species.
Figure 3.
Hidden orthologs, evolutionary rates, and sequence composition analyses. (A) Principal component analysis of the analyzed data showing the eigenvectors for each variable. The first two principal components (PC1, PC2) explain together 67.6% of the observed variability. (B) Number of hidden orthologs in relation to the branch length of each lineage (linear regression in blue; dots with external black line indicate the taxa with highly complete transcriptomes). There is a low correlation between the two variables (R² = 0.124). (C) GC content of each transcript plotted against its average length of G/C stretches considering all studied flatworm transcriptomes (left) and only S. mediterranea (right). The transcripts corresponding to hidden orthologs are in blue. Hidden orthologs do not differentiate from the majority of transcripts. (D) Average length of hidden orthologs compared to the average length of the other genes in transcriptomes with ≥85% CEGs. Hidden orthologs are significantly longer than the rest (Mann-Whitney U test; P < 0.05). (E) Codon adaptation index (CAI) of the hidden orthologs of the planarian species B. candida, D. tigrina, and S. mediterranea compared with nonhidden orthologs. CAI index in hidden orthologs does not significantly differ from the rest of transcripts (Mann-Whitney U test; P < 0.05).
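The two quantities plotted in panel C can be computed per transcript roughly as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and "average length of G/C stretches" is assumed here to mean the mean length of maximal runs of consecutive G or C bases.

```python
from itertools import groupby


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def mean_gc_stretch_length(seq):
    """Mean length of maximal runs of consecutive G/C bases (0.0 if there are none)."""
    seq = seq.upper()
    runs = [
        sum(1 for _ in group)
        for is_gc, group in groupby(seq, key=lambda base: base in "GC")
        if is_gc
    ]
    return sum(runs) / len(runs) if runs else 0.0


# Toy transcript: the G/C runs are "GGC" (length 3) and "CG" (length 2).
print(gc_content("GGCATCGA"))              # 0.625
print(mean_gc_stretch_length("GGCATCGA"))  # 2.5
```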
Figure 4.
Level of paralogy and Ka/Ks values in triclad hidden orthologs. (A) Percentage of hidden orthologs identified by Leapfrog that are present in OrthoFinder and share an orthogroup with other sequences of the same species. We deem these cases as probable fast evolving paralogs (hidden paralogy). (B) Ka and Ks values of 53 one-to-one hidden orthologs of S. mediterranea compared with their respective homologs in the “bridge” species P. vittatus. Although in almost half of these hidden orthologs the Ks value suggested saturation (Ks > 2), for most of the rest the Ka/Ks value was above or around 0.5 (dotted line), which can be a sign of weak positive selection or relaxed constraint. (C) Number of predicted protein–protein interactions in S. mediterranea hidden orthologs (red dot) compared with a distribution of interactions observed in 1000 random samples of similar size (gray bars). Hidden orthologs show a significantly higher number of interactions, suggesting that complementary mutations between protein partners might drive hidden orthology in flatworms.
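The comparison in panel C, an observed count set against 1000 random samples of similar size, is a resampling test. The sketch below shows one way such a test could be run; the function name, the input dictionary of per-gene interaction counts, and the one-sided add-one estimate of the p-value are assumptions for illustration, not details taken from the study.

```python
import random


def empirical_p_value(interactions, hidden_ids, n_samples=1000, seed=0):
    """Compare the summed interactions of hidden_ids against random gene sets.

    interactions: {gene_id: number of predicted interactions} (hypothetical input).
    Returns (observed_total, one-sided empirical p-value).
    """
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    genes = sorted(interactions)   # sample from every gene in the background set
    observed = sum(interactions[g] for g in hidden_ids)

    at_least_as_large = 0
    for _ in range(n_samples):
        sample = rng.sample(genes, len(hidden_ids))
        if sum(interactions[g] for g in sample) >= observed:
            at_least_as_large += 1

    # Add-one correction: the estimate can never be exactly zero.
    return observed, (at_least_as_large + 1) / (n_samples + 1)


# Toy background of 100 genes in which three "hidden" genes are highly connected.
toy = {f"g{i}": 1 for i in range(100)}
toy.update({"g0": 50, "g1": 50, "g2": 50})
observed, p_value = empirical_p_value(toy, ["g0", "g1", "g2"])
print(observed, p_value)
```

A small p-value here means that few random gene sets of the same size reach the observed total, which is the pattern the caption describes for the hidden orthologs.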
Figure 5.
Hidden orthologs in the core set of centrosomal-related proteins. Presence (colored boxes) and absence (empty boxes) of the core set of centrosomal proteins (Azimzadeh et al. 2012) in all analyzed flatworm transcriptomes. Orthologs identified by direct reciprocal best BLAST hit are in blue boxes, and hidden orthologs are in orange. The asterisks indicate the CEP192 protein in the S. mediterranea transcriptomes (pink color code). These proteins were manually identified with the G. tigrina CEP192 sequence as “bridge” by reciprocal best BLAST hit. The five proteins essential for centrosomal replication are boxed in red.

References

    1. Agata K, Soejima Y, Kato K, Kobayashi C, Umesono Y, Watanabe K. 1998. Structure of the planarian central nervous system (CNS) revealed by neuronal cell markers. Zoolog Sci 15: 433–440.
    2. Albà MM, Castresana J. 2007. On homology searches by protein Blast and the characterization of the age of genes. BMC Evol Biol 7: 53.
    3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215: 403–410.
    4. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
    5. Azimzadeh J, Wong ML, Downhour DM, Sánchez Alvarado A, Marshall WF. 2012. Centrosome loss in the evolution of planarians. Science 335: 461–463.
