Bioinform Biol Insights. 2017 Feb 23:11:1177932217690136.
doi: 10.1177/1177932217690136. eCollection 2017.

Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships


Luca Ambrosino et al. Bioinform Biol Insights.

Abstract

The detection of orthologs is a key approach in genomics: it helps to understand gene evolution and phylogenetic relationships and is essential for gene function prediction. However, reliable annotation of the encoded protein regions is still a limiting aspect in genomics, mainly because confirmatory experimental evidence at the proteome level is lacking. Nevertheless, current ortholog collections are generally based on protein sequence comparisons, despite the availability of large transcriptome sequence collections. We developed Transcriptologs, a method for the prediction of orthologs based on similarities of translated fragments from messenger RNAs of 2 species. We implemented a procedure to extend BLAST-based alignments and to define orthologs based on the Bidirectional Best Hit approach. Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that, in some cases, Transcriptologs outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities not otherwise detectable.
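The Bidirectional Best Hit idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their BBH and extended BBH procedure is the one given as pseudocode in Figure 2); the tuple layout, the function names, and the sequence identifiers below are assumptions made for illustration, and ties are resolved simply by keeping the first best-scoring hit encountered.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Strict '>' means the first hit wins when scores are tied.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs in which a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b)
        for a, (b, _score) in best_ab.items()
        if b in best_ba and best_ba[b][0] == a
    )


# Toy data with made-up identifiers and scores.
a_vs_b = [("A1", "B1", 200.0), ("A1", "B2", 150.0), ("A2", "B2", 90.0)]
b_vs_a = [("B1", "A1", 210.0), ("B2", "A1", 140.0), ("B2", "A2", 80.0)]
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy input, A2's best hit is B2, but B2's best hit is A1, so only the reciprocal pair is reported.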

Keywords: Functional genomics; RNA; proteins; sequence analysis.


Conflict of interest statement

DECLARATION OF CONFLICTING INTERESTS: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Disclosures and Ethics: As a requirement of publication, author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including, but not limited to, the following: authorship and contributorship, conflicts of interest, privacy and confidentiality, and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.

Figures

Figure 1. Pseudocode of the alignment reconstruction algorithm we developed.
Figure 2. Pseudocode of the BBH algorithm we developed. BBH indicates Bidirectional Best Hit; eBBH, extended BBH.
Figure 3. Example of improvement in the total alignment length. To align the 2 sequences AT1G50940.1 and Sb01g002210.2 (highlighted in green), the tBLASTx program provides different alignment fragments (highlighted in gray), each one corresponding to a given reading frame (highlighted in red) of the 2 sequences. In this example, the algorithm we designed is able to rebuild an entire alignment from an alignment fragment with a reading frame of +3/+1 and an alignment fragment with a reading frame of +2/+3 because the 2 fragments do not share overlapping segments of the aligned sequences.
Figure 4. Comparison of results detected by BioMart, PLAZA, and an in-house BLASTp analysis. Venn diagram showing (A) the number of Arabidopsis genes that have a relationship with a Sorghum counterpart, (B) the number of Sorghum genes that have a relationship with an Arabidopsis gene, and (C) the number of exact relationships between Arabidopsis and Sorghum genes.
Figure 5. Comparison between Transcriptologs and BLASTp analyses. (A) Pie charts showing some features of BBHs detected only using protein sequences. (B) Venn diagram showing differences and similarities in the number of BBHs detected using protein sequences and transcript sequences. (C) Pie charts showing some features of BBHs detected only using transcript sequences. In the pie chart on the left, the number of alignments that involve UTRs is shown in green, the number of alignments obtained from at least 2 fragments having different reading frames between them is shown in orange, the number of alignments with a different reading frame in comparison with the predicted proteins is shown in gray, the number of alignments with a similarity score less than 100 is shown in blue, and the remaining number of alignments is shown in yellow. BBHs indicates Bidirectional Best Hits; UTRs, untranslated regions.
Figure 6. Comparison between Transcriptologs and protein Bidirectional Best Hits (BBHs). Distribution of the (A) BBH scores detected only using transcript sequences, (B) BBH E-values detected only using transcript sequences, (C) BBH scores detected only using protein sequences, and (D) BBH E-values detected only using protein sequences.
Figure 7. Comparison between Transcriptologs and protein BBHs. Distribution of the BBH scores detected exclusively using transcript and protein sequences, involving (A) the same Arabidopsis thaliana gene (an example of an outlier is indicated by a black arrow) and (B) the same Sorghum bicolor gene.
Figure 8. Example of improved similarity detection based on transcripts. (A) Arabidopsis thaliana AT3G25572 transcript sequence translated in frame +3. The protein sequence region released by the official TAIR annotation is highlighted by a black line, and the longest ORF is highlighted by a green line. (B) Schematic view of the alignments of the transcript region (in gray) with the Sb06g021540 gene and of the protein region (in red) with the Sb06g021530 gene; these 2 genes are the BBHs of the Arabidopsis gene AT3G25572 based on transcript and protein similarity, respectively. The transcript and protein alignment lengths (291 aa and 32 aa) are shown as number of amino acids.
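The fragment-combination idea described in the Figure 3 caption (joining alignment fragments from different reading frames when they do not overlap on either sequence) can be sketched in Python as follows. This is a hedged illustration, not the algorithm of Figure 1: the greedy, highest-score-first selection is an assumption, as are the `Fragment` fields, the inclusive coordinates normalized so that start <= end, and all numeric values in the toy data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical alignment fragment with inclusive coordinates (start <= end)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float
    frames: str  # query/subject reading frames, e.g. "+3/+1"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def combine_fragments(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Greedily keep the best-scoring fragments that overlap no kept
    fragment on either the query or the subject sequence."""
    chosen: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.score, reverse=True):
        compatible = all(
            not overlaps(frag.q_start, frag.q_end, kept.q_start, kept.q_end)
            and not overlaps(frag.s_start, frag.s_end, kept.s_start, kept.s_end)
            for kept in chosen
        )
        if compatible:
            chosen.append(frag)
    # Report the kept fragments in query order.
    return sorted(chosen, key=lambda f: f.q_start)


def total_query_length(fragments: Iterable[Fragment]) -> int:
    """Sum of the query spans covered by the fragments."""
    return sum(f.q_end - f.q_start + 1 for f in fragments)


# Toy fragments: coordinates and scores are invented for illustration.
frags = [
    Fragment(1, 300, 10, 310, 250.0, "+3/+1"),
    Fragment(400, 600, 350, 550, 120.0, "+2/+3"),
    Fragment(250, 450, 100, 300, 80.0, "+1/+1"),  # overlaps the first fragment
]
combined = combine_fragments(frags)
```

Here the +3/+1 and +2/+3 fragments are kept together because they are disjoint on both sequences, while the third fragment is discarded because it overlaps the first. A score-ordered greedy pass is simple but not guaranteed to maximize the combined score; a dynamic-programming chaining step would be one alternative.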

