BMC Genomics. 2010 Jul 2;11:412.
doi: 10.1186/1471-2164-11-412.

Orthology confers intron position conservation

Anna Henricson et al. BMC Genomics.

Abstract

Background: With the wealth of genomic data now available, it has become increasingly important to assign putative protein functions by transferring functional annotation between orthologs. Correctly elucidating the evolutionary relationships among genes is therefore a critical task, and phylogenetic inference should be improved further by adding relevant discriminating features. Introns have been shown to maintain their positions over long evolutionary timescales, so conservation of intron positions could serve as a discriminating factor when assigning orthology. We therefore investigated whether orthologs have a higher degree of intron position conservation (IPC) than non-orthologous sequences that are equally similar in sequence.

Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, that is, sequences close in sequence space that nonetheless fall just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC than ortholog-closest non-ortholog pairs. Likewise, pairs of inparalogs had a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences cannot simply be attributed to the generally higher sequence identity of the ortholog-ortholog and inparalog-inparalog pairs. Furthermore, we analyzed the agreement between the IPC score and the ortholog score assigned by the InParanoid algorithm, and found that the agreement was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid scores ranked inparalogs differently. These represent cases where sequence divergence and intron position divergence are discordant. To identify any functional preference among the discordant clusters, we searched them for enriched GO terms and Pfam protein domains. These clusters were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity.

Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.
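The abstract does not define the IPC score itself, so the following is only a minimal sketch of one way such a score could be computed, not the authors' formula. It assumes a pairwise protein alignment given as two gapped strings, and intron positions given as hypothetical `(residue_index, phase)` pairs in ungapped protein coordinates, where phase 0 places the intron just before the codon of that residue and phases 1 and 2 place it after the first or second nucleotide of that codon. An intron counts as conserved when the other sequence has an intron in the same alignment column with the same phase, and the score is a Dice-style fraction between 0 and 1.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical intron representation: (0-based residue index, phase 0/1/2).
Intron = Tuple[int, int]


def column_map(gapped: str) -> Dict[int, int]:
    """Map each ungapped residue index to its column in the alignment."""
    mapping: Dict[int, int] = {}
    residue = 0
    for col, ch in enumerate(gapped):
        if ch != "-":
            mapping[residue] = col
            residue += 1
    return mapping


def ipc_score(aln_a: str, aln_b: str,
              introns_a: Set[Intron], introns_b: Set[Intron]) -> Optional[float]:
    """Hypothetical IPC score: 2 * shared / (introns in A + introns in B).

    An intron is shared when both sequences have an intron in the same
    alignment column with the same phase. Returns None when neither
    sequence has introns, since there is no positional signal to compare.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    total = len(introns_a) + len(introns_b)
    if total == 0:
        return None
    map_a, map_b = column_map(aln_a), column_map(aln_b)
    # Project each intron onto (alignment column, phase).
    cols_a = {(map_a[i], phase) for i, phase in introns_a}
    cols_b = {(map_b[i], phase) for i, phase in introns_b}
    shared = len(cols_a & cols_b)
    return 2 * shared / total
```

For example, `ipc_score("MK-LVA", "MKQLV-", {(2, 0), (4, 1)}, {(3, 0), (1, 2)})` returns 0.5: one intron in each sequence falls in the same column (the aligned `L`) with phase 0, while the other two do not match. Requiring an identical phase is a deliberately strict choice in this sketch; a looser variant could tolerate small positional shifts.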

Figures

Figure 1
Graphical representation of an InParanoid ortholog cluster with the outparalogs outside the cluster indicated. The seed orthologs from the two species are denoted A1 and B1; they are bidirectional best BLAST hits, and their similarity score (S) is shown. Inparalogs with a score of S or higher to the seed ortholog lie inside the circle with radius S and hence belong to the cluster. Inparalogs are added to the cluster independently for each species. Sequences scoring lower than S lie outside the cluster and are classified as outparalogs. To generate the so-called extended cluster, the closest outparalog (non-ortholog or non-inparalog) from each species is added for each inparalog in the cluster.
Figure 2
Intron densities in the different genomes. (A) Percentage of sequences harboring introns in the different genomes. (B) Average number of introns per sequence in the different genomes. All sequences means all protein coding genes in the genomes for each species. Orthologs means the subset of orthologs identified by the InParanoid algorithm for each species versus human. As a consequence, for human, orthologs refers to an average of the ortholog sets identified versus each of the other species.
Figure 3
Finding closest non-orthologs to add to the ortholog cluster. Percentage of orthologs for which a closest non-ortholog (cno) could be found in one or both species, or for which no cno was found.
Figure 4
Mean intron position conservation score for the different pair types and species comparisons (A) ortholog-ortholog (o-o) pairs versus ortholog-closest non-ortholog (o-cno) pairs, and (B) inparalog-inparalog (i-i) pairs versus inparalog-closest non-inparalog (i-cni) pairs.
Figure 5
Distribution of intron position conservation values for the different pair types. (A) Hsa versus Ath, ortholog-ortholog (o-o) versus ortholog-closest non-ortholog (o-cno), (B) Hsa versus Dre, o-o versus o-cno, (C) Hsa versus Ath, inparalog-inparalog (i-i) versus inparalog-closest non-inparalog (i-cni), (D) Hsa versus Dre, i-i versus i-cni.
Figure 6
Intron position conservation scores for pairs of the different types binned according to sequence identity. Ortholog-ortholog (o-o) pairs versus ortholog-closest non-ortholog (o-cno) pairs, and inparalog-inparalog (i-i) pairs versus inparalog-closest non-inparalog (i-cni) pairs for (A) Hsa versus Ath, and (B) Hsa versus Dre.
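The cluster construction described in the Figure 1 caption can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the InParanoid implementation: it assumes precomputed symmetric similarity scores (for example BLAST bit scores) stored in a hypothetical dictionary keyed by unordered gene pairs, a hypothetical `species` mapping from gene name to species, and seed orthologs that are already known. It applies only the rules stated in the caption: genes scoring S or higher against their own species' seed join the cluster, all remaining genes are outparalogs, and the extended cluster adds, for each cluster member, the closest outparalog from each species.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical input: symmetric scores keyed by unordered gene pairs;
# a missing pair means "no hit".
Scores = Dict[FrozenSet[str], float]


def pair_score(scores: Scores, x: str, y: str) -> float:
    """Symmetric score of two genes, or 0.0 if they have no hit."""
    return scores.get(frozenset((x, y)), 0.0)


def build_cluster(seed_a: str, seed_b: str, species: Dict[str, str],
                  scores: Scores) -> Tuple[Set[str], Set[str]]:
    """Return (cluster, extended_cluster) following the Figure 1 rules."""
    # S: similarity between the two seed orthologs.
    s = pair_score(scores, seed_a, seed_b)
    seeds = {species[seed_a]: seed_a, species[seed_b]: seed_b}
    # Only genes from the two compared species take part.
    genes = [g for g, sp in species.items() if sp in seeds]

    # Inparalogs: score S or higher to their own species' seed,
    # so each species is handled independently.
    cluster = {seed_a, seed_b}
    for gene in genes:
        if gene not in cluster and pair_score(scores, gene, seeds[species[gene]]) >= s:
            cluster.add(gene)

    # Everything else from the two species is an outparalog.
    outparalogs = sorted(set(genes) - cluster)

    # Extended cluster: for each member, add the highest-scoring outparalog
    # from each species (ties go to the alphabetically first gene).
    extended = set(cluster)
    for member in cluster:
        for sp in seeds:
            hits = [g for g in outparalogs
                    if species[g] == sp and pair_score(scores, member, g) > 0]
            if hits:
                extended.add(max(hits, key=lambda g: pair_score(scores, member, g)))
    return cluster, extended
```

Keeping the outparalogs in a sorted list makes tie-breaking deterministic, because `max` returns the first maximal element it encounters. The full InParanoid algorithm applies further criteria that this sketch omits.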
