Front Microbiol. 2025 Mar 4:16:1541898.
doi: 10.3389/fmicb.2025.1541898. eCollection 2025.

AlgaeOrtho, a bioinformatics tool for processing ortholog inference results in algae

Mary-Francis LaPorte et al. Front Microbiol.

Abstract

Introduction: Microalgae constitute a prominent feedstock for producing biofuels and biochemicals by virtue of their prolific reproduction, high bioproduct accumulation, and ability to grow in brackish and saline water. However, naturally occurring wild-type algal strains are rarely optimal for industrial use; bioengineering of algae is therefore necessary to generate superior-performing strains that can address production challenges in industrial settings, particularly in the bioenergy and bioproduct sectors. One of the crucial steps in this process is deciding on a bioengineering target: namely, which gene/protein to differentially express. These targets are often orthologs, which are defined as genes/proteins in divergent species that originate from a common ancestor. Although bioinformatics tools for identifying protein orthologs already exist, processing the output of such tools is nontrivial, especially for a researcher with little or no bioinformatics experience.

Methods: The present study introduces AlgaeOrtho, a user-friendly tool that builds upon the SonicParanoid orthology inference tool (whose algorithm identifies potential protein orthologs from amino acid sequences) and the PhycoCosm database from the Joint Genome Institute (JGI) to help researchers identify orthologs of their proteins of interest in multiple diverse algal species.

Results: The output of this application includes a table of the putative orthologs of the user's protein of interest, a heatmap showing sequence similarity (%), and an unrooted tree of the putative protein orthologs. Notably, because the tool does not depend on existing protein annotations, it could be instrumental in identifying novel bioengineering targets in different algal strains, including targets in incompletely annotated algal species. We tested AlgaeOrtho using three case studies, in which orthologs of proteins relevant to bioengineering targets were identified from diverse algal species, demonstrating its ease of use and utility for bioengineering researchers.

Discussion: This tool is unique in the protein ortholog identification space as it can visualize putative orthologs, as desired by the user, across several algal species.

Keywords: algae; bioengineering; bioinformatics; metabolic engineering; nutraceuticals; protein orthology.
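
The sequence-similarity heatmap described in the Results is built from a percent identity matrix over aligned putative orthologs. The following is a minimal sketch of that idea, not the AlgaeOrtho implementation: the function names, the toy sequences, and the choice to skip alignment columns containing a gap are all assumptions made for illustration, and the alignment tool used by the authors may define percent identity differently.

```python
from typing import Dict


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two rows taken from the same alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns with a gap ("-") in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Pairwise percent identity for every pair of aligned sequences."""
    names = list(aligned)
    return {
        row: {col: percent_identity(aligned[row], aligned[col]) for col in names}
        for row in names
    }


# Hypothetical toy alignment; these are not sequences from the paper.
toy_alignment = {
    "query": "MKT-AYL",
    "ortholog_a": "MKTGAYL",
    "ortholog_b": "MRT-AFL",
}
pim = identity_matrix(toy_alignment)
for name, row in pim.items():
    print(name, [round(value, 1) for value in row.values()])
```

The nested dictionary can be passed to any plotting library to draw a heatmap with the sequence names on both axes.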

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The full AlgaeOrtho workflow: the tool searches for the query proteins and outputs an alignment-based protein tree and a heatmap of the percent identity matrix for the putative orthologs.
Figure 2
Chlorella HS2 bZIP1 (above, labeled “Chlorella”) has high sequence similarity with its Chlorella variabilis NC64A (“ChlNC64A”) and Chlorella sorokiniana UTEX 1602 (“Chloso1602”) orthologs (77.1% and 66.1%, respectively). The heatmap depicts the Percent Identity Matrix (PIM), in which the values are the percent sequence similarity (%) between putative orthologs. The names on each axis reflect the species from which the ortholog sequence was identified. The labels follow the JGI naming convention for proteins from proteome sequences: <jgi>, which denotes a sequence origin of JGI | <species identification code>, originating from the JGI system | <protein identification number>, which is species and JGI specific | <ortholog group number>, which was generated by SonicParanoid.
Figure 3
Chlorella HS2 bZIP1 is clustered with one Chlorella variabilis NC64A (“ChlNC64A”) ortholog and shares a common ancestor with two other orthologs, one from Chlorella sorokiniana UTEX 1602 (“Chloso1602”) and another from Chlorella sp. A99 (“ChloA99”). The clustering was calculated by Clustal Omega, the distances were calculated with Biopython’s ‘Phylo Tree Construction’ tools, and entries are rooted to the mean. The length of each line reflects the phylogenetic distance between the sequences. The names on each axis reflect the species from which the ortholog sequence was identified. The labels follow the JGI naming convention for proteins from proteome sequences: <jgi>, which denotes a sequence origin of JGI | <species identification code>, originating from the JGI system | <protein identification number>, which is species and JGI specific | <ortholog group number>, which was generated by SonicParanoid.
Figure 4
Some proteins identified as LCYB in the Ochrophyta have high protein sequence similarity (>70%) with protein orthologs found in other species. Notably, the LCYB ortholog of the economically important species Ectocarpus siliculosus (Ectsil1) has high similarity with those of other human-edible species; this protein is known to be related to color and taste, and therefore to consumer preference. The names on each axis reflect the species from which the ortholog sequence was identified. The labels follow the JGI naming convention for proteins from proteome sequences: <jgi> | <species identification code> | <ortholog group number> | <protein identification code>. This signifies: the origin in the JGI database | a code specific to the JGI system | the ortholog number generated by SonicParanoid | a protein code specified by JGI.
Figure 5
Different protein orthologs originating from the same genus or species tend to belong to the same clade, as expected. The clustering was calculated by Clustal Omega, the distances were calculated with Biopython’s “Phylo Tree Construction” tools, and entries are rooted to the mean. The names on each axis reflect the species from which the ortholog sequence was identified. The labels follow the JGI naming convention for proteins from proteome sequences: <jgi> | <species identification code> | <ortholog group number> | <protein identification code>. This signifies: the origin in the JGI database | a code specific to the JGI system | the ortholog number generated by SonicParanoid | a protein code specified by JGI.
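
The pipe-delimited labels described in the figure captions can be split into named fields. The sketch below is illustrative only and is not part of AlgaeOrtho: the function name, the field names, and the example label are hypothetical. Because the captions for Figures 2 and 3 list the protein identifier before the ortholog group while those for Figures 4 and 5 list it after, the field order is passed in rather than hard-coded.

```python
from typing import Dict, Sequence

# Field orders as described in the captions (the field names are assumptions).
ORDER_FIGS_2_3 = ("source", "species_code", "protein_id", "ortholog_group")
ORDER_FIGS_4_5 = ("source", "species_code", "ortholog_group", "protein_id")


def parse_label(label: str, fields: Sequence[str] = ORDER_FIGS_2_3) -> Dict[str, str]:
    """Split a 'jgi|species|...|...' label into a dict keyed by field name."""
    parts = [part.strip() for part in label.split("|")]
    if len(parts) != len(fields):
        raise ValueError(
            f"expected {len(fields)} pipe-separated fields, got {len(parts)}"
        )
    return dict(zip(fields, parts))


# Hypothetical label: the species code appears in the captions,
# but the two numbers are invented for this example.
example = "jgi|ChlNC64A|12345|7"
print(parse_label(example))
print(parse_label(example, ORDER_FIGS_4_5))
```

Keeping the field order explicit makes it easy to group tree tips or heatmap rows by species code or by ortholog group, whichever order a given output uses.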
