Brief Bioinform. 2024 Nov 22;26(1):bbae681.
doi: 10.1093/bib/bbae681.

Online-adjusted evolutionary biclustering algorithm to identify significant modules in gene expression data

Raúl Galindo-Hernández et al. Brief Bioinform.

Abstract

Analyzing gene expression data helps identify significant biological relationships among genes. With a growing number of open biological datasets available, reliable and innovative methods are needed to analyze biological data in depth and to ensure that decisions rest on accurate information. Evolutionary algorithms have been successful in the analysis of biological datasets, but there is still room for improvement. In this work, we propose the Online-Adjusted EVOlutionary Biclustering algorithm (OAEVOB), a novel evolutionary biclustering algorithm that efficiently handles vast gene expression data. OAEVOB incorporates an online-adjustment feature that identifies significant groups by updating the mutation probability and crossover parameters. We use Pearson correlation, distance correlation, biweight midcorrelation, and mutual information to assess the similarity of genes in the biclusters. Algorithms in the specialized literature do not address generalization to diverse gene expression sources. Therefore, to evaluate OAEVOB's performance, we analyzed six gene expression datasets obtained from diverse data sources, specifically deoxyribonucleic acid (DNA) microarray, ribonucleic acid (RNA) sequencing, and single-cell RNA sequencing. OAEVOB identified significant, broad gene expression biclusters with correlations greater than $0.5$ across all similarity measurements employed. Additionally, when evaluated by functional enrichment analysis, the biclusters exhibit biological functions, suggesting that OAEVOB effectively identifies biclusters with specific cancer- and tissue-related genes in the analyzed datasets. We compared OAEVOB's performance with state-of-the-art methods; it outperformed them and showed robustness to noise, overlapping, sequencing data sources, and gene coverage.

Keywords: RNA-sequencing; biclustering; evolutionary algorithm; gene expression data; single-cell RNA-sequencing.

Figures

Figure 1
Representation of a bicluster’s codification in OAEVOB. The indexes of five genes and four conditions are randomly selected to form the bicluster, which contains the original values obtained from the GEM (rectangles colored in green).
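As a rough illustration of the encoding described in the Figure 1 caption, the sketch below draws gene and condition indexes at random and reads the corresponding values out of a matrix. It assumes that GEM means the gene expression matrix with genes as rows and conditions as columns; the function names, the toy matrix, and the fixed seed are hypothetical and are not taken from the paper.

```python
import random

def random_bicluster(n_genes, n_conditions, k_genes, k_conditions, rng):
    """Pick distinct gene and condition indexes at random (returned sorted)."""
    genes = sorted(rng.sample(range(n_genes), k_genes))
    conditions = sorted(rng.sample(range(n_conditions), k_conditions))
    return genes, conditions

def extract_values(gem, genes, conditions):
    """Return the submatrix of the GEM addressed by the bicluster's indexes."""
    return [[gem[g][c] for c in conditions] for g in genes]

# Toy matrix (assumed layout): 8 genes (rows) x 6 conditions (columns).
gem = [[float(10 * g + c) for c in range(6)] for g in range(8)]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
genes, conditions = random_bicluster(8, 6, k_genes=5, k_conditions=4, rng=rng)
values = extract_values(gem, genes, conditions)
```

Storing only the two index lists keeps each individual small; the expression values are looked up in the matrix when they are needed.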
Figure 2
In this example, three biclusters are formed in the initial exploration. The fitness of each bicluster is then calculated, and the two biclusters with the highest fitness are preserved for the first generation.
Figure 3
The highest fitness scores were obtained with online-adjustment, using Pearson and distance correlation.
Figure 4
We compute the bicluster’s [formula]. We calculate the correlation of each gene with the remaining genes in the bicluster and average these values to determine the bicluster’s fitness ([formula]). The biclusters are then sorted by fitness, and we keep the [formula] (in this example) with the highest fitness.
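The fitness computation in the Figure 4 caption can be sketched as follows, under stated assumptions: only Pearson correlation is shown (the abstract lists three other measures), the signed correlation is used rather than its absolute value, and a constant row is treated as uncorrelated. The helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return cov / sqrt(var_x * var_y)

def bicluster_fitness(rows):
    """Average, over genes, of each gene's mean correlation with the others."""
    n = len(rows)
    if n < 2:
        return 0.0
    per_gene = []
    for i in range(n):
        others = [pearson(rows[i], rows[j]) for j in range(n) if j != i]
        per_gene.append(sum(others) / len(others))
    return sum(per_gene) / n

def select_best(biclusters, k):
    """Sort biclusters by fitness (highest first) and keep the top k."""
    return sorted(biclusters, key=bicluster_fitness, reverse=True)[:k]

# Rows are genes, columns are conditions.
coherent = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3]]
mixed = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
best = select_best([mixed, coherent], k=1)
```

In this toy input every pair of genes in `coherent` is perfectly correlated, so it ranks above `mixed`, whose genes partly move in opposite directions.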
Figure 5
The [formula] is the average fitness of all the biclusters over the last [formula] generations ([formula] in this example). When the current [formula] is lower or higher than the previous one, the online-adjustment feature updates the [formula] and mutation probability values.
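The caption above does not state in which direction, or by how much, the parameters change. The sketch below is therefore only one plausible reading: it averages fitness over a sliding window of generations and nudges a probability up when the average falls and down when it rises. The step size, the bounds, and the direction of the update are assumptions made for illustration, not the paper's rule.

```python
def window_average(history, window):
    """Mean of all bicluster fitness values over the last `window` generations."""
    recent = [f for generation in history[-window:] for f in generation]
    return sum(recent) / len(recent)

def adjust(prob, current_avg, previous_avg, step=0.05, low=0.01, high=0.9):
    """Hypothetical update: explore more when fitness fell, less when it rose."""
    if current_avg < previous_avg:
        prob += step
    elif current_avg > previous_avg:
        prob -= step
    return min(high, max(low, prob))  # keep the probability within bounds

# One inner list of bicluster fitness values per generation (toy numbers).
history = [[0.2, 0.4], [0.3, 0.5], [0.2, 0.3]]
previous_avg = window_average(history[:-1], window=2)
current_avg = window_average(history, window=2)
mutation_prob = adjust(0.10, current_avg, previous_avg)
```

The same `adjust` helper could be applied to a crossover parameter, since the abstract says both are updated online.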
Figure 6
Wilcoxon rank test used to compute the effect size. In this case, a value greater than 0.3 is considered medium, and a value greater than 0.5 is considered strong. The initial exploration, online-adjustment, and TPM scalarization (in Cocel) features are shown to improve OAEVOB’s performance.
Figure 7
The main steps of OAEVOB across generations. The algorithm begins with the initial exploration, which is performed only once. The following steps, performed in every generation, are crossover, mutation, Jaccard calculation, fitness, [formula] computation, and preservation of the biclusters with the highest fitness.
Figure 8
Relevance and recovery results in the SDs. (A) Dataset with implanted biclusters at an overlapping level of [formula]. (B) Dataset with implanted biclusters at a noise level of [formula]. (C) Average across all SDs with implanted biclusters at different overlapping and noise levels. OAEVOB obtained the greatest relevance and recovery scores in (A) and (B), while in (C) OAEVOB shows very competitive relevance and recovery scores of [formula] and [formula], respectively, outperformed only by ARBic.
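The caption does not define relevance and recovery. A common formulation in the biclustering literature is a Jaccard-based match score, sketched below under that assumption: relevance averages, over the found biclusters, the best match among the implanted ones, and recovery does the reverse. Comparing biclusters as sets of (gene, condition) cells is also an assumption; the paper may compare gene sets or use another index.

```python
from itertools import product

def cells(bicluster):
    """Set of (gene, condition) cells covered by a (genes, conditions) pair."""
    genes, conditions = bicluster
    return set(product(genes, conditions))

def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_score(source, target):
    """Average, over source biclusters, of the best Jaccard match in target."""
    if not source or not target:
        return 0.0
    target_cells = [cells(t) for t in target]
    best = [max(jaccard(cells(s), t) for t in target_cells) for s in source]
    return sum(best) / len(source)

def relevance(found, implanted):
    return match_score(found, implanted)

def recovery(found, implanted):
    return match_score(implanted, found)

implanted = [([0, 1, 2], [0, 1]), ([5, 6], [2, 3])]
found = [([0, 1, 2], [0, 1])]
rel = relevance(found, implanted)
rec = recovery(found, implanted)
```

Here the single found bicluster matches an implanted one exactly, so relevance is 1.0, while recovery is 0.5 because the second implanted bicluster is missed.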
Figure 9
Gene coverage comparison in the six datasets ([formula] indicates that all genes were selected in any generation, and [formula] is the opposite, which is the worst value in this context). OAEVOB achieved a GeneCov greater than [formula] in all the datasets and obtained the greatest GeneCov in Tissp, Cocel, Ustilago, BCancer, and GPL5175; in Mouse, it was surpassed only by RecBic, BP-EBA, and ARBic. In contrast, SSLB and FABIA obtained the lowest GeneCov.
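One simple way to read GeneCov, assuming it is the fraction of genes in the dataset that appear in at least one bicluster, is sketched below. The scale (0 to 1 here) and the function name are assumptions, since the caption's formulas did not survive extraction.

```python
def gene_coverage(biclusters, n_genes):
    """Fraction of the dataset's genes that occur in at least one bicluster."""
    covered = set()
    for genes, _conditions in biclusters:
        covered.update(genes)
    return len(covered) / n_genes

# Two toy biclusters, given as (gene indexes, condition indexes), over 8 genes.
biclusters = [([0, 1, 2], [0, 1]), ([2, 3], [1, 2])]
coverage = gene_coverage(biclusters, n_genes=8)
```

Gene 2 appears in both biclusters but is counted once, so four of the eight genes are covered.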
Figure 10
Module [formula], identified by OAEVOB in Cocel, has up to [formula] genes involved in the biological functions and a [formula]. Many biological functions are linked to one another (lines), which indicates a strong relationship within the module.
Figure 11
Module [formula], identified by OAEVOB in Mouse, with a [formula]. Many biological functions are linked to one another (lines), which indicates a strong relationship within the module.
Figure 12
OAEVOB obtained the highest average number of genes in Tissp, Cocel, Ustilago, BCancer, and GPL5175. On the other hand, RecBic had the highest result in Mouse. BP-EBA and FABIA had the lowest average number of genes across all datasets.
Figure 13
OAEVOB obtained the greatest number of biclusters with a [formula] in Tissp, Cocel, Ustilago, BCancer, and GPL5175. RecBic reported the greatest result for Mouse. Conversely, FABIA obtained the lowest number of biclusters across all datasets.
