Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Nov 27;26(1):291.
doi: 10.1186/s12859-025-06327-6.

Revisiting motif finding: do bi-objective metaheuristics surpass single-objective metaheuristics?

Affiliations

Revisiting motif finding: do bi-objective metaheuristics surpass single-objective metaheuristics?

Muhammad Ali Nayeem et al. BMC Bioinformatics. .

Abstract

Background: The discovery of DNA motifs is essential for studying gene expression and function in many biological systems. Most existing algorithms for motif detection rely on a single optimization criterion or objective function. This study formulates motif finding as a bi-objective optimization problem and investigates whether multi-objective metaheuristics offer potential advantages over single-objective approaches.

Results: We developed four variants of the Non-dominated Sorting Genetic Algorithm II (NSGA-II) incorporating simple, problem-specific genetic operators. Experiments on six benchmark datasets from three organisms demonstrate that our bi-objective approach significantly outperforms the state-of-the-art Artificial Bee Colony (ABC) metaheuristic. Remarkably, NSGA-II-PMC achieved superior performance over ABC using 6 times fewer fitness evaluations, highlighting its computational efficiency. The synergistic combination of problem-specific operators proved essential, with individual operators showing limited effectiveness compared to their joint application.

Conclusions: Our findings question the common belief that single-objective metaheuristics are better suited for combinatorial problems like motif finding. The bi-objective formulation helps maintain diversity and avoid premature convergence, even with partially correlated objectives, resulting in better solutions than those obtained through dedicated single-objective optimization. Simple, interpretable problem-specific adaptations can yield substantial performance gains over sophisticated alternatives. These results suggest that bi-objective approaches may provide more robust and computationally efficient solutions for DNA motif discovery, opening new research directions in bioinformatics.

Keywords: Bioinformatics; Metaheuristics; Motif finding; Multi-objective optimization.

PubMed Disclaimer

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no Conflict of interest.

Figures

Fig. 1
Fig. 1
Alignment of substrings from three DNA sequences
Fig. 2
Fig. 2
Solution encoding scheme illustrated with a toy example of formula image sequences of length formula image and motif length formula image. A candidate solution (left) is represented as an integer vector (right), where each entry specifies the 0-based starting index of the motif in the corresponding sequence. The highlighted motif instances (shown in red) begin at positions 2, 5, 3, and 4 in sequences formula image, formula image, formula image, and formula image, respectively
Fig. 3
Fig. 3
Comparison among baseline NSGA-II and its 3 variants based on hypervolume (higher is better) on six datasets. In each plot, the horizontal axis represents the four motif lengths (formula image), while the vertical axis depicts the corresponding hypervolume values. The boxplots illustrate the distribution of the final hypervolume values across 20 independent runs for each algorithm on each dataset
Fig. 4
Fig. 4
Similarity score (higher is better) achieved by ABC-S (ABC [7] with similarity as the objective), and the NSGA-II variants on six datasets. In each plot, the horizontal axis represents the four motif lengths (formula image), while the vertical axis depicts the corresponding similarity values. The boxplots illustrate the distribution of the best similarity scores across 20 independent runs for each algorithm on each dataset
Fig. 5
Fig. 5
Entropy score (higher is better) achieved by ABC-E (ABC [7] with entropy as the objective), the NSGA-II variants on six datasets. In each plot, the horizontal axis represents the four motif lengths (formula image), while the vertical axis depicts the corresponding entropy values. The boxplots illustrate the distribution of the best entropy scores across 20 independent runs for each algorithm on each dataset
Fig. 6
Fig. 6
The Pareto solutions obtained by different algorithms accumulated over 20 independent runs on three datasets (hm03, hm09g, and yst04r). Each column represents a dataset, with the four motif lengths (16, 18, 20, and 22) arranged vertically across rows. Each plot visualizes the non-dominated solutions in the 2D objective space for each dataset and motif length. The points are color-coded based on the algorithm that generated them (NSGA-II, NSGA-II-PC, NSGA-II-PM, NSGA-II-PMC, ABC-S, and ABC-E)
Fig. 7
Fig. 7
The Pareto solutions obtained by different algorithms accumulated over 20 independent runs on three datasets (yst08r, dm01g, and dm03g). Each column represents a dataset, with the four motif lengths (16, 18, 20, and 22) arranged vertically across rows. Each plot visualizes the non-dominated solutions in the 2D objective space for each dataset and motif length. The points are color-coded based on the algorithm that generated them (NSGA-II, NSGA-II-PC, NSGA-II-PM, NSGA-II-PMC, ABC-S, and ABC-E)
Fig. 8
Fig. 8
Sensitivity analysis of mutation rate (formula image) and crossover rate (formula image) for three representative instances. The top row corresponds to hm09g (formula image), the middle row to yst08r (formula image), and the bottom row to dm01g (formula image). For each instance, we report average hypervolume (leftmost column), similarity (middle column), and entropy (rightmost column) values over 10 independent runs across all formula image combinations. Darker colors indicate better performance

References

    1. Hashim FA, Mabrouk MS, Al-Atabany W. Review of different sequence motif finding algorithms. Avicenna J Med Biotechnol. 2019;11(2):130. - PMC - PubMed
    1. Zhang Y, Wang P, Yan M. An entropy-based position projection algorithm for motif discovery. BioMed Res Int. 2016;2016:9127474. - DOI - PMC - PubMed
    1. Bailey TL. Dreme: motif discovery in transcription factor chip-seq data. Bioinformatics. 2011;27(12):1653–9. - DOI - PMC - PubMed
    1. Jia C, Carson MB, Wang Y, Lin Y, Lu H. A new exhaustive method and strategy for finding motifs in chip-enriched regions. PLoS One. 2014;9(1):86044. - DOI - PMC - PubMed
    1. Sinha S, Tompa M. Ymf: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 2003;31(13):3586–8. - DOI - PMC - PubMed

LinkOut - more resources