Review
Nucleic Acids Res. 2010 Nov;38(21):7353-63.
doi: 10.1093/nar/gkq625. Epub 2010 Jul 17.

Issues in bioinformatics benchmarking: the case study of multiple sequence alignment

Mohamed Radhouene Aniba et al. Nucleic Acids Res. 2010 Nov.

Abstract

The post-genomic era presents many new challenges for the field of bioinformatics. Novel computational approaches are now being developed to handle the large, complex and noisy datasets produced by high throughput technologies. Objective evaluation of these methods is essential (i) to assure high quality, (ii) to identify strong and weak points of the algorithms, (iii) to measure the improvements introduced by new methods and (iv) to enable non-specialists to choose an appropriate tool. Here, we discuss the development of formal benchmarks, designed to represent the current problems encountered in the bioinformatics field. We consider several criteria for building good benchmarks and the advantages to be gained when they are used intelligently. To illustrate these principles, we present a more detailed discussion of benchmarks for multiple alignments of protein sequences. As in many other domains, significant progress has been achieved in the multiple alignment field and the datasets have become progressively more challenging as the existing algorithms have evolved. Finally, we propose directions for future developments that will ensure that the bioinformatics benchmarks correspond to the challenges posed by the high throughput data.

Figures

Figure 1.
(A) 3D structure superposition of protein domains, 1tvxA and 1prtF, using the DaliLite server (RMSD = 2.5, %id = 16). (B) Sequence alignment inferred from the 3D structure superposition. Secondary structure elements are shown above and below the alignment (red = helix; green = beta-strand). (C) Classification of the two domains in the CATH and SCOP databases.
Figure 2.
(A) Pairwise alignments from the Prefab benchmark, based on automatic 3D superpositions (only part of each full-length alignment is shown, for clarity). Residues in upper case represent the ‘consensus’ regions that are superposed consistently by two different superposition methods, while lower case characters represent residues that are superposed inconsistently and are excluded from the alignment test. Secondary structure elements are shown above and below the alignment (red = helix; green = beta-strand). Black lines above and below the alignment indicate consensus regions that do not have the same secondary structure. Blue dots indicate known functional residues. (B) Multiple alignment of the same set of sequences based on 3D structure superposition and sequence conservation. Blue boxes below the alignment indicate ‘core blocks’ according to the definition used in the BAliBASE benchmark. Secondary structure elements conserved in all sequences are shown above and below the alignment (red = helix; green = beta-strand). Black lines above the alignment indicate core blocks that do not have a conserved secondary structure. Outlined boxes indicate sequence segments (red = consensus; green = non-consensus) that are aligned differently in (A) and (B).
