J Biomol Tech. 2005 Dec;16(4):453-8.

Sequence alignment by cross-correlation

Alan L Rockwood et al. J Biomol Tech. 2005 Dec.

Abstract

Many recent advances in biology and medicine have resulted from DNA sequence alignment algorithms and technology. Traditional approaches for the matching of DNA sequences are based either on global alignment schemes or on heuristic schemes that seek to approximate global alignment algorithms while providing higher computational efficiency. This report describes an approach that uses the mathematical operation of cross-correlation to compare sequences. It can be implemented using the fast Fourier transform for computational efficiency. The algorithm is summarized and sample applications are given. These include gene sequence alignment in long stretches of genomic DNA, finding sequence similarity in distantly related organisms, demonstrating sequence similarity in the presence of massive (approximately 90%) random point mutations, comparing sequences related by internal rearrangements (tandem repeats) within a gene, and investigating fusion proteins. Application to RNA and protein sequence alignment is also discussed. The method is efficient, sensitive, and robust, being able to find sequence similarities where other alignment algorithms may perform poorly.
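As a rough illustration of the idea in the abstract, the sketch below cross-correlates a short query against a longer reference using a fast Fourier transform. It is not the authors' implementation, and Equation 1 is not reproduced on this page. The base-to-complex-number encoding (`BASE_CODE`), the function names, and the toy sequences are all assumptions made for the example. The encoding is chosen so that each matching base adds 1 to the real part of the correlation, which means a perfect match of length N gives a peak of height N. A small pure-Python radix-2 FFT is included to keep the sketch self-contained; in practice a library FFT would be the natural choice.

```python
import cmath

# Assumed encoding (illustrative, not taken from the paper): each base maps to
# a unit-modulus complex number, so a matching position contributes exactly 1
# to the real part of the correlation sum.
BASE_CODE = {"A": 1 + 0j, "C": -1j, "G": 1j, "T": -1 + 0j}


def encode(seq):
    """Turn a DNA string into a list of complex numbers."""
    return [BASE_CODE[base] for base in seq.upper()]


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlate(query, reference):
    """Real part of c[s] = sum_j conj(q[j]) * r[j + s] for s = 0..len(reference)-1."""
    q, r = encode(query), encode(reference)
    # Zero-pad to a power of two large enough to avoid circular wrap-around.
    n = 1
    while n < len(q) + len(r):
        n *= 2
    q += [0j] * (n - len(q))
    r += [0j] * (n - len(r))
    q_hat, r_hat = fft(q), fft(r)
    # Correlation theorem: multiply one spectrum by the conjugate of the other.
    product = [rk * qk.conjugate() for rk, qk in zip(r_hat, q_hat)]
    corr = fft(product, inverse=True)
    # Divide by n to normalize the inverse transform; keep non-negative shifts.
    return [value.real / n for value in corr[: len(reference)]]


# Usage: the query is embedded in the toy reference starting at index 8.
scores = cross_correlate("GATTACA", "ACGTTGCAGATTACACCGTA")
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(best_shift, round(scores[best_shift]))  # prints "8 7"
```

With this encoding, the peak height equals the number of matching bases minus the number of complementary-coded mismatches at that shift (A against T, or C against G, contributes -1; other mismatches contribute 0 to the real part), so unrelated shifts tend toward zero while a true alignment stands out.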

Figures

FIGURE 1
Real part of cross-correlation function using Equation 1. A: pyrG gene of M. tuberculosis, cross-correlated with a 10-kb region of M. tuberculosis genome. The large peak of amplitude 1761 identified the presence of the pyrG gene and indicated a perfect match over the full length of the gene. B: pyrG gene of M. leprae, cross-correlated with the same 10-kb region of M. tuberculosis genome produced a peak of amplitude 1307, indicating a high but imperfect degree of sequence similarity.
FIGURE 2
Real part of cross-correlation function using Equation 1. A: MV4-11 variant of flt3 gene, cross-correlated with a reference sequence consisting of wild-type flt3 gene, where n = 0 means that the two sequences are unshifted relative to each other, and n = −30 means that the MV4-11 sequence is shifted 30 bases left with respect to the wild-type sequence. B: Real part of partial sum using Equation 2 for MV4-11 variant of flt3 gene compared with a reference sequence consisting of wild-type flt3 gene, showing that the 30-base internal repeat is located between nucleotides 68 and 98.
FIGURE 3
Real part of cross-correlation function for the alignment of DNA sequences for the genes coding for (A) the NPM protein against the NPM-ALK fusion protein, and (B) the ALK protein against the NPM-ALK fusion protein. The amplitude and shift of peaks in the cross-correlation plots were consistent with the position and lengths of the fused protein sequence.

