J Biomol Tech. 2005 Dec;16(4):453-8.

Sequence alignment by cross-correlation

Alan L Rockwood¹, David K Crockett, James R Oliphant, Kojo S J Elenitoba-Johnson

PMID: 16522868
PMCID: PMC2291754

Alan L Rockwood et al. J Biomol Tech. 2005 Dec.

Affiliation

¹ ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT 84108, USA. rockwoal@aruplab.com


Abstract

Many recent advances in biology and medicine have resulted from DNA sequence alignment algorithms and technology. Traditional approaches to matching DNA sequences are based either on global alignment schemes or on heuristic schemes that approximate global alignment algorithms while providing higher computational efficiency. This report describes an approach that uses the mathematical operation of cross-correlation to compare sequences. It can be implemented with the fast Fourier transform for computational efficiency. The algorithm is summarized and sample applications are given. These include gene sequence alignment in long stretches of genomic DNA, finding sequence similarity in distantly related organisms, demonstrating sequence similarity in the presence of massive (approximately 90%) random point mutations, comparing sequences related by internal rearrangements (tandem repeats) within a gene, and investigating fusion proteins. Application to RNA and protein sequence alignment is also discussed. The method is efficient, sensitive, and robust, and it can find sequence similarities where other alignment algorithms may perform poorly.
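The idea of comparing sequences by cross-correlation through the fast Fourier transform can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base-to-number encoding (`BASE_CODE`) and the helper names `encode`, `cross_correlation`, and `best_shift` are assumptions made for illustration, and the abstract does not specify the paper's Equation 1. Only NumPy's `np.fft.fft` and `np.fft.ifft` are used for the transform.

```python
import numpy as np

# Assumed encoding (illustrative, not quoted from the paper): each base is a
# unit complex number, so identical bases multiply to +1 under conjugation.
BASE_CODE = {"A": 1 + 0j, "T": -1 + 0j, "C": 1j, "G": -1j}


def encode(seq):
    """Convert a DNA string into a complex-valued array."""
    return np.array([BASE_CODE[base] for base in seq.upper()], dtype=complex)


def cross_correlation(query, reference):
    """Return (shifts, scores): real part of the cross-correlation per shift.

    A shift s places query[0] opposite reference[s]; a negative s means the
    query overhangs the left end of the reference.
    """
    q, r = encode(query), encode(reference)
    # Zero-pad to a power of two of at least len(q) + len(r) - 1 so the
    # circular correlation computed by the FFT equals the linear one.
    size = 1 << (len(q) + len(r) - 2).bit_length()
    spectrum = np.fft.fft(r, size) * np.conj(np.fft.fft(q, size))
    corr = np.fft.ifft(spectrum)
    shifts = np.arange(-(len(q) - 1), len(r))
    # Negative shifts wrap around to the end of the FFT output.
    scores = corr[shifts % size].real
    return shifts, scores


def best_shift(query, reference):
    """Return the shift with the highest score, and that score."""
    shifts, scores = cross_correlation(query, reference)
    peak = int(np.argmax(scores))
    return int(shifts[peak]), float(scores[peak])


# Toy data: the query is an exact 10-base excerpt of the reference.
reference = "ACGTTGCAAGGCTTACCGATTGACCA"
query = reference[8:18]
shift, score = best_shift(query, reference)
print(shift, round(score))
```

Under this assumed encoding, every matching base pair contributes +1 to the real part, so an exact match produces a peak equal to the query length at the shift where the query sits in the reference. That is at least consistent with the way the Figure 1 caption reads a full-height peak as a perfect match and a lower peak as high but imperfect similarity.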


Figures

**FIGURE 1**
Real part of cross-correlation function using Equation 1. A: pyrG gene of M. tuberculosis, cross-correlated with a 10-kb region of the M. tuberculosis genome. The large peak of amplitude 1761 identified the presence of the pyrG gene and indicated a perfect match over the full length of the gene. B: pyrG gene of M. leprae, cross-correlated with the same 10-kb region of the M. tuberculosis genome, produced a peak of amplitude 1307, indicating a high but imperfect degree of sequence similarity.

**FIGURE 2**
Real part of cross-correlation function using Equation 1. A: MV4-11 variant of flt3 gene, cross-correlated with a reference sequence consisting of wild-type flt3 gene, where n = 0 means that the two sequences are unshifted relative to each other, and n = −30 means that the MV4-11 sequence is shifted 30 bases left with respect to the wild-type sequence. B: Real part of partial sum using Equation 2 for MV4-11 variant of flt3 gene compared with a reference sequence consisting of wild-type flt3 gene, showing that the 30-base internal repeat lies between nucleotides 68 and 98.

**FIGURE 3**
Real part of cross-correlation function for the alignment of DNA sequences for the genes coding for **(A)** the NPM protein against the NPM-ALK fusion protein, and **(B)** the ALK protein against the NPM-ALK fusion protein. The amplitude and shift of peaks in the cross-correlation plots were consistent with the position and lengths of the fused protein sequence.


