BMC Bioinformatics. 2006 Mar 1;7:101.
doi: 10.1186/1471-2105-7-101.

A multivariate prediction model for microarray cross-hybridization

Yian A Chen et al. BMC Bioinformatics. 2006.

Abstract

Background: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization.

Results: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used.
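The two sequence features contrasted in the Results can be illustrated with a minimal sketch. It assumes the probe and target have already been aligned to equal length without gaps and are written on the same strand, which the abstract does not specify. The function names `percent_identity` and `most_contiguous_bp` are hypothetical and are not taken from the paper.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percentage of aligned positions at which probe and target carry the same base.

    Assumption: the two sequences are pre-aligned, gap-free, and of equal length.
    """
    if len(probe) != len(target) or not probe:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


def most_contiguous_bp(probe: str, target: str) -> int:
    """Length of the longest run of consecutive matching positions."""
    if len(probe) != len(target):
        raise ValueError("sequences must be of equal length")
    longest = current = 0
    for p, t in zip(probe.upper(), target.upper()):
        # Extend the current run on a match, reset it on a mismatch.
        current = current + 1 if p == t else 0
        longest = max(longest, current)
    return longest


# Two made-up targets with the same percent identity to the probe
# but very different contiguous runs.
probe = "ACGTACGTAC"
clustered_mismatches = "ACGTACGTTT"
scattered_mismatches = "ACGAACGAAC"
```

With these invented sequences both targets share 8 of 10 positions with the probe, yet the longest matching run is 8 in one case and 3 in the other, which is the kind of difference percent identity alone cannot capture.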

Conclusion: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments.

Figures

Figure 1
Representation of hybridization intensities with respect to the most contiguous base pairs and overlap lengths. Solid circles show strong hybridization intensities (TY > 6.5), while open triangles indicate low intensities (TY ≤ 6.5).
Figure 2
Training and cross-validation (CV) errors of the multivariate models. Minimum training errors (solid circles) of (a) multiple linear regressions (MLRs), (b) regression trees (RTs), and (c) artificial neural networks (ANNs) in the first CV training set decreased, while the CV errors [open squares; Equation (4)] reached the minimum (light-dotted arrows) at the subset size of 2 in (a), 2 in (b), and 5 in (c). The most parsimonious model (dark-solid arrows) within one standard error of the model with the minimum error was the model with 1 predictor for (a), 2 predictors for (b) and 4 predictors for (c). (The cross-validated variance of TY, for reference, is 1.43 ± 0.13).
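The selection rule described in this caption (take the smallest model whose CV error lies within one standard error of the minimum) can be sketched as follows. This is an illustration of the general one-standard-error rule, not the authors' code; the function name `one_se_choice` and the error values are invented for the example and are not data from the paper.

```python
def one_se_choice(cv_mean, cv_se):
    """Return (size_at_minimum, most_parsimonious_size) as 1-based subset sizes.

    cv_mean[i] and cv_se[i] are the mean CV error and its standard error
    for the model with i + 1 predictors.
    """
    if len(cv_mean) != len(cv_se) or not cv_mean:
        raise ValueError("cv_mean and cv_se must be non-empty and of equal length")
    # Index of the model with the smallest mean CV error (first one on ties).
    best = min(range(len(cv_mean)), key=cv_mean.__getitem__)
    threshold = cv_mean[best] + cv_se[best]
    # The smallest subset size whose error does not exceed the threshold.
    for i, err in enumerate(cv_mean):
        if err <= threshold:
            return best + 1, i + 1


# Hypothetical CV errors for subset sizes 1 to 4 (illustrative values only).
example_mean = [1.00, 0.95, 0.97, 0.99]
example_se = [0.08, 0.07, 0.07, 0.08]
minimum_size, parsimonious_size = one_se_choice(example_mean, example_se)
```

For these invented numbers the minimum falls at subset size 2, but the one-predictor model is within one standard error of it and is therefore preferred.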
Figure 3
Variables selected in five-fold cross-validation (CV) for the models. Variables (X1 to X12) are plotted versus model subset size (p). Counts of the selected variables in five-fold cross-validation are shown for (a) multiple linear regressions (MLRs), (b) regression trees (RTs), and (c) artificial neural networks (ANNs) as subset size, p, increases from 1 to 12 along the x-axis. The darker the color, the more often a variable (y-axis) was selected for a model with a given number of independent variables (x-axis). Light-dotted and dark-solid arrows indicate the models with minimum errors and the most parsimonious models within one standard error of the minimum, respectively, as in Figure 2.
Figure 4
Optimal regression tree. Optimal regression tree with predictors [most contiguous base pair (X11) and target GC content (X4)] included in the model.
Figure 5
Cross-validation errors of the multivariate models. Cross-validation errors [Equation (4)] among the three multivariate models (MLR, RT, and ANN) and the third-order polynomial regression [11]. The chosen optimal model for each of the three multivariate methods is marked with an enlarged solid symbol, with "+" indicating one standard error of the CV errors. The linear model using the most contiguous hydrogen bond (treating GC as 3 hydrogen bonds and AT as 2 hydrogen bonds, as used in Wren et al.; labeled as contiguous HB) performed comparably to the linear model using most contiguous base pair as the sole predictor (MLR when p = 1). The cross-validated variance of TY, labeled as "no predictors", is 1.43 ± 0.13. The regression tree with two variables, X11 and X4, outperformed the other multivariate and univariate models.
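The hydrogen-bond weighting mentioned in this caption (GC as 3, AT as 2) can be illustrated with a short sketch. It rests on two assumptions not stated in the caption: the sequences are pre-aligned, gap-free, and of equal length, and the "most contiguous hydrogen bond" value is taken to be the largest hydrogen-bond total over any run of consecutive matching positions. The name `most_contiguous_hb` is hypothetical.

```python
# Hydrogen bonds per matched base: G/C pairs form 3, A/T pairs form 2.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def most_contiguous_hb(probe: str, target: str) -> int:
    """Largest hydrogen-bond total over any run of consecutive matching bases.

    Assumption: this run-wise maximum is one plausible reading of the
    'contiguous HB' predictor; the source does not give its exact definition.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must be of equal length")
    best = current = 0
    for p, t in zip(probe.upper(), target.upper()):
        if p == t and p in H_BONDS:
            current += H_BONDS[p]  # extend the run by this base's bond count
        else:
            current = 0  # a mismatch or unknown base ends the run
        best = max(best, current)
    return best


# Made-up pair: a 4-base A run (8 bonds) and a 3-base G/C run (9 bonds).
hb_example = most_contiguous_hb("AAAACGCG", "AAAATGCG")
```

In this invented pair the longest matching run by base count is the four-base A stretch, while the three-base G/C stretch carries more hydrogen bonds, so the two predictors can rank runs differently even though the caption reports comparable overall performance.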

References

    1. Steinmetz LM, Davis RW. Maximizing the potential of functional genomics. Nature Reviews Genetics. 2004;5:190–201. doi: 10.1038/nrg1293.
    2. Lipshutz RJ, Morris D, Chee M, Hubbell E, Kozal MJ, Shah N, Shen N, Yang R, Fodor SP. Using oligonucleotide probe arrays to access genetic diversity. Biotechniques. 1995;19:442–447.
    3. Okamoto T, Suzuki T, Yamamoto N. Microarray fabrication with covalent attachment of DNA using Bubble Jet technology. Nat Biotech. 2000;18:438. doi: 10.1038/74507.
    4. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–470.
    5. Tijssen P. Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes. Part I: Theory and Nucleic Acid Preparation. Vol. 24. Amsterdam, The Netherlands: Elsevier Science Publishers BV; 1993. Overview of principles of hybridization and the strategy of nucleic acid probe assays; pp. 19–78.
