Nucleic Acids Res. 2006 May 24;34(9):e70.

doi: 10.1093/nar/gkl122.

Relationship between gene expression and observed intensities in DNA microarrays--a modeling study

G A Held¹, G Grinstein, Y Tu

PMID: 16723429
PMCID: PMC1472623
DOI: 10.1093/nar/gkl122

G A Held et al. Nucleic Acids Res. 2006.

Affiliation

¹ IBM TJ Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA. gaheld@us.ibm.com


Abstract

A theoretical study of the physical properties which determine the variation in signal strength from probe to probe on a microarray is presented. A model which incorporates probe-target hybridization, as well as the subsequent dissociation which occurs during stringent washing of the microarray, is introduced and shown to reasonably describe publicly available spike-in experiments carried out at Affymetrix. In particular, this model suggests that probe-target dissociation during the stringent wash plays a critical role in determining the observed hybridization intensities. In addition, it is demonstrated that non-specific hybridization introduces uncertainties which significantly limit the ability of any model to accurately quantify absolute gene expression levels while, in contrast, target folding appears to have little effect on these results. Finally, for data from target spike-in experiments, our model is shown to compare favorably with an existing statistical model in determining target concentration levels.

Figures

**Figure 1**
Observed hybridization intensity as a function of spike-in target concentration of PM probe 16 of gene 37777_at. The solid line is a best-fit of the data to Equation 10 with I_p and K_d as adjustable parameters and *bg_e* set at the experimentally observed average signal for data taken at zero spike-in concentration. In (a) each data point shown is the average of all of the replicate measurements taken at a given spike-in concentration. Each of the replicate measurements is shown as a separate data point in (b).

**Figure 2**
Observed hybridization intensity as a function of spike-in target concentration of (a) PM probe 9 of gene 1597_at and (b) PM probe 15 of gene 37777_at. The solid lines are best-fits of the data to Equation 10 with I_p and K_d as adjustable parameters and *bg_e* set at the experimentally observed average signal for data taken at zero spike-in concentration. Each data point shown is the average of all of the replicate measurements taken at a given spike-in concentration.

**Figure 3**
Histograms of the values of (a) I_p and (b) K_d obtained by fitting the hybridization intensity as a function of spike-in concentration for 95 selected PM probes (see text) to Equation 10 with I_p and K_d as adjustable parameters and *bg_e* set at the experimentally observed average signal for data taken at zero spike-in concentration. Units of K_d are mol/l.
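The fits described in Figures 1–3 adjust I_p and K_d while holding *bg_e* fixed. Equation 10 itself is not reproduced in this excerpt, so the following is only a minimal sketch under the assumption that it has a Langmuir-type saturating form, I(c) = I_p·c/(c + K_d) + bg_e. The function names, the grid-search strategy, and the synthetic data are illustrative and are not taken from the paper; concentrations are in pM here for convenience, whereas Figure 3 reports K_d in mol/l.

```python
import math

def langmuir(c, i_p, k_d, bg):
    """ASSUMED form of the fitted curve: saturating isotherm plus background."""
    return i_p * c / (c + k_d) + bg

def fit_langmuir(conc, intensity, bg):
    """Least-squares fit of i_p and k_d with the background bg held fixed.

    For a given k_d the best i_p has a closed-form linear least-squares
    solution, so only k_d needs a one-dimensional search (log-spaced grid).
    """
    y = [v - bg for v in intensity]

    def best_ip_and_sse(k_d):
        f = [c / (c + k_d) for c in conc]
        i_p = sum(a * b for a, b in zip(y, f)) / sum(b * b for b in f)
        sse = sum((a - i_p * b) ** 2 for a, b in zip(y, f))
        return i_p, sse

    positive = [c for c in conc if c > 0]
    lo = math.log10(min(positive)) - 2
    hi = math.log10(max(positive)) + 2
    grid = [10 ** (lo + (hi - lo) * n / 4000) for n in range(4001)]
    k_d = min(grid, key=lambda k: best_ip_and_sse(k)[1])
    return best_ip_and_sse(k_d)[0], k_d

# Synthetic, noise-free data (NOT from the spike-in experiments).
conc_pm = [0.0] + [0.25 * 2 ** n for n in range(13)]  # 0, 0.25 ... 1024 pM
true_ip, true_kd, bg_e = 10000.0, 200.0, 100.0
observed = [langmuir(c, true_ip, true_kd, bg_e) for c in conc_pm]

i_p_fit, k_d_fit = fit_langmuir(conc_pm, observed, bg_e)
print(i_p_fit, k_d_fit)
```

Separating the linear parameter (I_p) from the nonlinear one (K_d) keeps the sketch dependency-free; a general nonlinear least-squares routine would serve equally well.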

**Figure 4**
Best-fit values of the energies $\tilde{\varepsilon}(b_{1}, b_{2})$ for $b_{1,2}$ = A, C, G, T obtained by simultaneously fitting the values of I_p and K_d for 95 selected PM probes to Equations 4, 7–9 and 11 using the 16 $\tilde{\varepsilon}(b_{1}, b_{2})$, I_probe, a and Γ as adjustable parameters (see text).

**Figure 5**
Observed hybridization intensity, averaged over all replicate measurements, as a function of spike-in target concentration of PM probe 16 of gene 37777_at, plotted on a log–log scale. The zero concentration (leftmost) data point is plotted at 0.05 pM. The black line is a best-fit of the data to Equation 10 with I_p and K_d as adjustable parameters and *bg_e* set at the experimentally observed intensity at zero spike-in concentration. The blue and red lines are plots of Equation 10 with I_p and K_d calculated from the probe sequence using Model I, discussed in the text, and *bg_e* set at the experimentally observed intensity at zero spike-in concentration (blue line), or determined by Equation 13 (red line).

**Figure 6**
Values of I_p and K_d (obtained by fitting the hybridization intensity as a function of spike-in concentration to Equation 10 for 95 selected PM probes; see text) plotted as functions of |ΔG_hyb|, for |ΔG_hyb| calculated using several different models. Method of determining error bars for I_p and K_d is discussed in the text. In all plots the sign of ΔG_hyb is negative, and the red lines show values of I_p and K_d predicted by the model. Units of K_d are mol/l. (a) I_p and (b) K_d from Model I; (c) I_p and (d) K_d from Variant 1 of Model I; (e) I_p and (f) K_d from Variant 2 of Model I. Blue line in (e) shows value of I_p which minimizes *lds* (see text). Green line in (e) shows value of I_p averaged over the 95 plotted values. (g) I_p and (h) K_d from Variant 3 of Model I. Blue line in (h) shows value of K_d which minimizes *lds*. Green line in (h) shows value of K_d averaged over the 95 plotted values. (i) I_p and (j) K_d from Variant 4 of Model I.

**Figure 7**
(a) Experimentally observed background (i.e. average zero concentration spike-in signal) for selected 95 PM probes plotted as a function of |ΔG_hyb|, where ΔG_hyb is determined by simultaneously fitting I_p and K_d to Equations 4, 7–9 and 11. Values of ΔG_hyb for data plotted are negative. (b) Same data as (a), binned into nine energy bins. Solid lines in (a) and (b) follow Equation 13.

**Figure 8**
Seven sets of best-fit values of the energies $\tilde{\varepsilon}(b_{1}, b_{2})$ for $b_{1,2}$ = A, C, G, T obtained by simultaneously fitting values of I_p and K_d to Equations 4, 7–9 and 11. These sets were derived using the same data and method as in Figure 4, except that each set was derived after excluding data from one of the following genes: 37777_at, 36311_at, 1024_at, 36202_at, 36085_at, 40322_at and 1708_at. The similarity between sets illustrates the extent to which the energy values are independent of the datasets used to derive them.

**Figure 9**
Observed hybridization intensity as a function of calculated |ΔG_hyb| for those PM probes of gene 36085_at which are included in the 95 selected probes (see text). Each data point is the average of all replicate measurements taken for a given probe at target spike-in concentration 1024 pM. The black line is a best-fit of the data to Equation 10, with concentration c the only adjustable parameter; the best-fit c is 759 pM. The red line is a best-fit to only the black data points, the red ones having been identified as statistical outliers (see text); the best-fit c in this case is 881 pM, closer to the known spike-in value of 1024 pM.

**Figure 10**
Comparison of best-fit values of concentration of spike-in probes as determined using Model I/fit_bg, as discussed in text (red squares), and MAS v5 (blue squares). Black squares are at nominal spike-in concentrations; deviations from the nominal concentrations appear as deviations from these points. The abscissa indicates the gene fitted as well as the spike-in concentration; for each gene, the nominal spike-in concentration for each data point is twice that of the preceding point. The lowest concentrations shown for genes 37777_at, 36311_at, 1024_at, 36202_at, 36085_at, 40322_at and 1708_at are 0.25, 1, 8, 1, 1, 0.25 and 0.25 pM, respectively. Lower spike-in concentrations are not shown because either our model or MAS v5 predicted a value of zero concentration. Note that the results from the MAS v5 analysis have been normalized by a multiplicative factor chosen so as to minimize the sum of the squares of the differences between the calculated and nominal concentrations.
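The multiplicative normalization mentioned in the Figure 10 caption has a closed-form solution: the factor s that minimizes Σ(s·m_i − n_i)² over calculated values m_i and nominal values n_i is s = Σ(m_i·n_i)/Σ(m_i²). The sketch below illustrates this, assuming the differences are taken on a linear (not logarithmic) concentration scale, which the caption does not specify. The function name and the numbers are hypothetical and are not MAS v5 output.

```python
def best_scale(calculated, nominal):
    """Return the factor s minimizing sum((s * calculated_i - nominal_i) ** 2)."""
    num = sum(m * n for m, n in zip(calculated, nominal))
    den = sum(m * m for m in calculated)
    return num / den

def sum_sq_diff(scale, calculated, nominal):
    """Sum of squared differences after applying the scale factor."""
    return sum((scale * m - n) ** 2 for m, n in zip(calculated, nominal))

# Made-up values for illustration only.
nominal_pm = [1.0, 2.0, 4.0, 8.0]            # nominal spike-in concentrations
raw_estimates = [30.0, 50.0, 130.0, 230.0]   # unnormalized estimates, arbitrary units

s = best_scale(raw_estimates, nominal_pm)
normalized = [s * m for m in raw_estimates]
print(s, normalized)
```

Because the objective is quadratic in s, any other choice of factor gives a larger sum of squared differences for the same data.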
