Speed Group
Microarray Page

Genelogic Workshop on Low Level Analysis of Affymetrix Genechip® data

Held in Bethesda, Maryland on November 19, 2001. Brief summary of papers and discussions.

A. SPECIFIC TOPICS

1. Image analysis.

Harry Zuzan discussed a method of improving grid alignment to achieve better attribution of pixels to probe cells, see his talk. He believes the problem is significantly reduced in the current GeneChip software. Systematic comparisons over many chips have yet to be done.

Are there other image analysis issues needing attention? The workshop was unsure of the answer to this question. Harry Zuzan and colleagues from Duke have submitted manuscripts revisiting the image analysis of GeneChip data, but a systematic comparison of algorithms on a substantial body of data has yet to be carried out.

Conclusion: it is not clear whether there are worthwhile gains to be made by revisiting the Affymetrix image analysis. Needs more work.

2. Background (bg) adjustment of PM (MM) values.

Questions: Additive? General? Local? Probe specific? How should bg be estimated? Role of MM?

Sources (after Felix Naef): Physical: light reflection from substrate, photodetector dark current. Biological: non-specific (cross) hybridization. Called stray signal by Earl Hubbell.

Felix Naef uses a global bg value based on a simple model:

PM (MM) = signal + bg, where bg is approximately normal.

Rafael Irizarry uses the same model but a lognormal for bg.

Each estimates the mean and the SD of the distribution, but subtracts bg in different ways, see their talks.

Peter Haberl reported on a sector-specific gradient correction model, which includes, but is more elaborate than, a sector-specific bg model.

Earl Hubbell uses MM as probe-specific bg correction of PM when MM < PM.

Summary: the best bg correction has probably not yet been found. Some mix of global, local and probe-specific seems likely to be best, trading off bias and variance. The right balance will doubtless be determined by extensive testing on model data sets. Work will continue. Which bg is used may make a worthwhile difference in the low-intensity range.
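A small worked example may help show why the choice of bg matters mostly in the low-intensity range. It assumes the additive model above with a constant bg left in both measurements; the bg value of 100 and the signal levels are made-up numbers for illustration, not values from any talk.

```python
import math

def observed_log2_fold_change(signal, fold, bg):
    """log2 ratio observed when a constant additive bg remains in both values.

    signal : true signal in the baseline sample (hypothetical units)
    fold   : true fold change between the two samples
    bg     : additive background left uncorrected (hypothetical value)
    """
    return math.log2((fold * signal + bg) / (signal + bg))

if __name__ == "__main__":
    # A true 2-fold change (log2 = 1) at low, medium and high signal.
    for s in (50, 500, 5000):
        print(s, round(observed_log2_fold_change(s, 2.0, 100.0), 3))
```

Under these assumptions the observed log ratio is well below 1 at low signal and close to 1 at high signal, so an error in bg mainly distorts the low end.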

3. Normalization of PM (MM) values.

Questions: Desirable? If so, when? Which algorithm: Schadt-Li-Wong, Astrand, Bolstad, Irizarry, regression or other to median chip, etc?

This topic was addressed by two authors: Magnus Astrand and Rafael Irizarry. The presentation by Magnus summarizes the range of possibilities. See also Dan Holder's talk.

What is still lacking here is a systematic comparison of the alternatives based on a large body of data, including some "truth", see B below. That will be done in coming months.
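To make one of these alternatives concrete, here is a minimal sketch of quantile normalization, the approach usually associated with Bolstad's name. It is an illustration only and not necessarily the variant presented at the workshop; the function name and the list-of-lists input layout are assumptions.

```python
def quantile_normalize(chips):
    """Force every chip to share the same empirical intensity distribution.

    chips : list of equal-length lists, one list of probe intensities per chip
            (this layout is an assumption made for the sketch).
    """
    n = len(chips[0])
    if any(len(c) != n for c in chips):
        raise ValueError("all chips must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_chips = [sorted(c) for c in chips]
    reference = [sum(col) / len(chips) for col in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        # Probe indices ordered by intensity (ties broken by position).
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After this step each chip has exactly the same set of values, and only the ranks of the probes within a chip are retained.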

4. Quality assessment.

Both Peter Haberl and Michael Elashoff discussed this topic in some depth, see their presentations. More can be found in the dChip package.

Many systematic effects and errors can now be detected and to some extent corrected for, with general and model-based methods.

However, there does not yet seem to be a clear link between any of the wide range of quality indicators and error detection methods on the one hand, and analysis outcomes on the other. Establishing that link will probably be done in coming months, but it is a large task, most suited to people with access to large numbers of chips.

5. Presence/absence calls.

Not discussed at the workshop.

6. Expression level calculation.

Some questions: Should it be non-negative? Should it be on a log scale? Should we use probe specific MMs? If model based, then which model?

This was probably the most widely discussed topic of the workshop. There were three main threads to the contributions:

a) dealing with PM-MM values, positive and negative, see the talks by Dan Holder, Peter Munson, and Peter Haberl. Here the solution was transformations: linear-log hybrid (Holder, Haberl); Box-Cox, G-Log and adaptive (Munson). Note that Affymetrix is no longer taking differences regardless of their sign, see Earl Hubbell's presentation.

b) the Li & Wong model and variations on it: see the papers by Cheng Li, Fred Wright, and

c) summaries based on log(PM-bg), see the papers by Felix Naef, Earl Hubbell and Rafael Irizarry.

Go to the papers for details of their approaches. Most have yet to be compared systematically on large bodies of data with some truth, though the discussions of Fred Wright, Dan Holder and Rafael Irizarry contain some comparisons. In all three of those we saw evidence that the Li & Wong (reduced) model can be improved upon, despite its evident advantages over the current (but soon to be outdated) GeneChip software.
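As an illustration of the transformations mentioned under a), the sketch below shows one common form of the generalized log, which is defined for negative PM-MM values and behaves like log2 for large positive ones. The exact forms used in the talks are not given here; the constant c is a hypothetical tuning parameter.

```python
import math

def glog2(x, c=1.0):
    """Generalized log, base 2: log2((x + sqrt(x^2 + c^2)) / 2).

    Defined for all real x when c > 0. For x much larger than c it is close
    to log2(x); near zero it is roughly linear in x.
    c is a hypothetical tuning constant chosen for illustration.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    root = math.sqrt(x * x + c * c)
    if x < 0:
        # Algebraically equal to x + root, but avoids cancellation.
        numerator = (c * c) / (root - x)
    else:
        numerator = x + root
    return math.log2(numerator / 2.0)
```

For example, glog2(-5.0) and glog2(0.0) are both finite, whereas log2 of a non-positive difference is undefined.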

While it may be premature to say this, my impression after the workshop is that the Li & Wong model needs to drop the constant variance additive "error" assumption, and have its "error" depend on MM and PM values, i.e. have the variance dependent on signal strength, or take logs. When this is done, we will not be far from a constant variance joint model for (log(PM-bg), log(MM-bg')) for some bg.

In my view the following model seemed closest to having consensus:

PM = general_bg + ps_bg + noise + 2^(ps_aff + signal + noise* )
MM = general_bg + ps_bg' + noise' + 2^(ps_aff' + signal' + noise*')

Here ps = probe-specific, aff = affinity, signal is on a log_2 scale, the noise term is lognormal, the noise* term is independent of noise, homoscedastic and maybe normal, and all primed terms are MM analogues of the corresponding PM terms. Some people might drop some of these terms, while it is not obvious how to estimate them all, and the extent to which the error terms in the two equations are correlated is not clear.
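A short simulation sketch of the PM equation may make the roles of the terms clearer. All default parameter values are invented for illustration; the MM equation has the same form with the primed terms substituted, so the same function could be reused for it.

```python
import random

def simulate_pm(signal, ps_aff, general_bg=50.0, ps_bg=0.0,
                noise_mu=2.0, noise_sigma=0.5, star_sd=0.2, rng=random):
    """Draw one PM value from the model above.

    signal, ps_aff : log2-scale signal and probe-specific affinity
    general_bg, ps_bg : additive background terms
    noise_mu, noise_sigma : parameters of the lognormal additive noise
    star_sd : SD of noise*, taken here to be normal on the log2 scale
    All defaults are hypothetical values, not estimates from data.
    """
    noise = rng.lognormvariate(noise_mu, noise_sigma)  # additive, lognormal
    noise_star = rng.gauss(0.0, star_sd)               # homoscedastic, log2 scale
    return general_bg + ps_bg + noise + 2.0 ** (ps_aff + signal + noise_star)

if __name__ == "__main__":
    rng = random.Random(0)
    for s in (4, 8, 12):
        print(s, [round(simulate_pm(s, 0.0, rng=rng), 1) for _ in range(3)])
```

In such draws the additive terms dominate at low signal and the multiplicative term dominates at high signal, which is the behaviour the model is meant to capture.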

Definitely still an active area of research, though the gains may not be great, except at the low end of the intensity range, see 2 above.

Should robust/resistant summaries of probe-level data be used?

Definitely yes, see talks by Earl Hubbell and Dan Holder.

If so, on what? The best option seems to be the two-way array of chip x probe pair summaries. If this is the case, then one issue that will need to be addressed is how this feeds into later statistical analyses, e.g. replicated two-sample comparisons.
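One resistant way to summarize such a two-way array is Tukey's median polish, sketched below. This is offered as an example of the general idea and is not attributed to any particular speaker; the input is assumed to be log-scale values with one row per chip and one column per probe pair.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Fit value = overall + chip effect + probe effect + residual by medians.

    table[i][j] : log-scale value for chip i, probe pair j (assumed layout).
    Returns (overall, chip_effects, probe_effects, residuals).
    """
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians into the chip effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians into the probe effects.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid
```

The expression summary for chip i would then be overall + chip_effects[i]; a single aberrant probe value ends up in the residuals rather than in the summary.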

7. Normalization of expression values.

Questions: Desirable? Needed after normalizing PMs? If so, how? Use of spike-ins?

Main contribution to this topic: Andrew Hill's talk, see also Peter Munson's and Dan Holder's talks.

Affymetrix's GeneChip software "normalizes" every chip to a standard value, and does not normalize PM and MM values.

A major issue here, addressed by Andrew Hill and mentioned by other speakers, concerns cases where the chip monitors only a fraction of the genome, so that a standard total (average) signal would be inappropriate. This probably applies to chip sets as well.

The paper by Rafael Irizarry suggested that after normalizing chips to one another at the PM and MM level, further chip-level normalization based on expression values may not be needed, but that conclusion was not firmly established. However, it seems possible that "most" of the normalization issues can be addressed at the PM and MM level, though clearly not the one relating to the extent to which the genome is represented on a chip.

Andrew Hill's paper showed that while spike-ins can be helpful in dealing with normalization, it would not be wise to rely upon them alone. One promising possibility seems to be normalizing sets of chips at the PM level, with fine-tuning of absolute level making use of spike-in data. This is not a totally straightforward task, but some hybrid of spike-in and global normalization seems to be the way ahead.

8. Dealing with chip-sets (A, B, etc).

Question: Any special issues here? Normalize within chip type?

See 7 above.

9. Value of viewing each chip as one of a class of chips.

How should the class be defined? How should it be used?

Not explicitly addressed at the workshop, but clearly still an issue.

B. USEFUL DATA SETS FOR COMPARING ALGORITHMS

a) Any replicate data. Many of us have such data, and see b) and c) below.

b) Spike-in. One or more cRNAs spiked into the sample at known concentrations. Gene Logic and Affymetrix both have substantial data sets of this kind.

c) Dilution series. The Gene Logic experiment, see Bill Craven's talk.

d) Mixture series. Using samples A, B, .5A + .5B, .25A + .75B, etc. See Fred Wright's and Dan Holder's talks. The Gene Logic experiment also has a substantial mixture component.

e) Can we make use of routine experimental data here? Dan Holder showed one way to use routine data, in the absence of "truth": his POS and NEG sets of genes.

f) Genes that have been checked by some other method: northerns, RT-PCR, etc.

Affymetrix and Gene Logic have both agreed to place their data (b & c above) on their web sites for general use. Fred Wright's data is already there. Many thanks to all these sharers of data. When they have their web sites set up, the links will be here.

C. EVALUATING ALGORITHMS. METRICS

a) Calculating variance of expression indices across replicate data.

See Fred Wright's and Rafael Irizarry's talks.

b) Calculating bias of expression indices using spike-in, mixture or dilution data.

See Rafael Irizarry's and Dan Holder's talks.

c) Assessing goodness of fit of models underlying model-based approaches.

See Rafael Irizarry's talk.

d) Stratifying metrics by average expression value.

Seems to be a good idea. Not yet done.

e) Dealing with expression indices in different scales. How?

As for d.

f) Calculating receiver operating characteristic (ROC) curves using a set of genes known to be differentially expressed and a set known not to be differentially expressed.

See Dan Holder's and Fred Wright's talks.
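Several of the metrics in a), b), d) and f) can be sketched in a few lines. The code below is a rough illustration under simple assumptions (expression values on a log2 scale, equal-sized strata, a score where larger means more evidence of differential expression); the function names are invented here and do not come from any of the talks.

```python
from statistics import mean, variance

def variance_by_stratum(replicates, n_strata=3):
    """Metrics a) and d): across-replicate variance, stratified by mean level.

    replicates[g] : expression values for gene g across replicate chips
                    (at least two values per gene).
    Returns a list of (mean expression, mean variance), one per stratum.
    """
    stats = sorted((mean(v), variance(v)) for v in replicates)
    n = len(stats)
    out = []
    for k in range(n_strata):
        chunk = stats[k * n // n_strata:(k + 1) * n // n_strata]
        if chunk:
            out.append((mean([m for m, _ in chunk]), mean([s for _, s in chunk])))
    return out

def spike_in_slope(log2_conc, log2_expr):
    """Metric b): least-squares slope of log2 expression on log2 concentration.

    A slope of 1 means a doubling of concentration doubles the expression
    index; a slope below 1 indicates compression (bias).
    """
    mx, my = mean(log2_conc), mean(log2_expr)
    sxx = sum((x - mx) ** 2 for x in log2_conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log2_conc, log2_expr))
    return sxy / sxx

def roc_points(pos_scores, neg_scores):
    """Metric f): (false positive rate, true positive rate) at each threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Here pos_scores and neg_scores would be the scores of genes known to be, and known not to be, differentially expressed, as in f).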

D. OTHER ISSUES

a) The mysterious bimodality of the distribution of (PM,MM) pairs. See Felix Naef's talk and the ms Irizarry et al. listed under additional material.

b) Dealing with saturation. The topic of Cheng Li's talk.

c) Doing some theory. See Fred Wright's talk. How about some under a model such as the one given in 6 above?

d) A nice probe-pair visualization method: see Peter Munson's talk.


Last modified: Thu Dec 6 6:45:09 PST 2001
zarray@stat.berkeley.edu



contact Terry Speed's
microarray data analysis group