Gene Logic Workshop on Low Level Analysis of Affymetrix GeneChip® data
Held in Bethesda, Maryland on November 19, 2001.
Brief summary of papers and discussions.
A. SPECIFIC TOPICS
1. Image analysis.
Harry Zuzan discussed a method of improving grid alignment to achieve
better attribution of pixels to probe cells, see his talk. He believes
the problem is significantly reduced in the current GeneChip software.
Systematic comparisons over many chips have yet to be done.
Are there other image analysis issues needing attention? The workshop
was unsure of the answer to this question. Harry Zuzan and colleagues
from Duke have submitted mss revisiting the image analysis of GeneChip
data, but a systematic comparison of algorithms on a substantial body
of data has yet to be carried out.
Conclusion: it is not clear whether there are worthwhile gains to be
made by revisiting the Affymetrix image analysis. Needs more work.
2. Background (bg) adjustment of PM (MM) values.
Questions: Additive? General? Local? Probe specific? How should bg be
estimated? Role of MM?
Sources (after Felix Naef):
Physical: light reflection from substrate, photodetector dark current.
Biological: non-specific (cross) hybridization. Called stray signal by
Earl Hubbell.
Felix Naef uses a global bg value based on a simple model:
PM (MM) = signal + bg, where bg is approximately normal. Rafael
Irizarry uses the same model but a lognormal for bg. Each estimates the
mean and the SD of the distribution, but subtracts bg in different
ways, see their talks.
Peter Haberl reported on a sector-specific gradient correction model,
which includes, but is more elaborate than, a sector-specific bg model.
Earl Hubbell uses MM as probe-specific bg correction of PM when MM
3. Normalization of PM (MM) values.
Questions: Desirable? If so, when? Which algorithm : Schadt-Li-Wong,
Astrand, Bolstad, Irizarry, regression or other to median chip, etc?
This topic was addressed by two authors: Magnus Astrand and Rafael
Irizarry. The presentation by Magnus summarizes the range of
possibilities. See also Dan Holder's talk.
What is still lacking here is a systematic comparison of the
alternatives based on a large body of data, including some "truth", see
B below. That will be done in coming months.
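To make the kind of chip-to-chip normalization under discussion concrete,
here is a minimal sketch of quantile normalization, one widely used way of
giving every chip the same distribution of probe intensities. It is an
illustration only: the function name and the toy data are assumptions, ties
are broken arbitrarily by probe order, and it should not be read as the
exact algorithm of any speaker named above.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of equal-length lists, one list of probe intensities per chip.
    Returns new lists; the inputs are not modified.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [
        sum(s[k] for s in sorted_chips) / len(chips) for k in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two chips, three probes each (made-up numbers).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
toy_normalized = quantile_normalize(toy)
```

After the call, each chip keeps its own ranking of probes but takes its
values from the shared reference distribution.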
4. Quality assessment.
Both Peter Haberl and Michael Elashoff discussed this topic in some
depth, see their presentations. More can be found in the dChip package.
Many systematic effects and errors can now be detected and to some
extent corrected for, with general and model-based methods.
However, there does not yet seem to be a clear link between any of the
wide range of quality indicators and error detection methods and
analysis outcomes. This will probably be done in coming months, but it is a
large task, most suited to people with access to large numbers of chips.
5. Presence/absence calls.
Not discussed at the workshop.
6. Expression level calculation.
Some questions: Should it be non-negative? Should it be on a log scale?
Should we use probe specific MMs? If model based, then which model?
This was probably the most widely discussed topic of the workshop.
There were three main threads to the contributions:
a) dealing with PM-MM values, positive and negative, see the talks
by Dan Holder, Peter Munson, and Peter Haberl. Here the solution was
transformations: linear-log hybrid (Holder, Haberl); Box-Cox, G-Log
and adaptive (Munson). Note that Affymetrix is no longer taking
differences regardless of their sign, see Earl Hubbell's presentation.
b) the Li & Wong model and variations on it: see the papers by Cheng Li,
Fred Wright, and
c) summaries based on log(PM-bg), see the papers by Felix Naef, Earl
Hubbell and Rafael Irizarry.
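For thread a), the point of the transformations is to behave like a log
for large PM-MM while staying defined for zero and negative differences.
The sketch below shows two generic forms of this idea. The exact
definitions used by Holder, Haberl and Munson are in their talks; the
function names, the tangent-line construction, and the tuning constants
k and c here are assumptions made for illustration.

```python
import math


def linear_log_hybrid(x, k=20.0):
    """Natural log for x >= k; below k, the tangent line to log at k.

    The two pieces agree in value and slope at x = k, so the transform is
    smooth there and remains defined for x <= 0.
    """
    if x >= k:
        return math.log(x)
    return math.log(k) + (x - k) / k


def glog(x, c=20.0):
    """Generalized log: log((x + sqrt(x^2 + c^2)) / 2).

    Written via asinh for numerical stability with large negative x.
    Close to log(x) when x is much larger than c, roughly linear near 0.
    """
    return math.asinh(x / c) + math.log(c / 2.0)


# Made-up PM-MM differences, including negative ones.
differences = [-50.0, 0.0, 20.0, 1000.0]
hybrid_values = [linear_log_hybrid(d) for d in differences]
glog_values = [glog(d) for d in differences]
```

Both transforms are monotone, so the ordering of the differences is kept.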
Go to the papers for details of their approaches. Most have yet to be
compared systematically on large bodies of data with some truth, though
the discussions of Fred Wright, Dan Holder and Rafael Irizarry contain some
comparisons. In all three of those we saw evidence that the Li & Wong
(reduced) model can be improved upon, despite its evident advantages
over the current (but soon to be outdated) GeneChip software.
While it may be premature to say this, my impression after the workshop
is that the Li & Wong model needs to drop the constant variance additive
"error" assumption, and have its "error" depend on MM and PM values,
i.e. have the variance dependent on signal strength, or take logs.
When this is done, we will not be far from a constant variance joint
model for (log(PM-bg), log(MM-bg')) for some bg.
In my view the following model seemed closest to having consensus:
PM = general_bg + ps_bg + noise + 2^(ps_aff + signal + noise*)
Here ps = probe-specific, aff = affinity, signal is on a log_2 scale,
the noise term is lognormal, the noise* term is independent of noise,
homoscedastic and maybe normal, and all primed terms are MM
analogues of the corresponding PM terms. Some people might drop some of
these terms, while it is not obvious how to estimate them all, and the
extent to which the error terms in the two equations are correlated is
not clear.
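As a way of reading the PM equation above, the following sketch simply
evaluates its right-hand side and draws the two noise terms from the
distributions described (lognormal additive noise, normal noise* on the
log_2 scale). All numerical values, including the noise parameters and
the default backgrounds, are made up for illustration; nothing here is
an estimate from real chips.

```python
import math
import random


def pm_intensity(general_bg, ps_bg, noise, ps_aff, signal, noise_star):
    """Right-hand side of the PM model given in the text."""
    return general_bg + ps_bg + noise + 2.0 ** (ps_aff + signal + noise_star)


def simulate_pm(signal, ps_aff, general_bg=100.0, ps_bg=20.0, rng=None):
    """Draw one PM value; the noise parameters are arbitrary assumptions."""
    rng = rng or random.Random(0)
    noise = rng.lognormvariate(2.0, 0.5)  # additive term, lognormal
    noise_star = rng.gauss(0.0, 0.1)      # log_2-scale term, normal
    return pm_intensity(general_bg, ps_bg, noise, ps_aff, signal, noise_star)


# With both noise terms set to zero, log_2(PM - bg) recovers ps_aff + signal.
noise_free = pm_intensity(100.0, 20.0, 0.0, 1.0, 9.0, 0.0)
recovered = math.log2(noise_free - 120.0)

# Five simulated probes sharing one signal but with different affinities.
rng = random.Random(1)
affinities = (-1.0, -0.5, 0.0, 0.5, 1.0)
probe_set = [simulate_pm(9.0, aff, rng=rng) for aff in affinities]
```

The noise-free case shows why summaries based on log(PM-bg), as in
thread c), fit this model naturally: once bg is removed, affinity and
signal are additive on the log scale.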
Definitely still an active area of research, though the gains may not be
great, except at the low end of the intensity range, see 2 above.
Should robust/resistant summaries of probe-level data be used?
Definitely yes, see talks by Earl Hubbell and Dan Holder.
If so, on what? The best option seems to be the two-way array of
chip x probe pair summaries. If this is the case, then one issue that
will need to be addressed is how this feeds into later statistical
analyses, e.g. replicated two-sample comparisons.
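One standard resistant fit to a two-way table is Tukey's median polish,
sketched below for a chip x probe pair table of log-scale values. The
workshop summary does not name a specific method, so this choice, the
function name, and the toy table are assumptions made for illustration.

```python
from statistics import mean, median


def median_polish(table, n_iter=10):
    """Fit table[i][j] = overall + row_eff[i] + col_eff[j] + resid[i][j].

    Rows are chips and columns are probe pairs. Medians are swept out of
    rows and columns alternately; the decomposition is exact after every
    step, and the residuals absorb outlying cells.
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


# Toy table: 3 chips x 5 probe pairs, additive on the log scale,
# with one corrupted cell on the second chip (made-up numbers).
chip_levels = [8.0, 9.0, 10.0]
probe_effects = [-1.0, 0.0, 1.0, 0.5, -0.5]
table = [[level + p for p in probe_effects] for level in chip_levels]
table[1][2] += 5.0

overall, row_eff, col_eff, resid = median_polish(table)
robust_summary = [overall + r for r in row_eff]
mean_summary = [mean(row) for row in table]
```

In this toy case the robust chip summaries equal the true chip levels,
while the plain row mean for the second chip is pulled up by the single
bad cell. How such summaries feed into later two-sample comparisons is
the open issue raised above.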
7. Normalization of expression values.
Questions: Desirable? Needed after normalizing PMs? If so, how?
Use of spike-ins?
Main contribution to this topic: Andrew Hill's talk, see also Peter
Munson's and Dan Holder's talks.
Affymetrix's GeneChip software "normalizes" every chip to a standard
value, and does not normalize PM and MM values.
A major issue here, addressed by Andrew Hill and mentioned by other
speakers, concerns cases where the chip monitors only a fraction of the
genome, so that a standard total (average) signal would be inappropriate.
This probably applies to chip sets as well.
The paper by Rafael Irizarry suggested that after normalizing chips to
one another at the PM and MM level, further chip-level normalization
based on expression values may not be needed, but that conclusion
was not firmly established. However, it seems possible that "most" of
the normalization issues can be addressed at the PM and MM level, though
clearly not the one relating to the extent to which the genome is
represented on a chip.
Andrew Hill's paper showed that while spike-ins can be helpful in dealing
with normalization, it would not be wise to rely upon them alone. One
promising possibility seems to be normalizing sets of chips at the PM
level, with fine-tuning of absolute level making use of spike-in
data. This is not a totally straightforward task, but some hybrid of
spike-in and global normalization seems to be the way ahead.
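The tension between global and spike-in normalization can be seen in a
small numerical sketch. Both functions below are hypothetical helpers,
not the GeneChip software's procedure or Andrew Hill's method, and the
intensities are invented: two chips with identical spike-in readings,
where half of the monitored genes are truly doubled on the second chip.

```python
from statistics import mean, median


def global_scale_factor(chip_values, target):
    """Factor that brings the chip's mean expression value to `target`."""
    return target / mean(chip_values)


def spike_in_scale_factor(observed_spikes, reference_spikes):
    """Median ratio of reference to observed spike-in intensities."""
    return median(r / o for o, r in zip(observed_spikes, reference_spikes))


reference_chip = [100.0, 200.0, 300.0, 400.0]
reference_spikes = [50.0, 500.0]

# Same sensitivity (spike-ins unchanged), but two genes truly doubled.
second_chip = [200.0, 400.0, 300.0, 400.0]
second_spikes = [50.0, 500.0]

target = mean(reference_chip)
global_factor = global_scale_factor(second_chip, target)
spike_factor = spike_in_scale_factor(second_spikes, reference_spikes)
```

Here the spike-in factor is 1, while the global factor is below 1 and
would shrink the unchanged genes. With only two spike-ins the median
ratio is itself fragile, which is one reason not to rely on spike-ins
alone.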
8. Dealing with chip-sets (A, B, etc).
Question: Any special issues here? Normalize within chip type?
See 7 above.
9. Value of viewing each chip as one of a class of chips.
How should the class be defined? How should it be used?
Not explicitly addressed at the workshop, but clearly still an issue.
a) Any replicate data. Many of us have such data, and see b) and c) below.
b) Spike-in. One or more cRNAs spiked into the sample at known
concentrations. Gene Logic and Affymetrix both have substantial data
sets of this kind.
c) Dilution series. The Gene Logic experiment, see Bill Craven's talk.
d) Mixture series. Using samples A, B, .5A + .5B, .25A + .75B, etc. See
Fred Wright's and Dan Holder's talks. The Gene Logic experiment also
has a substantial mixture component.
e) Can we make use of routine experimental data here? Dan Holder showed
one way to use routine data, in the absence of "truth": his POS and NEG
sets of genes.
f) Genes that have been checked by some other method: northerns, RT-PCR, etc.
Affymetrix and Gene Logic have both agreed to place their data (b & c
above) on their web sites for general use. Fred Wright's data is already
there. Many thanks to all these sharers of data. When they have their
web sites set up, the links will be here.
a) Calculating variance of expression indices across replicate data.
See Fred Wright's and Rafael Irizarry's talks.
b) Calculating bias of expression indices using spike-in, mixture or
dilution data.
See Rafael Irizarry's and Dan Holder's talks.
c) Assessing goodness of fit of models underlying model-based approaches.
See Rafael Irizarry's talk.
d) Stratifying metrics by average expression value.
Seems to be a good idea. Not yet done.
e) Dealing with expression indices in different scales. How?
As for d.
f) Calculating receiver operating characteristic (ROC) curves using
a set of genes known to be differentially expressed and a set
known not to be differentially expressed.
See Dan Holder's and Fred Wright's talks.
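For item f), an ROC curve can be computed directly from a score for
each gene (for example an absolute log ratio or test statistic) once
the two gene sets are fixed. The sketch below is a minimal illustration
with made-up scores; the function names are assumptions, and it is not
taken from either talk.

```python
def roc_points(pos_scores, neg_scores):
    """ROC curve as a list of (false positive rate, true positive rate).

    pos_scores: scores of genes known to be differentially expressed.
    neg_scores: scores of genes known not to be differentially expressed.
    A gene is called "changed" when its score is >= the threshold.
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores: three "known changed" and four "known unchanged" genes.
changed = [3.0, 2.5, 0.4]
unchanged = [0.5, 0.3, 0.1, 0.2]
curve = roc_points(changed, unchanged)
auc = area_under_curve(curve)
```

The area equals the fraction of (changed, unchanged) gene pairs in which
the changed gene has the higher score, so different expression indices
can be compared on the same two gene sets.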
a) The mysterious bimodality of the distribution of (PM,MM) pairs. See
Felix Naef's talk and the ms Irizarry et al. listed under additional
material.
b) Dealing with saturation. The topic of Cheng Li's talk.
c) Doing some theory. See Fred Wright's talk. How about some under a
model such as the one given in 6 above?
d) A nice probe-pair visualization method: see Peter Munson's talk.
Last modified: Thu Dec 6 6:45:09 PST 2001
contact Terry Speed's