[BioC] help in 2-color data normalization
J.delasHeras at ed.ac.uk
Fri May 11 18:22:03 CEST 2007
Hi Jianping,
> In terms of my previous question of whether or not there could be "real"
> differences existing between the colon cancer and the universal cancer
> cell line RNAs, considerations may be given beyond just removing those
> spots. What I noticed was that some probes can only be hybridized with
> the reference RNAs and some others only with colon cancer samples (see
> "RG_cutoff.jpeg" at <http://www.unc.edu/~jjin/Graph/> ). Take one chip
> as an example, 4548 genes showed green signals more than 2^8 with red
> signals less than 2^6, and 1831 genes showed red signal more than 2^8
> with green signal less than 2^5. In both cases maximum signals, red or
> green, can be as high as 2^12. The observation suggested that there
> exist some real differences between RNAs.
I am not surprised that you can find individual genes that have signal
only in one of the samples, either the reference or the cancer one. In
fact, this is the sort of thing I am usually looking for: genes that
are either silenced or activated in cancer, with respect to a "normal"
reference.
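
As a rough illustration of that kind of count, here is a minimal Python sketch. It uses the cut-offs quoted above (bright means above 2^8; dim means below 2^6 for red and below 2^5 for green). The function name and the intensities are made up for the example, and it assumes raw, strictly positive intensities.

```python
import math

def count_channel_specific(red, green, high=8.0, low_red=6.0, low_green=5.0):
    """Count spots that are bright in one channel and dim in the other.

    red, green: raw (unlogged), strictly positive spot intensities.
    high, low_red, low_green: cut-offs on the log2 scale (hypothetical
    parameter names; defaults follow the figures quoted in the post).
    Returns (green_only, red_only).
    """
    green_only = 0
    red_only = 0
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        if log_g > high and log_r < low_red:
            green_only += 1
        elif log_r > high and log_g < low_green:
            red_only += 1
    return green_only, red_only

# Made-up intensities, for illustration only
red = [40.0, 3000.0, 500.0, 20.0]
green = [2000.0, 30.0, 450.0, 25.0]
print(count_channel_specific(red, green))  # (1, 1)
```

Counts like these on raw data are only a first look; they are sensitive to background and dye bias, which is why normalisation comes first.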
The plot you're showing does not appear to come from normalised arrays,
in which case you can infer little from the differences in the
distribution. What it does show is that you have very weak signal on
both channels on both arrays...
Normalise your data (within arrays, probably using some "flavour" of
loess), and look at the MA plots: that's a better picture of what's
going on.
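
For reference, the M and A values behind an MA plot can be computed as in the sketch below. Note that it is not loess: the centring step is a plain global median shift, swapped in only because it fits in a few lines. A loess fit would instead remove a trend in M that depends on A. The function names are hypothetical, and M is taken as log2(red/green), which is the usual convention but an assumption here.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A) lists from raw, strictly positive red/green intensities.

    M = log2(red) - log2(green)          (log ratio)
    A = (log2(red) + log2(green)) / 2    (average log intensity)
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

def median_centre(m):
    """Shift M so its median is zero (global scaling, NOT loess)."""
    centre = median(m)
    return [x - centre for x in m]

# Made-up intensities, for illustration only
red = [200.0, 800.0, 100.0]
green = [100.0, 100.0, 100.0]
m, a = ma_values(red, green)
m_centred = median_centre(m)
print([round(x, 3) for x in m_centred])
```

Plotting the centred M against A then gives the MA plot for that array.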
In an ideal plot, genes that are only expressed in one sample tend to
cluster along the left 2 sides of an imaginary diamond... for instance:
http://mcnach.com/MISC/MAplot.png
This is a very unusual MA plot, from an experiment where many many
many genes are activated (a cell line transfected with a strong
activator, hybridised against the non-transfected cells).
I drew in red the "imaginary diamond", and numbered 1 and 2 the two
sides I was talking about. Along 1 you get genes that are activated in
one sample (with M>0), and along 2 you would get genes silenced in the
same sample (with M<0).
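
One way to see why such genes line up along those two sides is sketched below. It assumes M = log2(red/green), and that the "silent" channel sits at a roughly constant background level; both are simplifications, and the background value and function names are hypothetical. Under those assumptions, a gene with signal in one channel only falls on a straight line of slope +2 or -2 in the MA plot.

```python
import math

def ma_point(red, green):
    """Return (M, A) for one spot from raw, strictly positive intensities."""
    m = math.log2(red) - math.log2(green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a

def side_of_diamond(a, background, activated=True):
    """Expected M at average intensity `a` when one channel is at background.

    If green == background:  M =  2 * (A - log2(background))   (side 1)
    If red   == background:  M = -2 * (A - log2(background))   (side 2)
    """
    slope = 2.0 if activated else -2.0
    return slope * (a - math.log2(background))

background = 64.0  # hypothetical background level (2^6)
for red in (128.0, 512.0, 4096.0):
    m, a = ma_point(red, background)
    # Observed M matches the line predicted for side 1
    print(round(m, 3), round(a, 3), round(side_of_diamond(a, background), 3))
```

Real spots scatter around these lines, since background is never exactly constant.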
This experiment is unusual in that it lets you clearly see a "spike"
of activated genes along "1". In most experiments you don't see
anything like that, but that's the area where ideally you'll have these
sorts of genes clustering. If there are many genes that only have
signal in either of your samples, you may see a well populated "cloud"
around these areas.
Your MA plots seem to me to indicate that this is the case (starting
from A around 8+, the stuff on the left seems a little artifactual)...
but you really need to dig in deeper if you want some clear answers ;)
> This raises another question. Is the pooled universal cancer RNA an
> ideal reference? It may create difficulties in explanation of results
> for some genes.
Ideal? It depends on the experiment, I suppose.
It all depends on what questions you're asking. Even very closely
related samples, from similar tissues, one cancerous and one normal,
have lots of expression differences. Your answers will of course be
determined by what comparisons you're making, what references you
choose, etc. A pooled "universal cancer" RNA can potentially contain
very different types of cells, etc... which can be good or bad,
depending on what you're after, really...
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK