[BioC] Adding chips to an existing set of normalised data

Wed Jun 4 12:49:55 MEST 2003

Hi Rafael, 
I was wondering if you could give me your opinion on my method of normalization. I was under the impression that it is best to renormalize the entire data set whenever you add or remove a chip. This would correspond to your strategy 0. I understand that this is the most time-consuming method, but I have created a Visual Basic interface that keeps track of all the .cel files we have for our lab. 

So whenever you want to analyze a different group of files, you just click on the data sets you wish to include, and we normalize everything together from the .cel files using rma. It usually takes a matter of minutes to renormalize everything together, and we currently have a collection of about 250 Affymetrix chips that can be combined in any combination. 

I thought this was the most precise way of creating normalized data sets, but are the other methods you talked about better and more accurate? 

Thanks, 
Richard Park 
Computational Data Analyzer
Joslin Diabetes Center

-----Original Message-----
From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
Sent: Wednesday, June 04, 2003 10:53 AM
To: Crispin Miller
Cc: Bioconductor (E-mail)
Subject: Re: [BioC] Adding chips to an existing set of normalised data

if your data is decent, what you describe won't be that big an issue, 
but here are various strategies to solve the problem you describe:

0- keep your cel files and redo everything every time (con: not efficient 
at all)
1- do rma at the probe level on each set of chips. then, before any 
expression level analysis, normalize the merged exprsets. 
(con: you may over-normalize)
2- decide on a "typical probe level distribution" and always map to that 
(con: requires choice of a distribution and some extra coding)
3- use a non-multi-array rma (ra?): bg correct, use a non-multichip 
normalization such as rescaling (can vsn be made mono-chip?), and 
use a robust summary, e.g. median, tukey.biweight, etc. 
(con: under my definition of a good expression measure, it won't be as good 
as rma, but it'll be better than mas 5.0) 
to see how well this does you can put it through 
affycomp.biostat.jhsph.edu
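
As a rough illustration of strategy 3, here is a minimal Python sketch of a single-chip pipeline: rescale one chip to a fixed mean, then summarise each probe set by the median of the log2 intensities. It assumes the intensities are already background corrected and positive. The function names, the data layout (dictionaries keyed by probe id) and the target value of 500 are all hypothetical choices for the example, not anything taken from rma or the affy package.

```python
import math
from statistics import median


def rescale(intensities, target=500.0):
    """Scale one chip so that its mean intensity equals `target`.

    Only this chip's own values are used, so chips can be processed
    one at a time. `target` is an arbitrary illustrative constant.
    """
    factor = target / (sum(intensities) / len(intensities))
    return [x * factor for x in intensities]


def summarise(chip, probesets, target=500.0):
    """Return {probeset name: median log2 intensity} for a single chip.

    chip      -- {probe id: background-corrected intensity (> 0)}
    probesets -- {probeset name: [probe id, ...]}
    """
    ids = list(chip)
    scaled = dict(zip(ids, rescale([chip[i] for i in ids], target)))
    # The median is the robust summary here; a Tukey biweight could be
    # swapped in at this point instead.
    return {
        name: median(math.log2(scaled[p]) for p in probes)
        for name, probes in probesets.items()
    }
```

Because nothing in this sketch looks at any other chip, adding a new chip to the database never changes the values already computed for the old ones.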

i would rank these strategies: 2, 1, 3, 0. to pick a typical probe level 
distribution in strategy 2 i would use as many arrays as possible. i would 
not use a parametric distribution, such as normal, just for computational 
convenience.
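
One way strategy 2 might look in practice is sketched below in Python. It builds an empirical (non-parametric) reference distribution by averaging the sorted probe intensities over a set of chips, freezes it, and then maps any chip, old or new, onto that reference by rank. The function names are hypothetical, and the sketch assumes every chip has the same number of probes (same chip type). It is an illustration of the idea, not the normalization code used by rma.

```python
def build_reference(chips):
    """Average the sorted intensities across chips.

    chips -- list of equal-length lists of probe intensities.
    Returns the empirical reference quantiles, one per probe rank.
    """
    sorted_chips = [sorted(c) for c in chips]
    if len({len(c) for c in sorted_chips}) != 1:
        raise ValueError("all chips must have the same number of probes")
    n = len(sorted_chips)
    return [sum(vals) / n for vals in zip(*sorted_chips)]


def map_to_reference(chip, reference):
    """Replace each intensity by the reference value of the same rank.

    Ties are broken by probe position, which is a simplification.
    """
    if len(chip) != len(reference):
        raise ValueError("chip and reference must have the same length")
    order = sorted(range(len(chip)), key=chip.__getitem__)
    out = [0.0] * len(chip)
    for rank, idx in enumerate(order):
        out[idx] = reference[rank]
    return out
```

The reference would be built once from as many arrays as possible and stored. A chip added later is then normalized with `map_to_reference(new_chip, reference)` alone, without touching the chips already in the database.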

On Wed, 4 Jun 2003, Crispin Miller wrote:

> Hi!
> Over the last few days we've been learning a lot about alternative ways of dealing with low-intensity probesets, and have seen some pretty strong arguments in favour of using them. Firstly, thanks - the discussion has been really helpful and much appreciated! 
> 
> These have now sparked a different question for us:
> We have an ever-increasing database of Affymetrix chips... Currently these have been processed and normalised using MAS5.0. As we add arrays to the set, we can compare between them since the normalisation simply sets them to have the same average intensity. 
> 
> So the question is: if I normalise my data with RMA, say, I get a set of normalised arrays based on statistics generated over the set of chips I normalise - i.e. each array is normalised in the context of its peers, unlike MAS5.0 (as I understand it). This is, I think, due to the a(j) parameter in the RMA model, or phi(j) for dChip, which represent the probe affinity effects and can be estimated if we have 'enough arrays' (from Irizarry et al. 2003, Nucleic Acids Res. paper).
> 
> Now, when we add experiments to the database, are the normalised expression levels calculated for one experimental chip-set comparable to the expression levels computed for another? If not, do I need to apply RMA over the entire database each time I add a new experiment to it? And is this possible in a reasonable amount of time and memory? If not, do people have alternative suggestions? We are particularly interested in clustering and generation of expression profiles...
> 
> Crispin
> http://bioinf.picr.man.ac.uk/mbcf/microarray_ma.shtml
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
