[BioC] Help on invariantset normalization function

Thu Jul 5 09:19:32 CEST 2012

Dear Sophie

you could have a look at Section 7 "Normalisation with ’spike-in’ 
probes" of the vsn package vignette.

	Best wishes	
	Wolfgang

On Jul/3/12 11:35 AM, Sophie Lamarre wrote:
> Hi Jim,
>
> Now I understand the problem!
> But I have to normalize a diagnostic microarray, so I'm looking at several
> normalization methods in order to keep the best one. I can't use
> quantile normalization because I don't know whether the majority of genes
> are invariant.
> I think normalization on housekeeping genes could be a possible
> approach. I selected the 20 housekeeping genes which seem to be the
> least variable.
> I don't think normalization with the invariantset function is
> appropriate in my case.
>
> But if you have any suggestions, I would be glad!
>
> Thank you very much for your help,
>
> Sophie
>
> On 02/07/2012 18:31, James W. MacDonald wrote:
>> Hi Sophie,
>>
>> On 7/2/2012 10:35 AM, Sophie Lamarre wrote:
>>> Hello Jim,
>>>
>>> I have 151 patients in my file and 16 417 genes (not counting the 20
>>> housekeeping genes) that I need to normalize.
>>> I want to try different normalization methods using housekeeping genes.
>>> The classic method is to calculate the mean of the selected housekeeping
>>> genes for each patient, and subtract this value from each gene of the
>>> same patient.
>>>
>>> I would like to try the invariant set method with my data file and my
>>> list of housekeeping genes.
>>> As I read the help, I need two vectors: my data to normalize and the
>>> intensities of the housekeeping genes (which help me to normalize):
>>
>> Ah, I see. The problem here is that you misunderstand what
>> normalize.invariantset() is intended to do. It is not intended to do
>> what you want, which is to use a set of housekeeping genes to
>> normalize the data. Instead, this is really an internal function for
>> normalize.AffyBatch.invariantset().
>>
>> The idea here is to take one chip (which is what you did), and then
>> some artificially derived 'reference' chip that contains the same
>> number of genes as your chip (and is derived from the mean, median,
>> etc for each gene), and then determine which genes don't change
>> expression between the two, and then fit a line on those 'invariant'
>> genes, which will then be used to normalize your data. If your two
>> vectors are not the same length, you will get the error you see.
>>
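
The idea described above can be sketched in base R. This is only a simplified illustration on simulated data, not the algorithm implemented in affy: the rank-difference threshold of 0.02, the straight-line fit with lm(), and all object names are assumptions made for the example.

```r
set.seed(2)
n_genes <- 200

# Simulated log2 intensities for 4 chips that share one baseline per gene,
# each chip shifted by a different offset.
base  <- rnorm(n_genes, mean = 9, sd = 2)
chips <- sapply(1:4, function(i) base + rnorm(n_genes, sd = 0.2) + 0.5 * i)

# Artificial reference 'chip': one value per gene (here the median across chips).
ref_chip <- apply(chips, 1, median)
chip     <- chips[, 1]

# Both vectors must cover the same genes in the same order.
stopifnot(length(chip) == length(ref_chip))

# Call a gene 'invariant' if its rank is nearly the same on both chips.
# The 0.02 cut-off is arbitrary and only for illustration.
rank_diff <- abs(rank(chip) - rank(ref_chip)) / n_genes
invariant <- rank_diff < 0.02

# Fit a line on the invariant genes only, then use it to map the whole
# chip onto the scale of the reference.
fit       <- lm(ref_chip ~ chip, subset = invariant)
chip_norm <- predict(fit, newdata = data.frame(chip = chip))
```

The point of the sketch is that the reference has one value for every gene on the chip, which is why a 20-element vector of housekeeping intensities cannot serve as `ref`.
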
>> This is quite different from what you want to do. I don't think there
>> are any functions to do such a simple normalization, and quite frankly
>> what you propose is neither classic nor recommended (if by classic you
>> mean 'a very common and accepted method' rather than 'what people did
>> way back in the past before they knew better').
>>
>> To do what you propose is just a simple application of colMeans() and
>> sweep().
>>
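
A minimal sketch of the colMeans() and sweep() approach mentioned above, on simulated log2 data with genes in rows and patients in columns. The matrices `expr` and `hk` are made-up stand-ins for the data matrix and the housekeeping-gene matrix in the post, and the sketch says nothing about whether this normalization is advisable.

```r
set.seed(1)

# Simulated log2 expression values: 50 genes x 4 patients.
expr <- matrix(rnorm(50 * 4, mean = 9, sd = 2), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("pat", 1:4)))

# Simulated log2 values for 5 housekeeping genes on the same 4 patients.
hk <- matrix(rnorm(5 * 4, mean = 15, sd = 0.3), nrow = 5,
             dimnames = list(paste0("hk", 1:5), paste0("pat", 1:4)))

# Mean of the housekeeping genes for each patient (one value per column).
hk_means <- colMeans(hk)

# Subtract each patient's housekeeping mean from every gene of that patient.
# MARGIN = 2 matches hk_means to the columns of expr.
expr_norm <- sweep(expr, MARGIN = 2, STATS = hk_means, FUN = "-")
```

The result has the same dimensions as `expr`; the columns of `hk` must be in the same patient order as the columns of `expr` for the subtraction to be meaningful.
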
>> Best,
>>
>> Jim
>>
>>
>>>
>>>        Usage
>>>
>>> normalize.AffyBatch.invariantset(abatch, prd.td = c(0.003, 0.007),
>>>     verbose = FALSE,
>>>     baseline.type = c("mean","median","pseudo-mean","pseudo-median"),
>>>     type = c("separate","pmonly","mmonly","together"))
>>>
>>> normalize.invariantset(data, ref, prd.td=c(0.003,0.007))
>>>
>>>
>>>        Arguments
>>>
>>> abatch
>>>     an AffyBatch object.
>>>
>>> data
>>>     a vector of intensities on a chip (to normalize to the reference).
>>>
>>> ref
>>>     a vector of reference intensities.
>>>
>>>
>>>
>>> Thank you for your help,
>>>
>>> Kind Regards,
>>> --
>>> Sophie LAMARRE
>>>
>>>
>>> On 02/07/2012 16:12, James W. MacDonald wrote:
>>>> Hi Sophie,
>>>>
>>>> On 7/2/2012 8:03 AM, Sophie Lamarre wrote:
>>>>> Hello,
>>>>>
>>>>> I am trying the invariantset normalization function (affy package) on
>>>>> my data:
>>>>>
>>>>>>    test_pat1 =
>>>>>> normalize.invariantset(data_ready_to_normalize_met1[,1],
>>>>> +                                    bd_20hk_norm[,1],
>>>>> +                                    prd.td=c(0.003,0.007))
>>>>> Error on while ((ns.old - ns)>   50) { :
>>>>>      missing value where TRUE / FALSE is required
>>>>
>>>> When you do
>>>>
>>>> data_ready_to_normalize_met1[,1]
>>>>
>>>>
>>>> you are selecting data from only one array. It isn't possible to
>>>> figure out which probesets are invariant with only one array
>>>> (because the implication is that the probesets don't vary in any
>>>> array).
>>>>
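
A quick way to inspect what goes into such a call is to look at the two vectors before passing them on. The sketch below uses made-up matrices with the sizes mentioned in this thread (16 417 genes, 20 housekeeping genes); `data_mat` and `hk_mat` are hypothetical stand-ins, not the poster's objects.

```r
set.seed(3)

# Hypothetical stand-ins: genes in rows, patients (arrays) in columns.
data_mat <- matrix(rnorm(16417 * 3, mean = 9),  nrow = 16417)
hk_mat   <- matrix(rnorm(20 * 3,    mean = 15), nrow = 20)

# Selecting one column gives the intensities of a single array as a vector.
chip <- data_mat[, 1]
ref  <- hk_mat[, 1]

# Compare the lengths of the two vectors and check for missing values.
same_length <- length(chip) == length(ref)
has_na      <- anyNA(chip) || anyNA(ref)

length(chip)   # 16417
length(ref)    # 20
same_length    # FALSE
has_na         # FALSE
```
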
>>>> Is there a particular reason that you are trying to normalize just
>>>> one array?
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> # My data to normalize
>>>>>
>>>>>>    data_ready_to_normalize_met1[1:5,1]
>>>>> [1]  5.803779 11.566477  8.583049  8.531674  9.490483
>>>>>
>>>>> # My vector containing my 20 housekeeping genes
>>>>>>    bd_20hk_norm[1:5,1]
>>>>> [1] 14.92680 15.58281 15.15885 15.09599 15.23146
>>>>>
>>>>> My session info:
>>>>>
>>>>>
>>>>>>    sessionInfo()
>>>>> R version 2.14.1 (2011-12-22)
>>>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8
>>>>>  [4] LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>>>>>  [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
>>>>> [10] LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] affy_1.32.1           preprocessCore_1.16.0 gplots_2.10.1         KernSmooth_2.23-7
>>>>>  [5] caTools_1.13          bitops_1.0-4.1        gdata_2.8.2           gtools_2.6.2
>>>>>  [9] geneplotter_1.32.1    lattice_0.20-0        annotate_1.32.3       AnnotationDbi_1.16.19
>>>>> [13] Biobase_2.14.0        limma_3.10.3
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] affyio_1.22.0       BiocInstaller_1.2.1 DBI_0.2-5           IRanges_1.12.6
>>>>> [5] RColorBrewer_1.0-5  RSQLite_0.11.1      tools_2.14.1        xtable_1.7-0
>>>>> [9] zlibbioc_1.0.1
>>>>>
>>>>>
>>>>> I have no missing values:
>>>>>
>>>>>>    test = is.na(data_ready_to_normalize_met1[,1])
>>>>>>    sum(test)
>>>>> [1] 0
>>>>>
>>>>>
>>>>>
>>>>> Could you help me, or give me an example so that I can resolve my
>>>>> problem?
>>>>>
>>>>> Thank you very much,
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Sophie LAMARRE
>>>>>
>>>>>      [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>
>

-- 
Best wishes
	Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber