[BioC] Help on invariantset normalization function
James W. MacDonald
jmacdon at uw.edu
Mon Jul 2 18:31:09 CEST 2012
Hi Sophie,
On 7/2/2012 10:35 AM, Sophie Lamarre wrote:
> Hello Jim,
>
> I have 151 patients in my file and 16,417 genes to normalize, not
> counting the 20 housekeeping genes.
> I want to try different normalization methods using housekeeping genes.
> The classic method is to calculate the mean of the selected
> housekeeping genes for each patient, and subtract this value from each
> gene of the same patient.
>
> I would like to try the invariant set method with my data file and my
> list of housekeeping genes.
> As I read the help page, I need two vectors: my data to normalize and
> the intensities of the housekeeping genes (which I use to
> normalize):
Ah, I see. The problem here is that you misunderstand what
normalize.invariantset() is intended to do. It is not intended to do
what you want, which is to use a set of housekeeping genes to normalize
the data. Instead, this is really an internal function for
normalize.AffyBatch.invariantset().
The idea here is to take one chip (which is what you did) and an
artificially derived 'reference' chip that contains the same number of
genes as your chip (derived from the mean, median, etc. for each
gene). The function determines which genes don't change expression
between the two, then fits a line on those 'invariant' genes, and that
fit is used to normalize your data. If your two vectors are not the same
length, you will get the error you see.
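As a rough illustration of that idea (this is not the affy implementation), here is a simplified single-pass sketch in base R. The simulated matrix `expr`, the 0.05 rank cutoff, the per-gene median used as the reference, and the use of smooth.spline() for the fit are all assumptions made for the example; the real function iterates using the `prd.td` thresholds. Note that `x` and `ref` have the same length, one value per gene.

```r
set.seed(42)
n_genes  <- 500
n_arrays <- 6

# Hypothetical log2 expression matrix: genes in rows, arrays in columns.
true_level <- rnorm(n_genes, mean = 9, sd = 2)
expr <- sapply(seq_len(n_arrays),
               function(j) true_level + rnorm(n_genes, sd = 0.2))

# Give array 1 an intensity-dependent bias so there is something to remove.
expr[, 1] <- 1.1 * expr[, 1] + 0.5

# Artificial 'reference' chip: one value per gene (here the per-gene median).
ref <- apply(expr, 1, median)
x   <- expr[, 1]
stopifnot(length(x) == length(ref))  # the two vectors must match in length

# Genes whose rank barely changes between chip and reference are treated as
# 'invariant'. The 0.05 cutoff is arbitrary and chosen only for this sketch.
rank_diff <- abs(rank(x) - rank(ref)) / n_genes
invariant <- rank_diff < 0.05

# Fit a smooth curve on the invariant genes only, then apply it to all genes.
fit    <- smooth.spline(x[invariant], ref[invariant])
x_norm <- predict(fit, x)$y
```

A 20-element housekeeping vector cannot play the role of `ref` here, because it does not line up gene by gene with the 16,417-element data vector.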
This is quite different from what you want to do. I don't think there
are any functions to do such a simple normalization, and quite frankly
what you propose is neither classic nor recommended (if by classic you
mean 'a very common and accepted method' rather than 'what people did
way back in the past before they knew better').
To do what you propose is just a simple application of colMeans() and
sweep().
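A minimal sketch of that, assuming log-scale data with genes in rows and patients in columns. The matrices `expr` and `hk` below are made-up stand-ins for data_ready_to_normalize_met1 and bd_20hk_norm, with the same column order in both:

```r
set.seed(1)
n_patients <- 5

# Hypothetical log2 expression values: 8 genes x 5 patients.
expr <- matrix(rnorm(8 * n_patients, mean = 9, sd = 2), nrow = 8,
               dimnames = list(paste0("gene", 1:8),
                               paste0("patient", 1:n_patients)))

# Hypothetical housekeeping genes measured on the same patients.
hk <- matrix(rnorm(3 * n_patients, mean = 15, sd = 0.3), nrow = 3,
             dimnames = list(paste0("hk", 1:3), colnames(expr)))

# One mean per patient (column) over the housekeeping genes.
hk_means <- colMeans(hk)

# Subtract each patient's housekeeping mean from every gene of that patient.
# MARGIN = 2 tells sweep() to match hk_means to the columns.
expr_norm <- sweep(expr, 2, hk_means, FUN = "-")
```

Subtraction is only appropriate on the log scale; for raw intensities the equivalent would be a division.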
Best,
Jim
>
> Usage
>
> normalize.AffyBatch.invariantset(abatch, prd.td = c(0.003, 0.007),
> verbose = FALSE,
> baseline.type = c("mean","median","pseudo-mean","pseudo-median"),
> type = c("separate","pmonly","mmonly","together"))
>
> normalize.invariantset(data, ref, prd.td=c(0.003,0.007))
>
>
> Arguments
>
> abatch
>
> an AffyBatch object.
>
> data
>
> a vector of intensities on a chip (to normalize to the reference).
>
> ref
>
> a vector of reference intensities.
>
>
>
> Thank you for your help,
>
> Kind Regards,
> --
> Sophie LAMARRE
>
>
> On 02/07/2012 16:12, James W. MacDonald wrote:
>> Hi Sophie,
>>
>> On 7/2/2012 8:03 AM, Sophie Lamarre wrote:
>>> Hello,
>>>
>>> I am trying the invariantset normalization function (affy package) on
>>> my data:
>>>
>>>> test_pat1 = normalize.invariantset(data_ready_to_normalize_met1[,1],
>>> + bd_20hk_norm[,1],
>>> + prd.td=c(0.003,0.007))
>>> Error in while ((ns.old - ns) > 50) { :
>>> missing value where TRUE/FALSE needed
>>
>> When you do
>>
>> data_ready_to_normalize_met1[,1]
>>
>>
>> you are selecting data from only one array. It isn't possible to
>> figure out which probesets are invariant with only one array (because
>> the implication is that the probesets don't vary in any array).
>>
>> Is there a particular reason that you are trying to normalize just
>> one array?
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>>
>>> # My data to normalize
>>>
>>>> data_ready_to_normalize_met1[1:5,1]
>>> [1] 5.803779 11.566477 8.583049 8.531674 9.490483
>>>
>>> # My vector containing my 20 housekeeping genes
>>>> bd_20hk_norm[1:5,1]
>>> [1] 14.92680 15.58281 15.15885 15.09599 15.23146
>>>
>>> My session info:
>>>
>>>
>>>> sessionInfo()
>>> R version 2.14.1 (2011-12-22)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
>>> LC_TIME=fr_FR.UTF-8
>>> [4] LC_COLLATE=fr_FR.UTF-8 LC_MONETARY=fr_FR.UTF-8
>>> LC_MESSAGES=fr_FR.UTF-8
>>> [7] LC_PAPER=C LC_NAME=C
>>> LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8
>>> LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] grid stats graphics grDevices utils datasets
>>> methods base
>>>
>>> other attached packages:
>>> [1] affy_1.32.1 preprocessCore_1.16.0
>>> gplots_2.10.1 KernSmooth_2.23-7
>>> [5] caTools_1.13 bitops_1.0-4.1
>>> gdata_2.8.2 gtools_2.6.2
>>> [9] geneplotter_1.32.1 lattice_0.20-0
>>> annotate_1.32.3 AnnotationDbi_1.16.19
>>> [13] Biobase_2.14.0 limma_3.10.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.22.0 BiocInstaller_1.2.1 DBI_0.2-5
>>> IRanges_1.12.6
>>> [5] RColorBrewer_1.0-5 RSQLite_0.11.1 tools_2.14.1
>>> xtable_1.7-0
>>> [9] zlibbioc_1.0.1
>>>
>>>
>>> I have no missing values:
>>>
>>>> test = is.na(data_ready_to_normalize_met1[,1])
>>>> sum(test)
>>> [1] 0
>>>
>>>
>>>
>>> Could you help me or give me an example so that I can resolve my
>>> problem?
>>>
>>> Thank you very much,
>>>
>>> Kind Regards,
>>>
>>> Sophie LAMARRE
>>>
>>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099