[BioC] rcModelMedianPolish of the package preprocessCore/ imitate median polish of RMA
James W. MacDonald
jmacdon at uw.edu
Wed Mar 26 15:58:15 CET 2014
Hi Stefanie,
On 3/26/2014 9:35 AM, Stefanie Busch [guest] wrote:
> Hello,
>
> I want to imitate RMA on illumina and agilent chips. There is no problem with background correction, quantile normalization and log2 tranformation.
I'm not sure median polish is the way to go here. Note that the Affy
chips all have multiple short probes that are all intended in aggregate
to measure a single transcript. In other words, for a given transcript,
there are maybe 4-16 25-mers that are all measuring the same thing, and
we need some way to summarize the data from all those probes into a
single statistic that best represents the underlying abundance of the
transcript. Median polish is a simple way to summarize these data while
accounting for the fact that there is quite a bit of technical
variability between the probes that we want to account for (and then
ignore).
On the other hand, the Illumina and Agilent interrogate each transcript
with a single, longer probe. It may be that there are multiple such
probes on a given array, but the probes themselves are all the same. So
there might be technical variability between the probes because of their
location on the array, but that isn't analogous to the probe-specific
binding that median polish is intended to account for.
>
> But I don't know what I should do exacty with the median polish.
>
> I use an example matrix which look like this:
>> matrix
> [,1] [,2] [,3] [,4] [,5]
> [1,] 18 11 8 21 4
> [2,] 13 7 5 16 7
> [3,] 15 6 7 16 6
> [4,] 19 15 12 18 5
What do the rows and columns represent here? Are these measures from the
same transcript, or different transcripts?
>
> Than I do the median polish
>> rcModelMedianPolish(matrix)
> $Estimates
> [1] 16.25 9.75 8.00 18.00 5.00 1.25 -2.75 -1.25 2.75
>
> $Weights
> NULL
>
> $Residuals
> [,1] [,2] [,3] [,4] [,5]
> [1,] 0.5 0.0 -1.25 1.75 -2.25
> [2,] -0.5 0.0 -0.25 0.75 4.75
> [3,] 0.0 -2.5 0.25 -0.75 2.25
> [4,] 0.0 2.5 1.25 -2.75 -2.75
>
> $StdErrors
> NULL
>
> so what should I do next? What are my expression values. I have read that I must substract the Residuals from my original data
I don't know where you might have read that, but it isn't correct. And
you could hypothetically have answered this yourself by simply reading
the help page for the function you are using:
Value:
A list with following items:
Estimates: The parameter estimates. Stored in column effect then row
effect order
So in this case your column effect estimates are
16.25 9.75 8.00 18.00 5.00
and your row effect estimates are
1.25 -2.75 -1.25 2.75
But note that this assumes that you want the column effects, and are
considering the row effects as a nuisance parameter (e.g., the column
effects are the overall median + column effects, whereas the row effects
are simply the row effects):
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 18 11 8 21 4
[2,] 13 7 5 16 7
[3,] 15 6 7 16 6
[4,] 19 15 12 18 5
> z <- medpolish(mat)
> z
Median Polish Results (Dataset: "mat")
Overall: 9.75
Row Effects:
[1] 1.25 -2.75 -1.25 2.75
Column Effects:
[1] 6.50 0.00 -1.75 8.25 -4.75
Residuals:
[,1] [,2] [,3] [,4] [,5]
[1,] 0.5 0.0 -1.25 1.75 -2.25
[2,] -0.5 0.0 -0.25 0.75 4.75
[3,] 0.0 -2.5 0.25 -0.75 2.25
[4,] 0.0 2.5 1.25 -2.75 -2.75
> z$over+z$col
[1] 16.25 9.75 8.00 18.00 5.00
Which is the same result you get from rcModelMedianPolish().
Best,
Jim
>
>> MedianPolish<-rcModelMedianPolish(matrix)
>> matrix-MedianPolish$Residuals
> [,1] [,2] [,3] [,4] [,5]
> [1,] 17.5 11.0 9.25 19.25 6.25
> [2,] 13.5 7.0 5.25 15.25 2.25
> [3,] 15.0 8.5 6.75 16.75 3.75
> [4,] 19.0 12.5 10.75 20.75 7.75
>
>
> But somewhere else I've read that the first 5 values of the Estimates are my expression values. But in this case I have for example 4 different genes. So this sounds not correct.
>
> So what is the right procedure?
>
> Best regards
>
> Stefanie
>
> -- output of sessionInfo():
>
> matrix
> [,1] [,2] [,3] [,4] [,5]
> [1,] 18 11 8 21 4
> [2,] 13 7 5 16 7
> [3,] 15 6 7 16 6
> [4,] 19 15 12 18 5
>> rcModelMedianPolish(matrix)
> $Estimates
> [1] 16.25 9.75 8.00 18.00 5.00 1.25 -2.75 -1.25 2.75
>
> $Weights
> NULL
>
> $Residuals
> [,1] [,2] [,3] [,4] [,5]
> [1,] 0.5 0.0 -1.25 1.75 -2.25
> [2,] -0.5 0.0 -0.25 0.75 4.75
> [3,] 0.0 -2.5 0.25 -0.75 2.25
> [4,] 0.0 2.5 1.25 -2.75 -2.75
>
> $StdErrors
> NULL
>
>> MedianPolish<-rcModelMedianPolish(matrix)
>> matrix-MedianPolish$Residuals
> [,1] [,2] [,3] [,4] [,5]
> [1,] 17.5 11.0 9.25 19.25 6.25
> [2,] 13.5 7.0 5.25 15.25 2.25
> [3,] 15.0 8.5 6.75 16.75 3.75
> [4,] 19.0 12.5 10.75 20.75 7.75
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list