[R] heatmap.2: question regarding the "raw z-score"
James W. MacDonald
jmacdon at med.umich.edu
Thu Jul 9 16:17:40 CEST 2009
Hi Chrysanthi,
Chrysanthi A. wrote:
>
> Thanks a lot! What exactly is the sweep function doing? Also, is it
> possible to use the mean of a group of the row values instead of the
> mean of the whole row? So the values in the matrix (heat map) used in
> the comparison are z-scores and not the gene expression intensities,
> right?
I was trying to give a subtle hint below, but maybe I should be a bit
more blunt. One of the coolest things about R is that it is free, and
there are these sweet listservs where people give advice and help for
free as well.
HOWEVER, there is still a price to pay, and that is with your time. All
of these functions have help pages that the developers spent time
writing, and the code is there for you to peruse. Because of this, there
is some expectation that you would have done so prior to asking
questions. Now I have read the help page for sweep, and quite frankly it
is a bit confusing. The term 'sweep' is used without definition, so if
one doesn't know what that means the help page is less than helpful. But
it doesn't take much time or effort to empirically see what it does:
> a <- matrix(rnorm(25), ncol=5)
> a
           [,1]       [,2]       [,3]        [,4]        [,5]
[1,]  0.6841637 -1.0590185 -0.1719887 -0.01916011 -1.61936817
[2,]  0.5707217  1.4790968  1.6736991 -0.72158518  1.22467334
[3,]  0.4440499 -0.3382888 -0.1504191  0.32140022  1.83780859
[4,] -0.6659568  3.0573678 -1.5709904 -1.35618488 -0.01717017
[5,] -0.3182206  2.2777597 -0.2325356 -0.02001414  1.77440090
> rm <- rowMeans(a)
> rm
[1] -0.4370743  0.8453211  0.4229102 -0.1105869  0.6962780
> sweep(a, 1, rm, "-")
            [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.12123808 -0.6219441  0.2650857  0.4179142 -1.18229384
[2,] -0.27459943  0.6337756  0.8283779 -1.5669063  0.37935220
[3,]  0.02113977 -0.7611990 -0.5733293 -0.1015100  1.41489842
[4,] -0.55536988  3.1679546 -1.4604035 -1.2455980  0.09341672
[5,] -1.01449866  1.5814817 -0.9288137 -0.7162922  1.07812286
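In other words, sweep(a, 1, rm, "-") subtracts rm[i] from every value in
row i. A minimal sketch that confirms this and connects it to the row
z-score that heatmap.2() computes with scale="row" (the seed and the
variable names are my own choices, purely for illustration):

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
a  <- matrix(rnorm(25), ncol = 5)
rm <- rowMeans(a)

# Row-wise centering: subtract rm[i] from every value in row i
centered <- sweep(a, 1, rm, "-")
all.equal(centered, a - rm)        # recycling rm down the columns does the same thing
all.equal(rep(0, 5), rowMeans(centered))

# Dividing the centered rows by their standard deviations gives the row z-score
sx <- apply(centered, 1, sd)
z  <- sweep(centered, 1, sx, "/")

# Same numbers as scaling the transposed matrix with scale()
all.equal(z, t(scale(t(a))), check.attributes = FALSE)
```

After this, every row of z has mean 0 and standard deviation 1.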
For your second question:
?heatmap.2
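The scale argument only offers "none", "row" or "column", so centering
on the mean of a subset of the columns has to be done by hand before
calling heatmap.2() with scale="none". A rough sketch, assuming a
hypothetical index vector 'ref' that marks the columns of the group you
want to center on (the data, 'ref', and the choice of that group's SD as
the divisor are all illustrative assumptions, not something heatmap.2()
does for you):

```r
set.seed(2)
x   <- matrix(rnorm(40), nrow = 5)   # 5 genes (rows) x 8 samples (columns)
ref <- 1:4                            # hypothetical: columns of the reference group

grp_mean <- rowMeans(x[, ref])        # per-gene mean over the reference group only
grp_sd   <- apply(x[, ref], 1, sd)    # per-gene SD over the same group

z <- sweep(x, 1, grp_mean, "-")       # center every row on its group mean
z <- sweep(z, 1, grp_sd, "/")         # scale every row by its group SD

# Within each row, the reference columns now have mean 0 and SD 1
rowMeans(z[, ref])
apply(z[, ref], 1, sd)

# The pre-computed values can then be plotted without further scaling, e.g.
# heatmap.2(z, scale = "none")
```

With scale="row" (or with a matrix scaled by hand like this), the colours
represent the scaled values, not the raw intensities.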
>
> Also, as I understand from the code, heatmap is using the distfun
> function for the clustering. Can I use Pearson correlation for the
> clustering? My main aim in using the heatmap is to examine the
> expression levels of the marker genes and to confirm that the marker
> genes are clearly differentially expressed in the two subtypes of the
> disease that I am examining.
No, heatmap.2() is not calling a function named distfun. There is no
function by that name in either gplots or base R. If you look at the
help page, you can see that distfun is an argument to heatmap.2(), and
its default is the dist() function, which computes the distances that
are then used for the clustering.
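To make the distinction concrete: as far as I can tell from the source,
heatmap.2() builds the row dendrogram roughly as hclustfun(distfun(x)),
with dist() and hclust() as the defaults. A small sketch of those
defaults outside of heatmap.2() (the matrix is made up):

```r
set.seed(3)
x <- matrix(rnorm(30), nrow = 6)     # 6 rows to be clustered

d  <- dist(x)                        # default distfun: Euclidean distance between rows
hc <- hclust(d)                      # default hclustfun: complete linkage
dd <- as.dendrogram(hc)              # the kind of object Rowv/Colv also accept

class(d)                             # a "dist" object
attr(d, "Size")                      # one entry per row of x

# The entry for rows 1 and 2 is the ordinary Euclidean distance
all.equal(as.matrix(d)[1, 2], sqrt(sum((x[1, ] - x[2, ])^2)))
```

So whatever you pass as distfun has to take a matrix and return a "dist"
object describing its rows.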
You can use Pearson correlation, but in my experience it takes some
work. Again, if you read the help page, you can see that the Rowv and
Colv arguments can be one of TRUE, FALSE, NULL, or a dendrogram. So if
you want to use Pearson correlation, you should supply heatmap.2() with
dendrograms produced using that correlation. So an example:
library(gplots)  # provides heatmap.2()
a <- matrix(rnorm(50), ncol=5)
# cluster rows and columns on 1 - Pearson correlation
rowv <- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
colv <- as.dendrogram(hclust(as.dist(1-cor(a))))
heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)
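An alternative that I believe works as well is to hand the
correlation-based distance to heatmap.2() through the distfun argument
rather than building the dendrograms yourself. A sketch, where cor_dist
is a helper name I made up; as far as I can tell heatmap.2() applies
distfun to x for the rows and to t(x) for the columns, so the helper has
to compare rows:

```r
# Hypothetical helper: 1 - Pearson correlation between the rows of x,
# returned as a "dist" object
cor_dist <- function(x) as.dist(1 - cor(t(x)))

set.seed(4)
a <- matrix(rnorm(50), ncol = 5)

# The helper produces the same clustering as the explicit construction
hc_explicit <- hclust(as.dist(1 - cor(t(a))))
hc_helper   <- hclust(cor_dist(a))
identical(hc_explicit$order, hc_helper$order)

# The plot itself only runs if gplots is installed
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(a, scale = "row", distfun = cor_dist)
}
```

One difference to be aware of: when Rowv and Colv are left at their
defaults, heatmap.2() reorders the dendrogram branches by the row and
column means, so the leaf order can differ from a dendrogram you pass in
yourself even though the tree is the same.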
Best,
Jim
>
> Many thanks,
>
> Chrysanthi.
>
>
> 2009/7/8 James W. MacDonald <jmacdon at med.umich.edu>
>
> Hi Chrysanthi,
>
>
> Chrysanthi A. wrote:
>
> Hi,
>
> I am analysing gene expression data using the heatmap.2 function in R
> and I was wondering what the formula is for the "raw z-score" bar
> which shows the colors for each pixel. According to this post:
> https://mailman.stat.ethz.ch/pipermail/r-help/2006-September/113598.html
> it is
>
> (actual value - mean of the group) / standard deviation.
>
> But the mean of which group? The mean of the gene vector? And the
> actual value of that gene in a sample? I would be grateful if you
> could give me some more details about it, or point me to a
> book/manual that I could consult.
>
>
> How about looking at the code?
>
>     if (scale == "row") {
>         retval$rowMeans <- rm <- rowMeans(x, na.rm = na.rm)
>         x <- sweep(x, 1, rm)
>         retval$rowSDs <- sx <- apply(x, 1, sd, na.rm = na.rm)
>         x <- sweep(x, 1, sx, "/")
>     }
>     else if (scale == "column") {
>         retval$colMeans <- rm <- colMeans(x, na.rm = na.rm)
>         x <- sweep(x, 2, rm)
>         retval$colSDs <- sx <- apply(x, 2, sd, na.rm = na.rm)
>         x <- sweep(x, 2, sx, "/")
>     }
>
> So the z-score is calculated on either the rows or the columns (or
> not at all, with the default of "none").
>
> I don't see how you can get something saying 'raw z-score'. I get
> either 'Row Z-Score' or 'Column Z-Score'. So assuming you meant Row
> Z-Score, then the rows are centered and scaled by subtracting the
> mean of the row from every value and then dividing the resulting
> values by the standard deviation of the row.
>
> Best,
>
> Jim
>
>
>
> Thanks a lot,
>
> Chrysanthi.
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
>
>