[R] heatmap.2: question regarding the "raw z-score"

Thu Jul 9 16:17:40 CEST 2009

Hi Chrysanthi,

Chrysanthi A. wrote:
> 
> Thanks a lot! What exactly is the sweep function doing? Also, is it 
> possible, instead of using the mean of the whole row, to use only the 
> mean of a group of the row values? So the values in the matrix (heat 
> map) used in the comparison are z-scores and not the gene expression 
> intensities, right? 

I was trying to give a subtle hint below, but maybe I should be a bit 
more blunt. One of the coolest things about R is that it is free, and 
there are these sweet listservs where people give advice and help for 
free as well.

HOWEVER, there is still a price to pay, and that is with your time. All 
of these functions have help pages that the developers spent time 
writing, and the code is there for you to peruse. Because of this, there 
is some expectation that you would have done so prior to asking 
questions. Now I have read the help page for sweep, and quite frankly it 
is a bit confusing. The term 'sweep' is used without definition, so if 
one doesn't know what that means the help page is less than helpful. But 
it doesn't take much time or effort to empirically see what it does:

 > a <- matrix(rnorm(25), ncol=5)
 > a
            [,1]       [,2]       [,3]        [,4]        [,5]
[1,]  0.6841637 -1.0590185 -0.1719887 -0.01916011 -1.61936817
[2,]  0.5707217  1.4790968  1.6736991 -0.72158518  1.22467334
[3,]  0.4440499 -0.3382888 -0.1504191  0.32140022  1.83780859
[4,] -0.6659568  3.0573678 -1.5709904 -1.35618488 -0.01717017
[5,] -0.3182206  2.2777597 -0.2325356 -0.02001414  1.77440090
 > rm <- rowMeans(a)
 > rm
[1] -0.4370743  0.8453211  0.4229102 -0.1105869  0.6962780
 > sweep(a, 1, rm, "-")
             [,1]       [,2]       [,3]       [,4]        [,5]
[1,]  1.12123808 -0.6219441  0.2650857  0.4179142 -1.18229384
[2,] -0.27459943  0.6337756  0.8283779 -1.5669063  0.37935220
[3,]  0.02113977 -0.7611990 -0.5733293 -0.1015100  1.41489842
[4,] -0.55536988  3.1679546 -1.4604035 -1.2455980  0.09341672
[5,] -1.01449866  1.5814817 -0.9288137 -0.7162922  1.07812286
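
To tie this back to scale="row", here is a minimal sketch of the two sweep steps (subtract the row mean, then divide by the row standard deviation), checked against scale(). The seed and the object names centered, z and z2 are only for illustration and are not part of gplots.

```r
set.seed(1)                      # arbitrary seed, only so the run is repeatable
a <- matrix(rnorm(25), ncol = 5)

# Step 1: subtract each row's mean from every value in that row
centered <- sweep(a, 1, rowMeans(a), "-")

# Step 2: divide each centered row by its standard deviation
z <- sweep(centered, 1, apply(centered, 1, sd), "/")

# The same result via scale(), which works on columns, hence the transposes
z2 <- t(scale(t(a)))

all.equal(z, z2, check.attributes = FALSE)
rowMeans(z)        # numerically zero for every row
apply(z, 1, sd)    # one for every row
```

The second argument to sweep() picks the margin (1 for rows, 2 for columns), the third is the vector of statistics to sweep out, and the fourth is the operation applied, which defaults to "-".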

For your second question:

?heatmap.2
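
One possible way to center on a group of values rather than the whole row is to do the scaling yourself and pass scale="none" so that heatmap.2() leaves the matrix alone. This is a minimal sketch, not the only approach: the data are simulated, and the assumption that columns 1 to 3 form the reference group (the ref index) is made up for illustration, as are the object names.

```r
set.seed(2)                      # arbitrary seed for a repeatable example
x <- matrix(rnorm(60), ncol = 6,
            dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
ref <- 1:3   # hypothetical: columns 1-3 are the reference group

# Center every row on the mean of the reference columns only
ref_mean <- rowMeans(x[, ref])
xc <- sweep(x, 1, ref_mean, "-")

# Optionally scale by the row SD of the reference columns as well
ref_sd <- apply(x[, ref], 1, sd)
xz <- sweep(xc, 1, ref_sd, "/")

# Plot the pre-scaled matrix; scale = "none" stops heatmap.2() from rescaling
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(xz, scale = "none", trace = "none")
}
```

With this choice the plotted values are deviations from the reference group, in units of the reference group's standard deviation, so they are no longer row z-scores in the usual sense.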

> 
> Also, as I can understand from the code, heatmap is using distfun 
> function for the clustering. Can I use pearson correlation for the 
> clustering? My main aim in using the heatmap is to examine the 
> expression levels of the marker genes and to confirm that the marker 
> genes are clearly differentially expressed in the two subtypes of the 
> disease that I examine.

No, heatmap.2() is not using a distfun function for the clustering. There 
isn't a function by that name in either gplots or base R. If you look at 
the help page, you can see that distfun is an argument to the function, 
and the default is to use the dist() function.

You can use Pearson correlation, but in my experience it takes some 
work. Again, if you read the help page, you can see that the Rowv and 
Colv arguments can be one of TRUE, FALSE, NULL, or a dendrogram. So if 
you want to use Pearson correlation, you should supply heatmap.2() with 
dendrograms produced using that correlation. So an example:

library(gplots)
a <- matrix(rnorm(50), ncol=5)
## cluster rows and columns on 1 - Pearson correlation
rowv <- as.dendrogram(hclust(as.dist(1-cor(t(a)))))
colv <- as.dendrogram(hclust(as.dist(1-cor(a))))
heatmap.2(a, scale="row", Rowv=rowv, Colv=colv)
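
Since distfun is an argument, a shorter route that may also work is to pass a correlation-based distance function directly. This is a sketch under the assumption (from reading the heatmap.2() code) that distfun is called on the matrix for the rows and on its transpose for the columns, so it only needs to return distances between rows. The name cor_dist is made up for illustration.

```r
set.seed(3)                      # arbitrary seed for a repeatable example
a <- matrix(rnorm(50), ncol = 5)

# Distance between the rows of m: 1 - Pearson correlation
cor_dist <- function(m) as.dist(1 - cor(t(m)))

# Let heatmap.2() build both dendrograms from the correlation distance
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(a, scale = "row", distfun = cor_dist, trace = "none")
}
```

The distance here ranges from 0 for perfectly correlated rows to 2 for perfectly anti-correlated rows. If anti-correlated genes should count as similar, 1 - abs(cor(t(m))) is a common alternative.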

Best,

Jim

> 
> Many thanks,
> 
> Chrysanthi.
> 
> 
> 2009/7/8 James W. MacDonald <jmacdon at med.umich.edu>
> 
>     Hi Chrysanthi,
> 
> 
>     Chrysanthi A. wrote:
> 
>         Hi,
> 
>         I am analysing gene expression data using the heatmap.2 function
>         in R and I was wondering what is the formula of the "raw z-score"
>         bar which shows the colors for each pixel.
>         According to that post:
>         https://mailman.stat.ethz.ch/pipermail/r-help/2006-September/113598.html,
>         it is the
> 
>         (actual value - mean of the group) / standard deviation.
> 
>         But, mean of which group? Mean of the gene vector? And actual
>         value of that gene on a sample?  I would be grateful if you could
>         give me some more details about it or even if there is a
>         book/manual that I could address to..
> 
> 
>     How about looking at the code?
> 
>        if (scale == "row") {
>            retval$rowMeans <- rm <- rowMeans(x, na.rm = na.rm)
>            x <- sweep(x, 1, rm)
>            retval$rowSDs <- sx <- apply(x, 1, sd, na.rm = na.rm)
>            x <- sweep(x, 1, sx, "/")
>        }
>        else if (scale == "column") {
>            retval$colMeans <- rm <- colMeans(x, na.rm = na.rm)
>            x <- sweep(x, 2, rm)
>            retval$colSDs <- sx <- apply(x, 2, sd, na.rm = na.rm)
>            x <- sweep(x, 2, sx, "/")
>        }
> 
>     So the z-score is calculated on either the row or column (or the
>     default of "none").
> 
>     I don't see how you can get something saying 'raw z-score'. I get
>     either 'Row Z-Score' or 'Column Z-Score'. So assuming you meant Row
>     Z-Score, then the rows are centered and scaled by subtracting the
>     mean of the row from every value and then dividing the resulting
>     values by the standard deviation of the row.
> 
>     Best,
> 
>     Jim
> 
> 
> 
>         Thanks a lot,
> 
>         Chrysanthi.
> 
> 
>     -- 
>     James W. MacDonald, M.S.
>     Biostatistician
>     Douglas Lab
>     University of Michigan
>     Department of Human Genetics
>     5912 Buhl
>     1241 E. Catherine St.
>     Ann Arbor MI 48109-5618
>     734-615-7826
> 
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826