[BioC] limma2annaffy output and heatmap questions

Sun Sep 28 05:23:25 CEST 2008

Hi Hui-Yi,

I have a question about the heat map. Which value do you use to draw? Raw fluorescence intensity, normalized fluorescence intensity, or normalized log-ratio?

Many thanks,

Jixin  

----- Original Message -----
From: "Hui-Yi Chu" <huiyi.chu at gmail.com>
To: "James W. MacDonald" <jmacdon at med.umich.edu>
Cc: bioconductor at stat.math.ethz.ch
Sent: Saturday, September 27, 2008 9:50:16 PM GMT -06:00 US/Canada Central
Subject: Re: [BioC] limma2annaffy output and heatmap questions

Hi everyone,

Just wanna say that I found the reason why I got the odd result of fold
change from limma2annaffy function:

Since I've subsetted my eset as esetsub, the arg of limma2annaffy must be
"esetsub" rather than "eset".

Thank you everyone and cheers,
Hui-Yi

On Fri, Sep 26, 2008 at 5:02 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:

> Here is the code, I may forget to paste it.
>
> esetsub <- eset[selected,]
> ef <- exprs(esetsub)
>
>
>
>
> On Fri, Sep 26, 2008 at 4:47 PM, James W. MacDonald <jmacdon at med.umich.edu
> > wrote:
>
>>
>>
>> Hui-Yi Chu wrote:
>>
>>> Hello Jim,
>>>
>>>
>>>  Thank you for the answer of p.value! and sorry for no codes in mail. The
>>>>> followings are one example of my data and codes.
>>>>>
>>>>> *Example *(partially copied from my result table when coef=1, 1st
>>>>> probeID):
>>>>> t-statistic  P-value  Fold-change  mut.a         mut.a        wt1
>>>>> wt2
>>>>> 108.84      0           10.28           9.90436     9.88196   10.2249
>>>>> 10.1021
>>>>>
>>>>>  This isn't possible. You can't have  a fold change of 10.28 for these
>>>> values. The actual number will be something like -0.2 (The mean of 9.9
>>>> and
>>>> 9.88 minus the mean of 10.22 and 10.1 *cannot* be 10.28).
>>>>
>>>>
>>>        I completely agree with you!  And this is why I don't understand
>>> the
>>> table as well.
>>>        In addition, I've compared the last four columns of this table
>>> with
>>> my original normalized expression values, they are identical, which means
>>> there might be something with later steps such as contrast?    Was
>>> something
>>> wrong with my codes??
>>>
>>>
>>>  *Codes*:
>>>>
>>>>> ## data importing
>>>>> library("affy")
>>>>> library("limma")
>>>>> targets <- readTargets("fed_total.txt")
>>>>> aaa <- ReadAffy(filenames = targets $ filename)
>>>>> eset <- rma(aaa)
>>>>> pData(eset)
>>>>> samples <- data.frame(genotype = c("wt", "wt", "mut.a",
>>>>> "mut.a","mut.b",
>>>>> "mut.b"),replicate =c(1,2,1,2,1,2))
>>>>> varInfo <- data.frame(labelDescription=c("wt, mut.a, mut.b", "arb
>>>>> number"))
>>>>> pd <- new("AnnotatedDataFrame", data = samples, varMetadata = varInfo)
>>>>> phenoData(eset) <- new("AnnotatedDataFrame", data = samples,
>>>>> varMetadata =
>>>>> varInfo)
>>>>>
>>>>> ## filtration
>>>>> library("genefilter")
>>>>> f1 <- anyNA
>>>>> f2 <- pOverA(0.25, log2(100))
>>>>> ff <- filterfun(f1, f2)
>>>>> selected <- genefilter(eset, ff)
>>>>> esetsub <- eset[selected,]
>>>>>
>>>>> ## limma fit and contrast
>>>>> library("limma")
>>>>> design <- model.matrix(~0+factor(c(1,1,2,2,3,3)))
>>>>> colnames(design) <- c("wt", "mut.a","mut.b")
>>>>> fit <- lmFit(ef, design)
>>>>>
>>>>
>> What is 'ef'? It appears you aren't using the ExpressionSet from above.
>>
>>  fit2 <- eBayes(fit)
>>>>> contrast.matrix <- makeContrasts("mut.a vs wt" = mut.a-wt,
>>>>>                                "mut.b vs wt" = mut.b-wt,
>>>>>                                "Diff" = (mut.a-wt)-(mut.b-wt),
>>>>>                                levels=design)
>>>>>
>>>>>  You realize that your "Diff" is mut.a - mut.b, yes?
>>>>           Yes, but I think there might be a better way to compare them
>>>> on
>>>> the basis of wt.
>>>>
>>>
>> No. (3-1)-(2-1) is completely identical to 3-2.
>>       Is there any better method to get the idea above?
>>
>>
>>
>>>>
>>>>  fit2 <- contrasts.fit (fit, contrast.matrix)
>>>>
>>>>> fit3 <- eBayes(fit2)
>>>>>
>>>>> ## find DEG and draw heatmap
>>>>> x <- topTable(fit3, coef=1, adjust="fdr", sort.by="P", number=10000)
>>>>> y <- x[x$adj.P.Val < 0.01,]
>>>>> results <- decideTests (fit3, method="separate",
>>>>>                       adjust.method="BH",p.value=0.01,lfc=1)
>>>>>
>>>>>  ## Cluster and heatmaps ------ *draw in three ways, but confused with
>>>>> results*
>>>>>        ##1.   heatmap(as.matrix(esetsub[y$ID,]), scale="none")
>>>>>
>>>>>          #2.   library("Heatplus")
>>>>>                 heatmap_2(esetsub[y$ID,]),  scale="none")
>>>>>          #3.  library(gplots)
>>>>>                heatmap.2(esetsub[y$ID,]), scale= "row", trace="none",
>>>>> density.info="none")
>>>>>              ## or
>>>>>                heatmap.2(esetsub[y$ID,]), scale= "none", trace="none",
>>>>> density.info="none")
>>>>>
>>>>> ## html file output
>>>>> library("affycoretools")
>>>>> library("yeast2.db")
>>>>> annotation(eset) <- "yeast2.db"
>>>>> limma2annaffy(eset, fit3, design, adjust = "fdr",
>>>>>       contrast.matrix, annotation(eset), pfilt = 0.01, fldfilt= 1)
>>>>>
>>>>>
>>>>> *Questions*:
>>>>> I have tons of questions about heatmap, and I sincerely appreciate if
>>>>> anybody can give me some idea.
>>>>>      1. what is the scale??  I know there are some discussion in the
>>>>> maillist, but I am still confused. The result from scale=row is easy to
>>>>> interpret since we can see some blocks across samples, but how it
>>>>> works??
>>>>>
>>>>>  From ?heatmap.2
>>>>
>>>>  scale: character indicating if the values should be centered and
>>>>         scaled in either the row direction or the column direction,
>>>>         or none.  The default is '"row"' if 'symm' false, and
>>>>         '"none"' otherwise.
>>>>
>>>> You might look at ?scale as well.
>>>>
>>>>       2.  when I change the coef=2, the heatmap result is totally
>>>>
>>>>> different, I know the coef=2 is comparing mut.b and wt, so what about
>>>>> the
>>>>> same group of genes in mut.a?
>>>>>
>>>>>  Not sure what this means. It's a different comparison, so you get
>>>> different
>>>> genes. If you want the significant genes from the first comparison, just
>>>> get
>>>> the significant genes and extract those from your ExpressionSet.
>>>>         Thank you for the suggestions!
>>>>
>>>>
>>>
>>>
>>>>       3. The result of limma contains logFC, AvrExprs, t-statistic, p
>>>>
>>>>> value,etc., which value that is used to draw cluster?? logFC or
>>>>> AveExprs??
>>>>> Some papers their color key display fold change, does that mean I have
>>>>> to
>>>>> do
>>>>> something to get the ratio before I draw heatmap?
>>>>>
>>>>>  Again, not sure what you are getting at. You aren't using anything
>>>> from
>>>> limma except for the gene names that are being called significant. The
>>>> data
>>>> are the expression values from your ExpressionSet.
>>>>
>>>>
>>>>
>>>>  4. same question with html table, how does the fold change generate?
>>>>>
>>>>>  It is the difference between the average expression for each group.
>>>>
>>>>
>>>>
>>>>  Thank you very much!!!!!!!!!!
>>>>> Hui-Yi
>>>>>
>>>>>
>>>>> *sessionInfo()*
>>>>> R version 2.7.2 (2008-08-25)
>>>>> i386-pc-mingw32
>>>>>
>>>>> locale:
>>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>>> States.1252;LC_MONETARY=English_United
>>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] splines   tools     stats     graphics  grDevices utils
>>>>> datasets
>>>>> [8] methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] yeast2.db_2.2.0      affycoretools_1.12.1 annaffy_1.12.1
>>>>>  [4] KEGG.db_2.2.0        biomaRt_1.14.1       RCurl_0.9-3
>>>>>  [7] GOstats_2.6.0        Category_2.6.0       RBGL_1.16.0
>>>>> [10] annotate_1.18.0      xtable_1.5-3         GO.db_2.2.0
>>>>> [13] AnnotationDbi_1.2.2  RSQLite_0.7-0        DBI_0.2-4
>>>>> [16] graph_1.18.1         affyPLM_1.16.0       gcrma_2.12.1
>>>>> [19] matchprobes_1.12.1   genefilter_1.20.0    survival_2.34-1
>>>>> [22] yeast2cdf_2.2.0      limma_2.14.6         affy_1.18.2
>>>>> [25] preprocessCore_1.2.1 affyio_1.8.1         Biobase_2.0.1
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] cluster_1.11.11 XML_1.94-0.1
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 26, 2008 at 8:32 AM, James W. MacDonald
>>>>> <jmacdon at med.umich.edu>wrote:
>>>>>
>>>>>  Hi Hui-Yi,
>>>>>
>>>>>> Hui-Yi Chu wrote:
>>>>>>
>>>>>>  Hi everyone,
>>>>>>
>>>>>>> Apologize first for probably simple question below.
>>>>>>> Thanks for Jim's help, I've got some html files from limma2annaffy
>>>>>>> function,
>>>>>>> however, I am not pretty sure how to interpret the result of "fold
>>>>>>> change".
>>>>>>> Please correct me if I am wrong. Based on my knowledge, after limma,
>>>>>>> for
>>>>>>> example in coef=1, the fold change represents the contrast of the
>>>>>>> expression
>>>>>>> value means from two groups. But the html result made me confused:
>>>>>>>
>>>>>>> The header of html is like this:
>>>>>>> probes   Description   Pubmed   Gene Ontology   Pathway
>>>>>>>  t-statistic
>>>>>>> P-value    Fold-change    array1   array1  array2   array2
>>>>>>>
>>>>>>>
>>>>>>> The question is why the fold change is not really the value that
>>>>>>> (mean
>>>>>>> of
>>>>>>> group1)- (mean of group2) ?? If it is because of permutation??
>>>>>>>
>>>>>>>  The fold change *is* mean(goup1) - mean(group2). If you show us your
>>>>>>>
>>>>>> code
>>>>>> and a snippet of the output we might be able to help (please see the
>>>>>> posting
>>>>>> guide).
>>>>>>
>>>>>>  I've read limma user guide and ?limma2annaffycore, so I believe the
>>>>>> value
>>>>>>
>>>>>>  of
>>>>>>> array 1 and 2 are expression value.
>>>>>>> By the way, the function of pflt filter genes by p value, but if I
>>>>>>> want
>>>>>>> to
>>>>>>> filter gene by adj.p.val??
>>>>>>>
>>>>>>>  I don't know of any function pflt. However, if you mean the argument
>>>>>>>
>>>>>> pfilt,
>>>>>> then it *does* filter by adjusted p-value if in fact you are adjusting
>>>>>> the
>>>>>> p-value to begin with.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Thank you very much!!
>>>>>>
>>>>>>> Hui-Yi
>>>>>>>
>>>>>>>      [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>>  --
>>>>>>>
>>>>>> James W. MacDonald, M.S.
>>>>>> Biostatistician
>>>>>> Hildebrandt Lab
>>>>>> 8220D MSRB III
>>>>>> 1150 W. Medical Center Drive
>>>>>> Ann Arbor MI 48109-0646
>>>>>> 734-936-8662
>>>>>>
>>>>>>
>>>>>>
>>> Thank you with sincere appreciation!!!
>>> Hui-Yi
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Hildebrandt Lab
>> 8220D MSRB III
>> 1150 W. Medical Center Drive
>> Ann Arbor MI 48109-0646
>> 734-936-8662
>>
>
>

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor