[BioC] Heatmaps for EdgeR
Steve Lianoglou
lianoglou.steve at gene.com
Fri Mar 21 19:19:53 CET 2014
Hi Eleanor,
Please CC (use "reply-all") the bioconductor mailing list on all
correspondence so that everyone can help with (and benefit from) this
discussion.
Comments inline:
On 21 Mar 2014, at 11:02, Eleanor Su wrote:
>> Can you explain what you mean by that a bit more? You shouldn't be
>> doing any normalization of your actual counts prior to feeding them to
>> edgeR, are you?
>
> I'm only working with small non-coding RNAs of a non-model organism.
> Since this is a fairly new kind of analysis, I'm following someone
> else's pipeline. Thus I've normalized my samples prior to doing
> analysis in R. I've normalized all my counts based on the reads
> generated.
What I mean is that you shouldn't do that :-)
Have you read through the edgeR User's Guide? The `calcNormFactors`
function does the step that it sounds like you are doing before
analysis, but it also keeps the count data intact, which is what you
want. I guess you are dividing your counts by some normalization
constant prior to edgeR analysis, which is a big no-no.
The (expression) input to edgeR should be the raw count matrix of
features x samples -- many people choose to use only uniquely mapping
reads for this purpose, so it is probably a good idea for you to ensure
that is the case (at least for your first analysis).
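To make that concrete, here is a minimal sketch of the first steps,
assuming edgeR is installed. The simulated `counts` matrix, the sample
names, and the two-group `group` factor are made up for illustration and
are not from your data:

```r
library(edgeR)

# Hypothetical raw counts: 200 features (rows) x 6 samples (columns).
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("seq", 1:200), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Pass the raw counts in as-is; do not rescale them beforehand.
y <- DGEList(counts = counts, group = group)

# Computes per-sample scaling factors (TMM by default) and stores them
# in y$samples$norm.factors. The count matrix itself is left untouched.
y <- calcNormFactors(y)
```

After this, `y$counts` still holds the original counts; the
normalization lives alongside them in `y$samples`.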
>> Look at section 2.10 of the edgeR User's Guide (Clustering, heatmaps,
>> etc.) where the authors note that this is still a matter of
>> research, but they suggest using "moderated log-counts-per-million"
>
> I've generated a heatmap already using this script, but I only want a
> heatmap of the significant differentially expressed sequences.
What script?
> When I generate the heatmap according to section 2.10, I end up with a
> heatmap that I can't even read because it's plotting all the
> sequences. Would you suggest just generating a new file with only
> significant sequences and then generating a heatmap according to
> section 2.10?
When you call the `heatmap` function (or whatever function you are using
to generate these things; the `aheatmap` function from the NMF package
is quite nice, btw), you should only pass it a matrix that consists of
the rows you want to plot.
You do not have to generate an intermediary new file to do this.
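As a rough sketch of what I mean, using base R only: the `logcpm` matrix
and the `fdr` vector below are made-up stand-ins for your expression
values and your per-sequence significance results, and the 0.05 cutoff
is just an example:

```r
# Made-up stand-ins: a 100 x 6 matrix of log-CPM values and per-row FDRs.
set.seed(1)
logcpm <- matrix(rnorm(100 * 6), nrow = 100,
                 dimnames = list(paste0("seq", 1:100), paste0("s", 1:6)))
fdr <- runif(100)
names(fdr) <- rownames(logcpm)

# Names of the rows to keep (here: FDR below 0.05).
sig <- names(fdr)[fdr < 0.05]

# Index the matrix by row name; drop = FALSE keeps it a matrix even if
# only one row is selected.
sig_mat <- logcpm[sig, , drop = FALSE]

# heatmap() needs at least 2 rows and 2 columns.
if (nrow(sig_mat) >= 2) heatmap(sig_mat)
```

In a real edgeR analysis I would expect `logcpm` to come from something
like `cpm(y, prior.count = 2, log = TRUE)` and the row names from the
table returned by `topTags`, but check the User's Guide for the details
that fit your design.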
Don't take this the wrong way, but it sounds like you are quite new not
just to this analysis, but to R as a whole, since indexing things
(vectors, lists, matrices) is something very basic that you need to
master before being conversant with the language.
If this is the case, I'd strongly recommend you spend some time reading
up on introductory R material (R comes with "An Introduction to R")
before trying to do anything more advanced.
Doing so will not only reduce the chances of you shooting yourself in
the foot by doing something silly, but it will also allow you to get
better (and more considered) help here, since you will be able to ask
the type of questions that leverage the expertise of the people
subscribed to this list.
For instance, if you have questions regarding fundamental "R
programming" type of things (indexing a matrix, for example), you should
direct those to R-help, which you can subscribe to here:
https://stat.ethz.ch/mailman/listinfo/r-help
HTH,
-steve
--
Steve Lianoglou
Computational Biologist
Genentech