[BioC] edgeR and tagwise dispersion: overcorrection for multiple tests?
alessandro.guffanti at genomnia.com
Thu Jul 12 10:48:01 CEST 2012
Dear colleagues, good morning. I am returning to an old issue because I
am now much more certain of what I see, and I begin to wonder whether
this is due to biology rather than to analytical tools or strategies.
=> Here is my sessionInfo() to begin with:
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] edgeR_2.6.7 limma_3.12.1 R.utils_1.12.1 R.oo_1.9.8
[5] R.methodsS3_1.4.2
=> The experiment: RNA from five stimulated samples and five controls,
mice, homogeneous stimulus, brain tissue, SAGE on SOLiD with good mapping
in the UTR (also checked with genome-wide mapping). Tags were selected
with the following, quite stringent, criteria: located only in the UTR;
unique mapping; only one mismatch; beginning with CATG. The samples are
labelled {1 to 5}R for the stimulus and {1 to 5}C for the control.
=> The MDS plot and a simple pairwise regression analysis of the tag
counts (R vs C, R vs R and C vs C) reveal a clear division of the R
samples into two groups: {1R, 3R} and {2R, 4R, 5R}. In addition, one C
sample (3C) overlaps with two R samples and is removed from the
comparisons.
=> three DEG calculations were performed:
(A) all C vs all R;
(B) all C minus 3C vs {1R, 3R};
(C) all C minus 3C vs {2R, 4R, 5R}
=> Tagwise dispersion was used; normalization factors were calculated
for the libraries; filtering by minimal CPM in samples leaves between
6000 and 7000 genes for each comparison.
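
For illustration only, the CPM filtering step mentioned above can be sketched as follows. This is a minimal sketch in plain Python, not the edgeR code actually used for this analysis. The threshold values (`min_cpm`, `min_samples`), the function names and the toy counts are all hypothetical, since the post does not state them, and the sketch uses raw library sizes without applying normalization factors.

```python
def cpm(counts, lib_sizes):
    """Convert raw tag counts (rows = tags, columns = libraries) to counts per million."""
    return [
        [1e6 * c / n for c, n in zip(row, lib_sizes)]
        for row in counts
    ]


def filter_by_cpm(counts, min_cpm, min_samples):
    """Return the row indices of tags whose CPM exceeds min_cpm
    in at least min_samples libraries."""
    # Library size = column sum of the raw count table.
    lib_sizes = [sum(col) for col in zip(*counts)]
    keep = []
    for i, row in enumerate(cpm(counts, lib_sizes)):
        if sum(value > min_cpm for value in row) >= min_samples:
            keep.append(i)
    return keep


# Toy table: 3 tags x 4 libraries; the values are made up for the example.
toy_counts = [
    [500, 480, 510, 495],
    [0, 1, 0, 0],
    [499500, 499519, 499490, 499505],
]

# Hypothetical thresholds: CPM above 100 in at least 2 libraries.
kept = filter_by_cpm(toy_counts, min_cpm=100, min_samples=2)
```

Because each comparison uses a different subset of libraries, a filter of this kind can retain a slightly different set of tags in each one, which is consistent with the 6000 to 7000 range quoted above.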
=> The results, which make me wonder about what is happening in the R
(experiment) samples:
Comparison A (ALL vs ALL): TWO genes with significant FDR (BH-corrected
p-value, as I understand it)
Comparison B (ALL-3C vs 1R,3R): 2099 genes with significant FDR (!)
Comparison C (ALL-3C vs 2R,4R,5R): 20 genes with significant FDR
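
As background for the FDR figures above, the Benjamini-Hochberg (BH) adjustment can be sketched in a few lines. This is a minimal sketch in plain Python, not the edgeR code used here; the function names and the two toy p-value lists are hypothetical. It shows that, for the same number of tests, the number of tags passing a BH threshold depends on the whole distribution of raw p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, fdr=0.05):
    """Number of tests whose BH-adjusted p-value is at or below the FDR threshold."""
    return sum(q <= fdr for q in bh_adjust(pvalues))


# Two made-up sets of raw p-values with the same number of tests.
few_small = [0.001, 0.2, 0.4, 0.6, 0.8]
many_small = [0.001, 0.002, 0.003, 0.004, 0.8]

n_few = count_significant(few_small)
n_many = count_significant(many_small)
```

In this toy case the number of tests is identical in both lists, so any difference between `n_few` and `n_many` comes from the raw p-values themselves rather than from the multiple-testing correction.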
Now, excuse my ignorance, but subsetting one of the two groups being
compared has a rather strong effect on the FDR, which I have not seen
in many other similar analyses. In fact, when I first mailed the list,
I was talking about 'overcorrection for multiple tests'.
Is there any reasonable explanation for this (apart from {1R,3R} and
{2R,4R,5R} being totally different samples, which I exclude)? Maybe a
strong dependency between the genes involved in the response to the
stimulus in the two R subgroups?
I include below the three MDS plots. Thanks for any answer, and again
excuse me: maybe there is a trivial reason for this (such as the number
of samples), but it is a unique situation among the many SAGE
experiments I have analyzed with edgeR.
Kind regards,
Alessandro
--
Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
Via Nerviano, 31 - 20020 Lainate, Milano, Italy
Ph: +39-0293305.702 Fax: +39-0293305.777
http://www.genomnia.com
"When you're curious, you find lots of interesting things to do."
(Walt Disney)