[BioC] design matrix
Wolfgang Huber
huber at ebi.ac.uk
Wed May 23 15:39:48 CEST 2007
Dear Lev,
> Thank you for the reply. I have checked it and it is something to do with vsn, it changes its output somehow when you change the sample order. Below is an example for the same data (6 samples, treatments 1 and 2) only in the ordered data i have them as 1,1,1,2,2,2 and in the unordered data i have them as 2,1,1,2,2,1.
There are two issues:
- After ML estimation of the normalization transformation, a constant
overall offset is added by vsn to the resulting data in order to ensure
that the result for high-intensity features is approximately the same as
the log-transformation (i.e. log(x)~glog(x) for large x). This should
not affect any differential expression analysis. In "vsn", this offset
is calculated for the first array (and applied to all). In "vsn2"
(versions 2.x of the package), the offset is called from an 'average
array' (and applied to all). This should reduce confusion about
order-dependence. So please consider updating your package and using vsn2.
Note that since we are on log-scale, adding an overall offset is
equivalent to a change of units, like measuring distances in km versus
in miles. As long as it's done consistently, it has no significance.
- More substantially, the result of vsn is a ML estimate obtained from a
numerical optimisation algorithm, and the results of this can vary
slightly depending on initial conditions and convergence thresholds.
However, the variability introduced by this should be negligible
compared to the biological or technical variability. This also seems to
apply to your example. However, please let me know if this is not the case.
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
> I would be really grateful for any comments.
> Lev.
>
>
> 1. ordered data:
> temp_ord<-vsn(signals_ord)
> design_ord <- model.matrix(~0 +factor(c(1,1,1,2,2,2)))
> colnames(design_ord) <- c("group1", "group2")
> contrast.matrix <- makeContrasts(group2-group1, levels=design_ord)
> exprs(temp_ord)<- log2(exp(exprs(temp_ord)))
> fit_ord <- lmFit(temp_ord, design_ord)
> fit2_ord <- contrasts.fit(fit_ord, contrast.matrix)
> fit2_ord <- eBayes(fit2_ord)
> topTable(fit2_ord, adjust="BH")
>
> ID logFC AveExpr t P.Value adj.P.Val B
> 168840 -5.302013 10.23113 -27.83760 3.589533e-09 2.716231e-05 11.23349
> 101140 -5.174833 10.22872 -27.68107 3.751573e-09 2.716231e-05 11.20243
> 123375 -4.265788 11.76289 -27.65844 3.775669e-09 2.716231e-05 11.19792
> 194271 -4.237924 14.03339 -27.06015 4.480631e-09 2.716231e-05 11.07621
>
> 2. unordered data:
> temp<-vsn(signals)
> design <- model.matrix(~0 +factor(c(2,1,1,2,2,1)))
> colnames(design) <- c("group1", "group2")
> contrast.matrix <- makeContrasts(group2-group1, levels=design)
> exprs(temp)<- log2(exp(exprs(temp)))
> fit <- lmFit(temp, design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, adjust="BH")
>
> ID logFC AveExpr t P.Value adj.P.Val B
> 168840 -5.302214 12.51356 -27.83821 3.590400e-09 2.717735e-05 11.23314
> 101140 -5.175045 12.51114 -27.67796 3.756415e-09 2.717735e-05 11.20135
> 123375 -4.265825 14.04540 -27.65702 3.778740e-09 2.717735e-05 11.19717
> 194271 -4.237919 16.31592 -27.05930 4.483564e-09 2.717735e-05 11.07557
>
> > signals_ord[1:2, 1:6]
> 282357 191.85 542.75 644.07 544.29
> 224689 66.95 134.27 127.79 102.97
>
> 282357 119.78 936.58
> 224689 72.42 357.10
> > signals[1:2, 1:6]
> 282357 936.58 542.75 644.07 544.29
> 224689 357.10 134.27 127.79 102.97
>
> 282357 119.78 191.85
> 224689 72.42 66.95
> > exprs(temp_ord)[1:2,1:6]
> 282357 8.011335 8.393353 8.384370 8.176880
> 224689 7.058134 7.339350 7.396429 7.254554
>
> 282357 7.550265 8.338583
> 224689 7.228878 7.677248
> > exprs(temp)[1:2,1:6]
> 282357 10.621015 10.675780 10.666816 10.459286
> 224689 9.959617 9.621677 9.678802 9.536861
>
> 282357 9.832487 10.293657
> 224689 9.511012 9.340247
>
> > sessionInfo()
> R version 2.4.1 (2006-12-18)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] "tools" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"
>
> other attached packages:
> vsn Biobase limma
> "1.12.0" "1.12.2" "2.9.8"
>
>
>
> ---------------------------------
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list