[BioC] VSN 2.2
Wolfgang Huber
huber at ebi.ac.uk
Tue May 15 22:35:10 CEST 2007
> Hans-Ulrich Klein wrote
> Thu May 3 15:50:40 CEST 2007
> I noticed that the function "vsn2" is much slower than the function
> "vsn". Probably it is not a general problem, but at least for my dataset
> the difference in computation time is remarkable:
Dear Hans-Ulrich,
please excuse the delayed reply, I was traveling. I would definitely
like to follow up on this. There are some internal parameters of the
likelihood optimiser that affect computation time, and which I have
tightened (although I wouldn't have thought with such a drastic effect).
* Have others also experienced drastic increases in compute time? *
Internally in the vsn C code, the L-BFGS-B algorithm is given two
parameters (among others) to decide on convergence. In vsn2, they are:
factr = 4e4;
maxit = 200000;
In the old vsn, they were:
factr = 5e+7;
maxit = 40000;
factr controls the tolerance at which the optimiser considers the target
function stationary and declares convergence; the current setting is 1250
times stricter than the previous one (5e7 / 4e4 = 1250). maxit is the
maximum number of iterations. After extensive simulations (see the
vignette "Verifying and assessing...") I went for the much tighter
settings in order to ensure convergence to a unique optimum even in cases
where the optimal transformation is close to the normal logarithm (in
that limit the likelihood has one direction in which it is very flat).
But there is a trade-off with computation time, and of course the
trade-off really depends on the data.
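To make the role of the two parameters concrete, here is a minimal sketch using R's own optim() with method "L-BFGS-B", which accepts factr and maxit in its control list. This is an illustration only: it does not call the vsn C code, and the objective function flat_objective and the helper fit_with are hypothetical stand-ins with one nearly flat direction, not the vsn likelihood. In optim(), factr is a multiple of the machine precision, so a smaller value means a stricter stopping rule.

```r
# Illustration with R's optim(); this is NOT the vsn C code.
# Hypothetical toy objective with one nearly flat direction (p[2]).
flat_objective <- function(p) (p[1] - 1)^2 + 1e-4 * (p[2] - 2)^4

# Hypothetical helper: run L-BFGS-B with a given factr and maxit.
fit_with <- function(factr, maxit) {
  optim(par = c(0, 0), fn = flat_objective, method = "L-BFGS-B",
        control = list(factr = factr, maxit = maxit))
}

old_fit <- fit_with(factr = 5e7, maxit = 40000)   # settings quoted for the old vsn
new_fit <- fit_with(factr = 4e4, maxit = 200000)  # settings quoted for vsn2

# Ratio of the two tolerance factors
tolerance_ratio <- 5e7 / 4e4   # 1250

# Number of function and gradient evaluations used by each run
print(old_fit$counts)
print(new_fit$counts)
```

Both runs follow the same sequence of iterates; the stricter factr only delays the point at which the optimiser stops, so the second run never uses fewer evaluations than the first. How large the difference is depends on how flat the objective is near the optimum.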
Before making any more changes, I wonder whether you or other users have
comments / wishes. Otherwise I would
-- expose the above parameters to the R-function interface, so people
can set them
-- make the default a bit more lenient again.
See below for more.
> > vsnResG <- vsn(RG$G[subSet,1:5], strata=RG$genes$Block[subSet]);
> vsn: 25727 x 5 matrix (48 strata). 100% done.
>
> Finished after ~30s.
>
> > vsnFitG <- vsn2(RG$G[subSet,1:5], strata=RG$genes$Block[subSet])
> vsn: 25727 x 5 matrix (48 strata). 100% done.
>
> Finished after ~1h.
>
>
> After transformation
>
> > Gvsn_new <- predict(vsnFitG, RG$G[subSet,1:5])
> > parsG <- preproc(description(vsnResG))$vsnParams
> > Gvsn_old <- vsnh(RG$G[subSet,1:5] + 0, parsG,
> strata=RG$genes$Block[subSet])
>
> I checked that the variance is independent of the mean, and plotted the
> new versus the old glog intensities:
>
> > plot(Gvsn_new, Gvsn_old/log(2), pch=".")
> > abline(0,1, col="red")
>
> Here, the plot shows a couple of "stripes" with slope 1 and different
> intercepts.
> I uploaded the plot:
> http://img504.imageshack.us/img504/4700/oldnewglogge0.png
>
> I guess that the "stripes" are the 48 printtips (used to stratify the
> data). Thus, the different additive offsets should not influence further
> analysis (like limma).
Yes, but it is worrying that the ranges of the different stripes (strata)
are much more similar (all between 8 and 16) in the "old" version than
in the new one. Would you be able to send me your RG object so that I
can better explore these questions?
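In the meantime, one way to compare the per-stratum ranges numerically rather than by eye is sketched below. It is only a sketch under stated assumptions: the data are simulated, range_by_stratum is a hypothetical helper rather than a vsn function, and h_old and h_new stand in for transformed matrices such as Gvsn_old/log(2) and Gvsn_new.

```r
# Simulated stand-ins for two transformed intensity matrices (hypothetical data).
set.seed(1)
n_strata <- 4
strata   <- rep(seq_len(n_strata), each = 100)
h_old    <- matrix(rnorm(length(strata) * 5, mean = 12, sd = 1.5), ncol = 5)

# Assumption for illustration: the "new" values differ from the "old" ones
# only by a stratum-specific additive offset.
offsets <- c(0, 0.5, -0.3, 1)
h_new   <- h_old + offsets[strata]

# Hypothetical helper: minimum and maximum of all values within each stratum.
range_by_stratum <- function(h, strata) {
  h    <- as.matrix(h)
  rows <- split(seq_len(nrow(h)), strata)
  out  <- t(sapply(rows, function(i) range(h[i, , drop = FALSE], na.rm = TRUE)))
  colnames(out) <- c("min", "max")
  out
}

old_ranges <- range_by_stratum(h_old, strata)
new_ranges <- range_by_stratum(h_new, strata)

# One row per stratum; large differences between rows point to poorly
# aligned strata.
print(cbind(old_ranges, new_ranges))
```

With real data, strata would be RG$genes$Block[subSet], and the two tables could be compared to see whether the new fit spreads the strata further apart than the old one.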
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber