[BioC] VSN 2.2

Tue May 15 22:35:10 CEST 2007

> Hans-Ulrich Klein wrote
> Thu May 3 15:50:40 CEST 2007

> I noticed that the function "vsn2" is much slower than the function 
> "vsn". Probably it is not a general problem, but at least for my dataset 
> the difference in computation time  is remarkable:

Dear Hans-Ulrich,

Please excuse the delayed reply; I was traveling. I would definitely 
like to follow up on this. There are some internal parameters of the 
likelihood optimiser that affect computation time, and which I have 
tightened (although I would not have expected such a drastic effect).

* Have others also experienced drastic increases in compute time? *

Internally in the vsn C code, the L-BFGS-B algorithm is given two 
parameters (among others) to decide on convergence. In vsn2, they are:
   factr    = 4e4;
   maxit    = 200000;
In the old vsn, they were
   factr    = 5e+7;
   maxit    = 40000;

factr controls the tolerance at which the optimiser considers the target 
function stationary and declares convergence; the current setting is 1250 
times stricter than previously (5e+7 / 4e4 = 1250). maxit is the maximal 
number of iterations. After extensive simulations (see the vignette 
"Verifying and assessing...") I went for the much tighter settings in 
order to ensure convergence to a unique optimum even in cases where the 
optimal transformation is close to the normal logarithm (in that limit 
the likelihood has one direction in which it is very flat). But there is 
a trade-off with computation time, and of course the trade-off really 
depends on the data.
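
To get a feeling for what the two parameters do, here is a minimal 
sketch using R's own optim() with method "L-BFGS-B", which accepts factr 
and maxit in its control list. This is not the vsn C code: the objective 
flat_valley, its gradient and the helper fit_with are made up for 
illustration, and I am assuming that factr has the same meaning as in 
optim(), i.e. a multiple of the machine precision. On such a small toy 
problem the two settings may well need the same number of iterations.

```r
# Toy objective with one nearly flat direction (hypothetical stand-in,
# NOT the vsn likelihood).
flat_valley <- function(p) (p[1] - 1)^2 + 1e-4 * (p[2] - 2)^2
flat_valley_gr <- function(p) c(2 * (p[1] - 1), 2e-4 * (p[2] - 2))

# Helper (hypothetical name) that runs L-BFGS-B with given settings.
fit_with <- function(factr, maxit) {
  optim(par = c(0, 0), fn = flat_valley, gr = flat_valley_gr,
        method = "L-BFGS-B",
        control = list(factr = factr, maxit = maxit))
}

old_fit <- fit_with(factr = 5e+7, maxit = 40000)   # old vsn values
new_fit <- fit_with(factr = 4e4,  maxit = 200000)  # vsn2 values

# In optim(), the tolerance on the relative reduction of the objective
# is factr * .Machine$double.eps.
tol_old <- 5e+7 * .Machine$double.eps
tol_new <- 4e4  * .Machine$double.eps
tol_ratio <- tol_old / tol_new

# Compare estimates and the number of function / gradient evaluations.
comparison <- rbind(old = c(old_fit$par, old_fit$counts),
                    new = c(new_fit$par, new_fit$counts))
print(tol_ratio)
print(comparison)
```

The counts columns give a rough idea of the extra work caused by the 
tighter tolerance; on real data with many strata the difference can be 
much larger than on this toy function.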

Before making any more changes, I wonder whether you or other users have 
comments / wishes. Otherwise I would
-- expose the above parameters to the R-function interface, so people 
can set them
-- make the default a bit more lenient again.

More below.

>  > vsnResG <- vsn(RG$G[subSet,1:5], strata=RG$genes$Block[subSet]);
> vsn: 25727 x 5 matrix (48 strata). 100% done.
> 
> Finished after ~30s.
> 
>  > vsnFitG <- vsn2(RG$G[subSet,1:5], strata=RG$genes$Block[subSet])
> vsn: 25727 x 5 matrix (48 strata).   100% done.
> 
> Finished after ~1h.
> 
> 
> After transformation
> 
>  > Gvsn_new <- predict(vsnFitG, RG$G[subSet,1:5])
>  > parsG <- preproc(description(vsnResG))$vsnParams
>  > Gvsn_old <- vsnh(RG$G[subSet,1:5] + 0, parsG, 
> strata=RG$genes$Block[subSet])
> 
> I checked that the variance is independent of the mean, and plotted the 
> new versus the old glog intensities:
> 
>  > plot(Gvsn_new, Gvsn_old/log(2), pch=".")
>  > abline(0,1, col="red")
> 
> Here, the plot shows a couple of "stripes" with slope 1 and different 
> intercepts.
> I uploaded the plot:
> http://img504.imageshack.us/img504/4700/oldnewglogge0.png
> 
> I guess that the "stripes" are the 48 printtips (used to stratify the 
> data). Thus, the different additive offsets should not influence further 
> analysis (like limma).

Yes - but it is worrying that the ranges of the different stripes 
(strata) are much more similar (all between 8 and 16) in the "old" 
version than in the new one. Would you be able to send me your RG object 
so that I can better explore these questions?
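
In the meantime, the per-stratum ranges and offsets could be tabulated 
along the following lines. This is only a sketch on simulated numbers: 
the vectors strata, old and new stand in for RG$genes$Block[subSet], one 
column of Gvsn_old/log(2) and one column of Gvsn_new, and the offsets 
are invented. The helper range_by_stratum is likewise made up for 
illustration.

```r
set.seed(1)

# Simulated stand-ins (assumptions, not the real data):
strata  <- rep(1:4, each = 50)        # e.g. RG$genes$Block[subSet]
old     <- runif(200, 8, 16)          # e.g. Gvsn_old[, 1] / log(2)
offsets <- c(0, 1.5, -2, 3)           # invented additive offsets
new     <- old + offsets[strata]      # e.g. Gvsn_new[, 1]

# Minimum and maximum within each stratum, one row per stratum.
range_by_stratum <- function(x, s) t(sapply(split(x, s), range))
ranges_old <- range_by_stratum(old, strata)
ranges_new <- range_by_stratum(new, strata)

# Median difference per stratum estimates the additive offset, i.e. the
# intercept of each "stripe" in the plot of new versus old.
est_offsets <- tapply(new - old, strata, median)

print(cbind(ranges_old, ranges_new))
print(est_offsets)
```

If the stripes really are the strata, est_offsets should take one 
distinct value per print-tip block, and the columns of the range table 
show directly how much the ranges differ between the two versions.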

Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber