[BioC] Problems with robustPca in pcaMethods

Thu Mar 13 00:04:42 CET 2014

Hi Henning,

I'm not the OP, you would have to ask him for the data.

Best wishes
Julian

On 12/03/14 22:25, Henning Redestig wrote:
> Sorry for the slow reply. I changed the call to weightedMedian to use
> na.rm=TRUE, but I'm not sure whether that solves the issue. Julian, any
> chance you could share the data (preferably subsetted to the minimum size
> that still causes the error)?
> 
> cheers, Henning (pcaMethods maintainer)
> 
> 
> 2014-03-03 20:34 GMT+01:00 Henrik Bengtsson <hb at biostat.ucsf.edu>:
> 
>     [cc:ing the maintainer of pcaMethods]
> 
>     Author of matrixStats::weightedMedian() here.  I won't solve the OP's
>     problem, but I'll add some clues about the non-informative error message
>     from weightedMedian().
> 
>     On Mon, Mar 3, 2014 at 9:35 AM, Julian Gehring
>     <julian.gehring at embl.de> wrote:
>     > Hi,
>     >
>     > You may want to check whether you have missing values ('NA's) in your data.
>     >
>     > Best wishes
>     > Julian
>     >
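A quick way to run the check suggested above is sketched here in base R. The matrix `m` is made up for illustration and stands in for the real data set; nothing in it comes from the OP's data.

```r
# 'm' is a hypothetical 3 x 3 matrix standing in for the real data set.
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              7, 8, 9), nrow = 3)   # filled column by column

has_na     <- any(is.na(m))               # is there any missing value at all?
na_per_col <- colSums(is.na(m))           # number of NAs in each column
rows_na    <- which(rowSums(is.na(m)) > 0) # rows with at least one NA

print(has_na)
print(na_per_col)
print(rows_na)
```

The row indices in `rows_na` could also serve as a starting point for cutting a data set down to a small subset that still triggers an error.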
>     >
>     > On 27/02/14 21:53, J Brown [guest] wrote:
>     >>
>     >> Hi all;
>     >>
>     >> I'm new to R and trying to use the pcaMethods package to analyze
>     >> a qPCR dataset. My dataset contains many missing values and I think
>     >> the module I want to use is robustPca, but when I try to apply it to
>     >> my dataset I keep getting the error described below. Using nipalsPca
>     >> on my dataset works without errors, so I don't think it's a
>     >> data-format issue. Using robustPca on the pcaMethods sample dataset
>     >> "metaboliteData", which has missing values, also works fine
>     >> (although it warns about missing values), so it isn't a general
>     >> problem with my install of R and the relevant packages.
>     >>
>     >> The traceback results seem to say that the error is caused by a
>     >> weighted-median calculation that is part of the robustPca command,
>     >> but I have no idea why this only comes up with my dataset. Could it
>     >> be because my dataset is already median-normalized (before importing
>     >> into R)? Troubleshooting this is beyond my abilities at this point;
>     >> I'd be grateful for any insight anyone can offer.
>     >>
>     >>> pca_results <- pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)
>     >> Error in if (!all(tmp)) { : missing value where TRUE/FALSE needed
>     >> In addition: Warning message:
>     >> In robustPca(prepres$data, nPcs = nPcs, ...) :
>     >>   Data is incomplete, it is not recommended to use robustPca for missing value estimation
>     >>
>     >>> traceback()
>     >> 7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
>     >> 6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
>     >> 5: FUN(newX[, i], ...)
>     >> 4: apply(x, 1, L1RegCoef, bk)
>     >> 3: robustSvd(Matrix)
>     >> 2: robustPca(prepres$data, nPcs = nPcs, ...)
>     >> 1: pca(centered_data, method = "robustPca", nPcs = 10, center = FALSE)
> 
>     That error in weightedMedian() occurs because there are missing values
>     either in x[keep]/a or in the weights abs(a), and the argument 'na.rm'
>     defaults to NA(*).  It would be better if robustSvd() called
>     weightedMedian(x[keep]/a, abs(a), na.rm=TRUE, interpolate=FALSE), or
>     possibly na.rm=FALSE.
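The wording of the error can be reproduced in base R without either package. This is only a sketch, assuming the failing check has the form shown in the error message (`if (!all(tmp))`); the vector `x` and the comparison that builds `tmp` are made up and are not taken from the matrixStats source.

```r
# Hypothetical stand-in for an internal sanity check; 'x' and 'tmp' are made up.
x   <- c(1.2, NA, 0.7)
tmp <- x > 0                 # TRUE NA TRUE: the NA propagates through '>'

# all(tmp) is NA (no FALSE present, but one value unknown), so !all(tmp) is
# NA as well, and if() cannot decide on NA.
res <- tryCatch(
  if (!all(tmp)) "some check failed" else "all checks passed",
  error = function(e) conditionMessage(e)
)
print(res)
```

In an English locale `res` holds the same "missing value where TRUE/FALSE needed" text as in the report, which is why the message says nothing about weights or medians.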
> 
>     (*) With weightedMedian(..., na.rm=NA) one tells that function to
>     trust the data (including the weights) to have no missing values.
>     This option exists for efficiency reasons.  If there are missing
>     values, na.rm=TRUE should be used.  If there could be missing
>     values, na.rm=FALSE should be used (in which case NA is returned if
>     there are missing values).  That the default is NA is unconventional
>     and I may consider changing matrixStats to use the more commonly used
>     na.rm=FALSE (no promises though).
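The difference between na.rm=TRUE and na.rm=FALSE can be illustrated with a small base-R sketch. `wmedian_sketch()` is a hypothetical helper written for this illustration. It is not the matrixStats implementation, it picks the lower weighted median without interpolation, and its handling of ties may differ from weightedMedian().

```r
# Hypothetical helper, NOT matrixStats code: lower weighted median in base R.
wmedian_sketch <- function(x, w, na.rm = FALSE) {
  miss <- is.na(x) | is.na(w)          # a pair is unusable if either part is NA
  if (any(miss)) {
    if (!na.rm) return(NA_real_)       # na.rm = FALSE: report NA to the caller
    x <- x[!miss]                      # na.rm = TRUE: drop incomplete pairs
    w <- w[!miss]
  }
  if (length(x) == 0L) return(NA_real_)
  o <- order(x)
  x <- x[o]
  w <- w[o]
  # first sorted value at which the cumulative weight reaches half the total
  x[which(cumsum(w) >= sum(w) / 2)[1L]]
}

x <- c(3, 1, NA, 2)
w <- c(1, 1, 1, 2)
with_rm    <- wmedian_sketch(x, w, na.rm = TRUE)   # computed from the 3 complete pairs
without_rm <- wmedian_sketch(x, w, na.rm = FALSE)  # NA, because x contains an NA
print(with_rm)
print(without_rm)
```

With na.rm=TRUE the caller gets a number from the complete pairs; with na.rm=FALSE the caller gets NA back and can decide what to do, instead of hitting an error inside the function.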
> 
>     /Henrik
> 
>     >>
>     >>  -- output of sessionInfo():
>     >>
>     >> R version 3.0.2 (2013-09-25)
>     >> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>     >>
>     >> locale:
>     >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>     >>
>     >> attached base packages:
>     >> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>     >>
>     >> other attached packages:
>     >> [1] pcaMethods_1.52.1  Rcpp_0.11.0        matrixStats_0.8.14 Biobase_2.22.0     BiocGenerics_0.8.0
>     >>
>     >> loaded via a namespace (and not attached):
>     >> [1] R.methodsS3_1.6.1 tools_3.0.2
>     >>
>     >> --
>     >> Sent via the guest posting facility at bioconductor.org.
>     >>
>     >> _______________________________________________
>     >> Bioconductor mailing list
>     >> Bioconductor at r-project.org
>     >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>     >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>     >>