[BioC] Problems with robustPca in pcaMethods
Julian Gehring
julian.gehring at embl.de
Thu Mar 13 00:04:42 CET 2014
Hi Henning,
I'm not the OP, you would have to ask him for the data.
Best wishes
Julian
On 12/03/14 22:25, Henning Redestig wrote:
> Sorry for the slow reply, I changed the call to weightedMedian to use
> na.rm=TRUE but not sure if it solves the issue.. Julian, any chance you
> could share the data? (preferably subsetted to the minimum size that
> still causes the error)
>
> cheers, Henning (pcaMethods maintainer)
>
>
> 2014-03-03 20:34 GMT+01:00 Henrik Bengtsson <hb at biostat.ucsf.edu
> <mailto:hb at biostat.ucsf.edu>>:
>
> [cc:ing the maintainer of pcaMethods]
>
> Author of matrixStats::weightedMedian() here. I won't solve OPs
> problem but I'll add some clues to the non-informative error message
> from weightedMedian().
>
> On Mon, Mar 3, 2014 at 9:35 AM, Julian Gehring
> <julian.gehring at embl.de <mailto:julian.gehring at embl.de>> wrote:
> > Hi,
> >
> > You may check if you have missing values 'NA's in your data.
> >
> > Best wishes
> > Julian
> >
> >
> > On 27/02/14 21:53, J Brown [guest] wrote:
> >>
> >> Hi all;
> >>
> >> I'm new to R and trying to use the pcaMethods package to analyze
> a qPCR dataset. My dataset contains many missing values and I think
> the module I want to use is robustPca, but when I try to apply it to
> my dataset I keep getting the error described below. Using nipalsPca
> on my dataset works without errors, so I don't think it's a
> data-format issue. Using robustPca on the pcaMethods sample dataset
> "metaboliteData", which has missing values, also works fine
> (although it warns about missing values), so it isn't a general
> problem with my install of R and the relevant packages.
> >>
> >> The traceback results seems to say that the error is caused by a
> weighted-median calculation that is part of the robustPca command,
> but I have no idea why this only comes up using my dataset: could it
> be because my dataset is already median-normalized (before importing
> to R)? Troubleshooting this is beyond my abilities at this point;
> I'd be grateful for any insight anyone can offer.
> >>
> >>> pca_results <- pca(centered_data, method = "robustPca", nPcs =
> 10, center = FALSE)
> >> Error in if (!all(tmp)) { : missing value where TRUE/FALSE needed
> >> In addition: Warning message:
> >> In robustPca(prepres$data, nPcs = nPcs, ...) :
> >> Data is incomplete, it is not recommended to use robustPca for
> missing value estimation
> >>
> >>> traceback()
> >> 7: weightedMedian.default(x[keep]/a, abs(a), interpolate = FALSE)
> >> 6: weightedMedian(x[keep]/a, abs(a), interpolate = FALSE)
> >> 5: FUN(newX[, i], ...)
> >> 4: apply(x, 1, L1RegCoef, bk)
> >> 3: robustSvd(Matrix)
> >> 2: robustPca(prepres$data, nPcs = nPcs, ...)
> >> 1: pca(centered_data, method = "robustPca", nPcs = 10, center =
> FALSE)
>
> That error in weightedMedian() occurs because there are missing values
> in either in x[keep]/a or in the weights abs(a) and argument 'na.rm'
> defaults to NA(*). It's better if robustSvd() would call
> weightedMedian(x[keep]/a, abs(a), na.rm=TRUE, interpolate=FALSE), or
> possibly na.rm=FALSE.
>
> (*) With weightedMedian(..., na.rm=NA) one tells that function to
> trust the data (including the weights) to have no missing values.
> This option exists for efficiency reasons. If there are missing
> values, the na.rm=TRUE should be used. If there could be missing
> value, na.rm=FALSE should be used (in case NA is returned if there are
> missing values). That the default is NA is unconventional and I may
> consider changing matrixStats to use the more commonly used
> na.rm=FALSE (no promises though).
>
> /Henrik
>
> >>
> >> -- output of sessionInfo():
> >>
> >> R version 3.0.2 (2013-09-25)
> >> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> >>
> >> locale:
> >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >>
> >> attached base packages:
> >> [1] parallel stats graphics grDevices utils datasets
> methods base
> >>
> >> other attached packages:
> >> [1] pcaMethods_1.52.1 Rcpp_0.11.0 matrixStats_0.8.14
> Biobase_2.22.0 BiocGenerics_0.8.0
> >>
> >> loaded via a namespace (and not attached):
> >> [1] R.methodsS3_1.6.1 tools_3.0.2
> >>
> >> --
> >> Sent via the guest posting facility at bioconductor.org
> <http://bioconductor.org>.
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
More information about the Bioconductor
mailing list