[BioC] QCReport: specifying alt CDF (MoGene-1_0-st-v1)?

Wed Sep 22 19:55:06 CEST 2010

Hi Marc,

On 9/22/2010 1:23 PM, Marc Carlson wrote:
> Hi guys,
>
> Strangely enough I do not get this error from here in Seattle (on openSUSE).

Yeah, well that's because your skillz are vastly superior to mine ;-D

So if I wget the tarball from below and use install.packages, I get

 > install.packages("mogene10stv1cdf_2.6.2.tar.gz", repos=NULL)
Warning in install.packages("mogene10stv1cdf_2.6.2.tar.gz", repos = NULL) :
   argument 'lib' is missing: using '/home/jwm/R-2.11.0libs'
* installing *source* package 'mogene10stv1cdf' ...
** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
   object 'annoStartupMessages' not found
ERROR: loading failed
* removing '/home/jwm/R-2.11.0libs/mogene10stv1cdf'
* restoring previous '/home/jwm/R-2.11.0libs/mogene10stv1cdf'
Warning message:
In install.packages("mogene10stv1cdf_2.6.2.tar.gz", repos = NULL) :
   installation of package 'mogene10stv1cdf_2.6.2.tar.gz' had non-zero exit status
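
The failing call in the log, get(name, envir = asNamespace(pkg), inherits = FALSE), is what R's `:::` operator runs internally, so the 2.6.2 package appears to look up annoStartupMessages in another package's namespace while it is being loaded. A minimal sketch for checking whether an installed package provides that object follows. The helper name has_namespace_object is hypothetical, and choosing AnnotationDbi as the namespace to inspect is an assumption based on Marc's suggestion to update it; the log itself does not name the package.

```r
# Hypothetical helper: TRUE if `name` is defined in the namespace of `pkg`,
# FALSE if it is missing or if `pkg` is not installed.
has_namespace_object <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  exists(name, envir = asNamespace(pkg), inherits = FALSE)
}

# Assumption: AnnotationDbi is the namespace queried during loading.
has_namespace_object("AnnotationDbi", "annoStartupMessages")
```

If this returns FALSE on the machine where the install fails and TRUE where it succeeds, the installed dependency is the likely difference rather than the tarball itself.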

However, if I point to the package I made here:

 > install.packages("~/BioC/archived_cdfs/build/mogene10stv1cdf_2.6.0.tar.gz", repos=NULL)
Warning in install.packages("~/BioC/archived_cdfs/build/mogene10stv1cdf_2.6.0.tar.gz",  :
   argument 'lib' is missing: using '/home/jwm/R-2.11.0libs'
* installing *source* package 'mogene10stv1cdf' ...
** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded

* DONE (mogene10stv1cdf)

 > sessionInfo()
R version 2.11.0 beta (2010-04-11 r51685)
i686-pc-linux-gnu

locale:
  [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
  [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
  [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
  [7] LC_PAPER=en_US.iso885915       LC_NAME=C
  [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.11.0
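
Because one tarball installs and the other does not in the same R session, comparing installed package versions between machines is a quick sanity check. A rough sketch, where older_than is a hypothetical helper and "1.10.2" is only an illustration (the two AnnotationDbi versions in Harry's quoted sessionInfo() outputs are 1.10.0 and 1.10.2), not a confirmed minimum version:

```r
# Hypothetical helper: TRUE if the installed version of `pkg` is older than
# `minimum`, or if `pkg` is not installed at all.
older_than <- function(pkg, minimum) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  if (is.null(installed)) {
    return(TRUE)
  }
  installed < package_version(minimum)
}

# Illustration only: "1.10.2" is not a confirmed minimum version.
older_than("AnnotationDbi", "1.10.2")

# package_version objects compare component by component, not as strings.
package_version("1.10.0") < package_version("1.10.2")
```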

Best,

Jim

>
> The tarballs for all the annotation packages can always be found on our
> web site if biocLite should ever fail you (or if you just want to see
> what is in the source code):
>
> http://www.bioconductor.org/packages/release/data/annotation/
>
>
> I also see that Harry is not using the most recent version of
> AnnotationDbi for your version of R, so you might want to try and update
> that as well.
>
> biocLite("AnnotationDbi")
>
>
> Also, since I can't reproduce the behavior that you guys are
> experiencing, could you please let us know whatever you can about it?
>
>
>    Marc
>
>
>
> On 09/21/2010 02:08 PM, James W. MacDonald wrote:
>> Hi Harry,
>>
>> I get the same error you see, when I try to install on Linux. Weirdly
>> enough, if I install from source on my Windows box I don't have any
>> problems.
>>
>> And even weirder, I can install the source package on Linux if I use
>> my local copy (I make the cdf packages for BioC, so I have the
>> packages that I uploaded in April still sitting around).
>>
>> I would assume some corruption, if the package didn't install on
>> Windows, but it does.
>>
>> Well, anyway, attached is the package that will install for me. See if
>> it works for you.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 9/21/2010 3:55 PM, Harry Mangalam wrote:
>>> Hi Jim,
>>>
>>> Thanks for the rapid reply, info and pointers.
>>>
>>> I tried to take your advice and on a larger machine (due to malloc
>>> errors on the 1st - new sessionInfo() below) I can get a bit further
>>> but still can't convince arrayQualityMetrics() to take or recognize
>>> the appropriate cdf env.
>>>
>>>
>>> While I include the entire session below, the main problem seems to be
>>> that R will not conclude the installation of the CDF you referenced:
>>>
>>> biocLite("mogene10stv1cdf")
>>>
>>> either referenced separately or as part of the arrayQualityMetrics()
>>> dependency.  It gave the identical results on the machine I used
>>> before (w/ R 2.11.1) and on the larger 64b machine (w/ R 2.11.0).
>>>
>>> The entire session follows.
>>> (From a clean start on the machine whose sessionInfo() is included at
>>> beginning and end of the session.)
>>>
>>> $ module load R/2.11.0 # we use modules to keep things separate
>>> $ R
>>>> sessionInfo()
>>> R version 2.11.0 (2010-04-22)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>    [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] graphics  grDevices datasets  stats     utils     methods   base
>>>
>>> other attached packages:
>>> [1] Rmpi_0.5-8
>>>
>>>> library(affy)
>>>> # deleted all 'std' output, including only errors or warnings.
>>>
>>> #create an affybatch object  from the cel files.
>>>> ab<- ReadAffy(widget=TRUE)  # select all 8 wt cels (sal vs coc)
>>>
>>>> library("arrayQualityMetrics")
>>> # and run the code on all the wt cels
>>>> arrayQualityMetrics(expressionset = ab,outdir = "wt_sal_v_coc",force = TRUE,do.logtransform = TRUE)
>>> Loading required package: affyPLM
>>> Loading required package: gcrma
>>> Loading required package: preprocessCore
>>>
>>> Attaching package: 'affyPLM'
>>>
>>> The following object(s) are masked from 'package:stats':
>>>
>>>       resid, residuals, weights
>>>
>>>> arrayQualityMetrics(expressionset = ab,outdir = "wt_sal_v_coc",force = TRUE,do.logtransform = TRUE)
>>> The report will be written in directory 'wt_sal_v_coc'.
>>> trying URL 'http://bioconductor.org/packages/2.6/data/annotation/src/contrib/mogene10stv1cdf_2.6.2.tar.gz'
>>>
>>> Content type 'application/x-gzip' length 3126174 bytes (3.0 Mb)
>>> opened URL
>>> ==================================================
>>> downloaded 3.0 Mb
>>>
>>> * installing *source* package ‘mogene10stv1cdf’ ...
>>> ** R
>>> ** data
>>> ** preparing package for lazy loading
>>> ** help
>>> *** installing help indices
>>> ** building package indices ...
>>> ** testing if installed package can be loaded
>>> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
>>>     object 'annoStartupMessages' not found
>>> ERROR: loading failed
>>> * removing ‘/apps/R/2.11.0/lib64/R/library/mogene10stv1cdf’
>>>
>>> The downloaded packages are in
>>>           ‘/tmp/Rtmpq2sQrq/downloaded_packages’
>>> Updating HTML index of packages in '.Library'
>>> Error in getCdfInfo(object) :
>>>     Could not obtain CDF environment, problems encountered:
>>> Specified environment does not contain MoGene-1_0-st-v1
>>> Library - package mogene10stv1cdf not installed
>>> Library - package mogene10stv1cdf not installed
>>> In addition: Warning message:
>>> In install.packages(cdfname, lib = lib, repos = Biobase:::biocReposList(),  :
>>>     installation of package 'mogene10stv1cdf' had non-zero exit status
>>>
>>> <<the above stanza repeated 2 more times, downloading and then failing
>>> to install the same pkg>>
>>>
>>> Is this a problem with matching case and intervening characters
>>> (mogene10stv1 vs MoGene-1_0-st-v1), or something more fundamental?
>>>
>>> I tried this as a user and as root, to see if it was a permissions
>>> problem.  The results were identical.
>>>
>>> I also tried the installation of the CDF that came with the cel files.
>>> (MoGene-1_0-st-v1.r3.cdf), but while this apparently went to
>>> completion (as previously noted), it did not change anything.
>>>
>>> # at end of session, here is the sessionInfo()
>>>> sessionInfo()
>>> R version 2.11.0 (2010-04-22)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>    [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] tools     tcltk     graphics  grDevices datasets  stats     utils
>>> [8] methods   base
>>>
>>> other attached packages:
>>>    [1] arrayQualityMetrics_2.6.0 affyPLM_1.24.1
>>>    [3] preprocessCore_1.10.0     gcrma_2.20.0
>>>    [5] tkWidgets_1.26.0          DynDoc_1.26.0
>>>    [7] widgetTools_1.26.0        affy_1.26.1
>>>    [9] Biobase_2.8.0             Rmpi_0.5-8
>>>
>>> loaded via a namespace (and not attached):
>>>    [1] affyio_1.16.0        annotate_1.26.1      AnnotationDbi_1.10.0
>>>    [4] beadarray_1.16.0     Biostrings_2.16.9    DBI_0.2-5
>>>    [7] genefilter_1.30.0    grid_2.11.0          hwriter_1.2
>>> [10] IRanges_1.6.17       lattice_0.18-5       latticeExtra_0.6-11
>>> [13] limma_3.4.5          marray_1.26.0        RColorBrewer_1.0-2
>>> [16] RSQLite_0.8-4        simpleaffy_2.24.0    splines_2.11.0
>>> [19] stats4_2.11.0        survival_2.35-8      vsn_3.16.0
>>> [22] xtable_1.5-6
>>>
>>> Thanks for your consideration.
>>>
>>> harry
>>>
>>>
>>> On Tuesday 21 September 2010 06:49:38 James W. MacDonald wrote:
>>>> Hi Harry,
>>>>
>>>> On 9/20/2010 6:20 PM, Harry Mangalam wrote:
>>>>> Hi BioC
>>>>>
>>>>> (sessionInfo() at bottom)
>>>>>
>>>>> I'm trying to help a group here do some QC on their affy datasets
>>>>> derived from the mogene10stv1 array set.  This array is not in
>>>>> mainstream BioC support but I've created and installed the CDF
>>>>> environment for that array:
>>>>
>>>> This is not correct.
>>>>
>>>> biocLite("mogene10stv1cdf")
>>>>
>>>> Will get you the package you create below.
>>>>
>>>>>>     make.cdf.package("MoGene-1_0-st-v1.r3.cdf", species = "Mus_mus")
>>>>>
>>>>> (completes, and I've installed the generated CDF env)
>>>>>
>>>>> but when I try to run  QCReport on this dataset (even explicitly
>>>>> specifying the mogene10stv1 CDF env), I get the errors:
>>>>
>>>> In future, please mention the package you are using. I happen to
>>>> know that QCReport() is part of the affyQCReport package, but by
>>>> neglecting to include this bit of information you seriously
>>>> degrade your chances of an answer.
>>>>
>>>> Now on to the answer. ;-D
>>>>
>>>> You are not going to be very satisfied with affyQCReport for this
>>>> chip, as it uses the simpleaffy package for much of the quality
>>>> control output, a good portion of which is based on MAS5 calls.
>>>> Since the MoGene chip is a PM-only chip, you won't be able to
>>>> compute MAS5 calls, as they rely on the matching MM probes, which
>>>> don't exist. Hence the NA values below.
>>>>
>>>> I believe you will be better off using the arrayQualityMetrics
>>>> package, which is more general.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>>> QCReport(ReadAffy(widget=TRUE,cdfname="mogene10stv1cdf"))
>>>>>
>>>>> #   or
>>>>>
>>>>>> QCReport(ReadAffy(widget=TRUE,cdfname="mogene10stv1"))
>>>>>
>>>>> #   (get same error)
>>>>>
>>>>> Error: NAs in foreign function call (arg 1)
>>>>> In addition: Warning messages:
>>>>>
>>>>> 1: In data.row.names(row.names, rowsi, i) :
>>>>>      some row.names duplicated:
>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,
>>>>> 51,52,53,54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,
>>>>> 95,96,97,98,99,102,103,104,108,109,110,111,114,119,120,121,122,
>>>>> 127,134,136,137,138,139,141,142,147,148,149,152,153,156,157,158,
>>>>> 159,162,163,164,165,166,167,168,169,170,171,173,175,176,179,180,
>>>>> 183,184,185,186,191,192,195,197,198,199,200,202,206,207,210,219,
>>>>> 220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,
>>>>> 251,252,253,257,259,260,266,271,272,276,277,280,281,284,286,287,
>>>>> 289,290,291,292,296,297,298,302,304,305,306,310,311,312,313,317,
>>>>> 318,319,321,322,324,334,337,338,339,340,341,345,346,350,351,356,
>>>>> 359,362,364,366,367,370,371,373,376,378,382,383,384,385,386,387,
>>>>> 388,389,391,394,395,397,398,399,400,402,403,405,406,407,409,410,
>>>>> 411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,
>>>>> 449,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,
>>>>> 492,493,494,495,496,497,498,499,501,502,504,506,507,509,511,513,
>>>>> 515,516,51 [... truncated]
>>>>>
>>>>> 2: In qc.affy(unnormalised, ...) :
>>>>>      CDF Environment name ' hgu95av2cdf ' does not match cdfname ' mogene10stv1cdf '
>>>>>
>>>>> Error in plot(qc(object)) :
>>>>>      error in evaluating the argument 'x' in selecting a method for function 'plot'
>>>>>
>>>>>
>>>>> This: /Error: NAs in foreign function call (arg 1)/
>>>>>
>>>>>     seems to imply that:
>>>>> - there's an error in the '(arg 1)'  but which (arg 1)?
>>>>>
>>>>>      If this refers to the arg
>>>>>
>>>>> /ReadAffy(widget=TRUE,cdfname="mogene10stv1cdf")/
>>>>>
>>>>>      then that part of the command seems to complete fine and
>>>>>      returns an AffyBatch object, as it should:
>>>>>
>>>>>> str(rawdata)
>>>>>
>>>>> Formal class 'AffyBatch' [package "affy"] with 10 slots
>>>>>
>>>>>      ..@ cdfName          : chr "mogene10stv1cdf"
>>>>>      ..@ nrow             : int 1050
>>>>>      ..@ ncol             : int 1050
>>>>>
>>>>> /etc/
>>>>>
>>>>>
>>>>> - or I have NAs in the data, but doesn't point to where or
>>>>> whether I should address them.
>>>>> If this is the critical error, I'm guessing I have to choose a
>>>>> transform that removes or floor-shifts the NAs into a
>>>>> computational form?
>>>>>
>>>>> - the Warnings:
>>>>>
>>>>> 1: In data.row.names(row.names, rowsi, i) :
>>>>>      some row.names duplicated:
>>>>>      4,8,9,13,14,15,16,24,25,26,27,28,29,<almost every
>>>>>      intervening # omitted>
>>>>>      ,513,515,516,51 [... truncated]
>>>>>
>>>>> Would this be related to warning 2 below?
>>>>>
>>>>> 2: In qc.affy(unnormalised, ...) :
>>>>>      CDF Environment name ' hgu95av2cdf ' does not match cdfname ' mogene10stv1cdf '
>>>>>
>>>>> but if so, what is the proper way to tell QCReport that I'm using
>>>>> a non-default CDF?
>>>>> the help section for QCReport doesn't describe any params for
>>>>> telling it that the CDF env is not 'hgu95av2cdf' and I've tried
>>>>> including that info in the ReadAffy() fn as noted:
>>>>>
>>>>> ie:
>>>>>> QCReport(ReadAffy(widget=TRUE,cdfname="mogene10stv1"))
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> sessionInfo()
>>>>>
>>>>> R version 2.11.1 (2010-05-31)
>>>>> i486-pc-linux-gnu
>>>>>
>>>>> locale:
>>>>>     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>     [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] tools     tcltk     stats     graphics  grDevices utils     datasets
>>>>> [8] methods   base
>>>>>
>>>>> other attached packages:
>>>>>     [1] makecdfenv_1.26.0     tkWidgets_1.26.0      DynDoc_1.26.0
>>>>>     [4] widgetTools_1.26.0    hgu95av2cdf_2.6.0     affydata_1.11.10
>>>>>     [7] affyQCReport_1.26.0   lattice_0.19-11       RColorBrewer_1.0-2
>>>>>    [10] affyPLM_1.24.1        preprocessCore_1.10.0 xtable_1.5-6
>>>>>    [13] simpleaffy_2.24.0     gcrma_2.20.0          genefilter_1.30.0
>>>>>    [16] mogene10stv1cdf_2.6.2 affy_1.26.1           Biobase_2.8.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>     [1] affyio_1.16.0        annotate_1.26.1      AnnotationDbi_1.10.2
>>>>>     [4] Biostrings_2.16.9    DBI_0.2-5            grid_2.11.1
>>>>>     [7] IRanges_1.6.17       RSQLite_0.9-2        splines_2.11.1
>>>>>    [10] survival_2.35-8
>>>>>
>>>>>
>>>>> Thanks for your consideration.
>>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826