[BioC] xps: hugene11 chip gives problems
cstrato
cstrato at aon.at
Fri Jan 11 21:05:46 CET 2013
Dear Philip,
In the meantime I ran another test, renaming my CEL-files to mimic your
file names. This is what I get:
> celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
> data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr",
filedir=datdir, celdir=celdir, celfiles=celfiles)
Opening file
</Volumes/MitziData/CRAN/Workspaces/hugene11/na33/hugene11stv1.root> in
<READ> mode...
Creating new temporary file
</Volumes/MitziData/CRAN/Workspaces/hugene11/tmp_HuBrPr_cel.root>...
Importing
</Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Brain_01_1.1.CEL>
as <Brain_01_1.1.cel>...
hybridization statistics:
1 cells with minimal intensity 17.5
1 cells with maximal intensity 22402.1
New dataset <DataSet> is added to Content...
Importing
</Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Prostate_01_1.1.CEL> as
<Prostate_01_1.1.cel>...
hybridization statistics:
2 cells with minimal intensity 14.5
1 cells with maximal intensity 23266.3
> for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
+ cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
Error: Tree set <> could not be found in file content
Error: Tree set <> could not be found in file content
As you can see, I can now reproduce your error.
The solution is simple: use the parameter 'celnames'. Now the result is:
> celfiles <- c("Brain_01_1.1.CEL","Prostate_01_1.1.CEL")
> celnames <- c("Brain01","Prostate01")
> data.genome11 <- import.data(scheme.hugene11, "tmp_HuBrPr",
filedir=datdir, celdir=celdir, celfiles=celfiles, celnames=celnames)
Opening file
</Volumes/MitziData/CRAN/Workspaces/hugene11/na33/hugene11stv1.root> in
<READ> mode...
Creating new temporary file
</Volumes/MitziData/CRAN/Workspaces/hugene11/tmp_HuBrPr_cel.root>...
Importing
</Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Brain_01_1.1.CEL>
as <Brain01.cel>...
hybridization statistics:
1 cells with minimal intensity 17.5
1 cells with maximal intensity 22402.1
New dataset <DataSet> is added to Content...
Importing
</Volumes/MitziData/CRAN/Workspaces/hugene11/celtest/Prostate_01_1.1.CEL> as
<Prostate01.cel>...
hybridization statistics:
2 cells with minimal intensity 14.5
1 cells with maximal intensity 23266.3
> for (i in 1:length(rawCELName(data.genome11, fullpath = FALSE)))
+ cat(sprintf("%s\n", rawCELName(data.genome11, fullpath = FALSE)[i]))
Brain_01_1.1.CEL
Prostate_01_1.1.CEL
As you can see, everything now works fine. The parameter 'celnames' was
introduced from the beginning to allow alternative sample names without
having to rename the original CEL-files, since CEL-files often have names
such as 'Breast_tissue;24/08/1999;batch-1,lot-2.1.CEL'.
I hope that using the parameter 'celnames' solves your problem.
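When many CEL-files need renaming, the 'celnames' vector can be derived from the file names instead of being typed by hand. The following is a minimal base-R sketch: make_celnames() is a hypothetical helper, not part of xps. It assumes (as the reproduction above suggests, though the thread does not state the cause) that the extra dot in names such as 'Brain_01_1.1.CEL' is what breaks the default naming, so it replaces every run of non-alphanumeric characters with '_'.

```r
# Hypothetical helper, not part of xps: build short sample names for the
# 'celnames' argument of import.data() from arbitrary CEL-file names.
# Assumption: names without extra dots avoid the "Tree set <>" error.
make_celnames <- function(celfiles) {
  # drop the .CEL extension (case-insensitive)
  base <- sub("\\.cel$", "", celfiles, ignore.case = TRUE)
  # replace each run of characters other than letters and digits with "_"
  clean <- gsub("[^A-Za-z0-9]+", "_", base)
  # strip leading and trailing underscores
  clean <- gsub("^_+|_+$", "", clean)
  # keep names unique if two files collapse to the same name
  make.unique(clean, sep = "_")
}

celfiles <- c("Brain_01_1.1.CEL", "Prostate_01_1.1.CEL",
              "Breast_tissue;24/08/1999;batch-1,lot-2.1.CEL")
celnames <- make_celnames(celfiles)
print(celnames)
```

The result would then be passed as celnames=make_celnames(celfiles) in the import.data() call shown above; the original file names remain available through rawCELName(), as the output above shows.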
Best regards,
Christian
On 1/10/13 9:10 PM, cstrato wrote:
> Dear Philip,
>
> I have just tried a subset of CEL-files from the Affymetrix
> "gene_1_1_st_ap_tissue_sample_data" for the HuGene_1.1 array, but I cannot
> reproduce the error you get. Here is my output for one CEL-file only:
>
> > library(xps)
>
> Welcome to xps version 1.19.1
> an R wrapper for XPS - eXpression Profiling System
> (c) Copyright 2001-2013 by Christian Stratowa
>
> > scheme <- root.scheme("./na33/hugene11stv1.root")
> > x.xps <- import.data(scheme, "tmp_x", celdir = "./cel", celfiles =
> "HumanBrain_1.CEL", verbose = TRUE)
> Opening file <./na33/hugene11stv1.root> in <READ> mode...
> Creating new temporary file
> </Volumes/MitziData/CRAN/Workspaces/hugene11/tmp_x_cel.root>...
> Importing <./cel/HumanBrain_1.CEL> as <HumanBrain_1.cel>...
> hybridization statistics:
> 1 cells with minimal intensity 17.5
> 1 cells with maximal intensity 22402.1
> New dataset <DataSet> is added to Content...
> > cat("The loaded .CEL-files are:\n");
> The loaded .CEL-files are:
> > for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
> + cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
> HumanBrain_1.CEL
> >
> > sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] xps_1.19.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0
> >
>
>
> As you can see, everything is OK. I also ran the triplicates of the Brain
> and Prostate samples and could run RMA without problems.
>
> Could you please try the following two options:
>
> 1. Use the CEL-files from the Affymetrix dataset, to make sure that
> there is no problem with your own CEL-files.
>
> 2. I see that you created the ROOT scheme files in the directory:
> scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")
>
> I must admit that I have never tried to store the scheme files in the
> package directory, since I suspect this may cause trouble, especially
> when you update R and/or the xps package to a new version.
> Could you please save your file "hugene11stv1.root" in a different
> directory such as '/home/degroot/schemes', or better, create this file
> in that directory, and then check whether you still get the problem.
>
> Best regards,
> Christian
>
>
> On 1/10/13 1:03 PM, Groot, Philip de wrote:
>> Hi Christian,
>>
>> I am trying to run an analysis using xps and the hugene11 chip. However,
>> I have run into problems for which I need your help.
>>
>> I created a small test-script to demonstrate the problem:
>>
>> library(xps)
>>
>> scheme <-
>> root.scheme("/local2/R-2.15.2/library/xps/schemes/hugene11stv1.root")
>>
>> x.xps <- import.data(scheme, "tmp_x", celdir = ".", celfiles =
>> "G092_A05_01_1.1.CEL", verbose = TRUE)
>>
>> cat("The loaded .CEL-files are:\n");
>>
>> for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
>>     cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
>>
>> Upon execution, I get:
>>
>>> library(xps)
>>
>> Welcome to xps version 1.18.1
>> an R wrapper for XPS - eXpression Profiling System
>> (c) Copyright 2001-2012 by Christian Stratowa
>>
>>> scheme <-
>>> root.scheme("/local2/R-2.15.2/library/xps/schemes/hugene11stv1.root")
>>> x.xps <- import.data(scheme, "tmp_x", celdir = ".", celfiles =
>>> "G092_A05_01_1.1.CEL", verbose = TRUE)
>> Opening file </local2/R-2.15.2/library/xps/schemes/hugene11stv1.root> in
>> <READ> mode...
>> Creating new temporary file
>> </mnt/geninf16/home/guests/pdegroot/dataanalysis/PHILIPG/tmp_x_cel.root>...
>> Importing <./G092_A05_01_1.1.CEL> as <G092_A05_01_1.1.cel>...
>> hybridization statistics:
>> 1 cells with minimal intensity 19
>> 1 cells with maximal intensity 21364.4
>> New dataset <DataSet> is added to Content...
>>>
>>> cat("The loaded .CEL-files are:\n");
>> The loaded .CEL-files are:
>>> for (i in 1:length(rawCELName(x.xps, fullpath = FALSE)))
>> + cat(sprintf("%s\n", rawCELName(x.xps, fullpath = FALSE)[i]));
>> Error: Tree set <> could not be found in file content
>> Error: Tree set <> could not be found in file content
>> NA
>>
>> The weird thing is that I only have this problem with the hugene11 chip.
>> As far as I can see, all other chips work properly (still na32 based).
>>
>> This affects all subsequent steps, because there is no “content” to
>> normalise, etc.
>>
>> I created the root-scheme as follows:
>>
>> scmdir <- paste(.path.package("xps"), "schemes/", sep = "/")
>>
>> scheme <- import.exon.scheme("hugene11stv1",filedir=scmdir,
>> layoutfile=paste(libdir, "HuGene-1_1-st-v1.r4.clf", sep="/"),
>> schemefile=paste(libdir,"HuGene-1_1-st-v1.r4.pgf", sep="/"),
>> probeset=paste(anndir,"HuGene-1_1-st-v1.na33.1.hg19.probeset.csv",
>> sep="/"),
>> transcript=paste(anndir,"HuGene-1_1-st-v1.na33.1.hg19.transcript.csv",
>> sep="/"), add.mask = TRUE)
>>
>> (libdir and anndir are also defined, of course).
>>
>> I even updated the na32 annotation to the latest Affymetrix version
>> (na33) to exclude a problem there. This does not fix the issue.
>>
>> Please note that I am running ROOT version 5.32/04, since version 5.32/01
>> is no longer available for download. ROOT works properly as far as I can
>> see.
>>
>> Do you have any clue where this problem originates from? Thank you!
>>
>> sessionInfo():
>>
>>> sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] xps_1.18.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.2
>>
>> Regards,
>>
>> Dr. Philip de Groot
>> Bioinformatician / Microarray analysis expert
>>
>> Wageningen University / TIFN
>> Netherlands Nutrigenomics Center (NNC)
>>
>> Nutrition, Metabolism & Genomics Group
>> Division of Human Nutrition
>> PO Box 8129, 6700 EV Wageningen
>> Visiting Address:
>> "De Valk" ("Erfelijkheidsleer"),
>> Building 304,
>> Verbindingsweg 4, 6703 HC Wageningen
>> Room: 0052a
>> T: 0317 485786
>> F: 0317 483342
>> E-mail: Philip.deGroot at wur.nl
>> I: http://humannutrition.wur.nl
>> https://madmax.bioinformatics.nl
>> http://www.nutrigenomicsconsortium.nl
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>