[BioC] Reading MAGE-ML cdf into bioconductor for limma in R v. 3.0.2
James W. MacDonald
jmacdon at uw.edu
Mon Mar 31 16:13:44 CEST 2014
Hi Ben,
> pro.fe.set<-ArrayExpress("E-GEOD-26533")
<snip>
> ls()
[1] "mapCdfName" "pro.fe.set"
> pro.fe.set
AffyBatch object
size of arrays=448x448 features (51 kb)
cdf=MD4-9313a520062 (??? affyids)
number of samples=39
Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain MD4-9313a520062
Library - package md49313a520062cdf not installed
Bioconductor - md49313a520062cdf not available
In addition: Warning message:
missing cdf environment! in show(AffyBatch)
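The package name in that error is derived from the CDF name stored in the AffyBatch. A minimal sketch of the mapping, assuming the rule is simply "lower-case, drop punctuation, append cdf" (as far as I know, cleancdfname() in affy does something along these lines, plus a lookup table for special cases; cdf_name, pkg_name and installed are illustrative variable names only):

```r
# CDF name as reported in the AffyBatch summary above
cdf_name <- "MD4-9313a520062"

# Assumed rule: lower-case, keep only letters and digits, append "cdf"
pkg_name <- paste0(gsub("[^a-z0-9]", "", tolower(cdf_name)), "cdf")
pkg_name
# "md49313a520062cdf"

# system.file() returns "" when the package is not installed
installed <- nzchar(system.file(package = pkg_name))
installed
```

Whatever CDF package gets built has to carry exactly this name, otherwise the AffyBatch will not find it.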
<starts browser>
Googles MD4-9313a520062
Fourth hit is
http://lifesciencedb.jp/geo-e/?division=Unassigned&technology=GeneChip&order=manufacturer&action=ListPlatform
First line in table has link
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL5471
At bottom of said link is
Supplementary file   Size     Download       File type/resource
GPL5471.cdf.gz       1.7 Mb   (ftp) (http)   CDF
  ftp:  ftp://ftp.ncbi.nlm.nih.gov/geo/platforms/GPL5nnn/GPL5471/suppl/GPL5471%2Ecdf%2Egz
  http: http://www.ncbi.nlm.nih.gov/geo/download/?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz
Copies http link
</closes browser>
> download.file("http://www.ncbi.nlm.nih.gov/geo/download/?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz",
    "GPL5471.cdf.gz", mode = "wb")
trying URL
'http://www.ncbi.nlm.nih.gov/geo/download/?acc=GPL5471&format=file&file=GPL5471%2Ecdf%2Egz'
Content type 'application/octet-stream' length 1764017 bytes (1.7 Mb)
opened URL
downloaded 1.7 Mb
> library(makecdfenv)
> make.cdf.package("GPL5471.cdf.gz", "md49313a520062cdf", compress = TRUE,
    species = "Some_bacterium")
## This may fail on the compressed file. In that case, decompress it at the shell:
$ gzip -d GPL5471.cdf.gz
## and then, back in R:
> make.cdf.package("GPL5471.cdf", "md49313a520062cdf",
    species = "Some_bacterium")
> install.packages("md49313a520062cdf/", repos = NULL, type = "source")
> pro.fe.set
AffyBatch object
size of arrays=448x448 features (51 kb)
cdf=MD4-9313a520062 (9947 affyids)
number of samples=39
number of genes=9947
annotation=md49313a520062
notes=E-GEOD-26533
E-GEOD-26533
c("Organism", "treatment", "strain", "time", "", "", "", "",
"", "", "")
c("", "", "", "", "", "", "", "", "", "", "")
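If make.cdf.package() does choke on the compressed file and you would rather not drop to a shell, the gzip step can also be done from within R. A minimal sketch using only base R connections; gunzip_base is a made-up helper name, not part of any package, and unlike gzip -d it leaves the original .gz file in place:

```r
# Hypothetical helper: decompress a .gz file using base R connections only.
gunzip_base <- function(gz_path, out_path = sub("[.]gz$", "", gz_path)) {
  in_con <- gzfile(gz_path, open = "rb")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(out_path, open = "wb")
  on.exit(close(out_con), add = TRUE)
  repeat {
    # Copy in 1 MB chunks until the compressed stream is exhausted
    chunk <- readBin(in_con, what = "raw", n = 1e6)
    if (length(chunk) == 0L) break
    writeBin(chunk, out_con)
  }
  invisible(out_path)
}

# Intended use with the file from this thread (not run here):
# gunzip_base("GPL5471.cdf.gz")   # writes GPL5471.cdf next to it
# make.cdf.package("GPL5471.cdf", "md49313a520062cdf",
#                  species = "Some_bacterium")
```

The gunzip() function in R.utils (already attached in your session) should do the same job if you prefer not to roll your own.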
Best,
Jim
On 3/31/2014 3:43 AM, Ben Temperton [guest] wrote:
> Hi there,
>
> I am trying to load some microarray data from ArrayExpress into R for analysis with limma:
>
> pro.fe.set<-ArrayExpress("E-GEOD-26533")
>
> However, the probe set needs to be installed first for this to work, and the probe set is in MAGE-ML format. Previously, I've only ever dealt with the makecdfenv package, which uses .cdf files. I found a package called RMAGEML in Bioconductor that looked like it would do the job, but it is not available for R v. 3.0.2.
>
> I was hoping you might have some insight into how best to approach this problem.
>
> Many thanks,
> Ben
>
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ArrayExpress_1.22.0 Biobase_2.22.0 BiocGenerics_0.8.0 R.utils_1.29.8 R.oo_1.18.0 R.methodsS3_1.6.1
>
> loaded via a namespace (and not attached):
> [1] affy_1.40.0 affyio_1.30.0 BiocInstaller_1.12.0 limma_3.18.13 preprocessCore_1.24.0
> [6] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099