[BioC] rhdf5 and factors

Martin Morgan mtmorgan at fhcrc.org
Sun Jan 27 21:40:13 CET 2013


On 01/27/2013 09:42 AM, Bernd Fischer wrote:
> Dear Moritz!
>
> An easy solution for you would be to separately write the factor-values (the integers)
> and the levels:
>
>> h5write(as.integer(obj), file=file, name="objCODES")
>> h5write(levels(obj), file=file, name="objLEVELS")
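
Reading the two datasets back and rebuilding the factor might then look
roughly like the sketch below. It assumes the rhdf5 package is installed,
writes to a temporary file, and reuses the dataset names from Bernd's
example; the variable names (obj, obj2, codes, lev) are only illustrative.

```r
library(rhdf5)

obj <- factor(c("M", "F", "M"))
fl <- tempfile(fileext = ".h5")
h5createFile(fl)

# write the integer codes and the level labels as two separate datasets
h5write(as.integer(obj), file = fl, name = "objCODES")
h5write(levels(obj), file = fl, name = "objLEVELS")

# read both back; as.vector() drops the dim attribute of the returned arrays
codes <- as.vector(h5read(fl, "objCODES"))
lev <- as.vector(h5read(fl, "objLEVELS"))

# index the labels by the codes and restore the original level order
obj2 <- factor(lev[codes], levels = lev)
```

Passing levels = lev explicitly keeps the stored level order instead of
letting factor() re-sort the labels.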

I was thinking this would work

   f = factor(c("M", "F"))
   h5createFile(fl <- tempfile())
   res = h5write(f, fl, write.attributes=TRUE, name="f")

but the last line fails ('no applicable method for 'h5writeDataset' applied to 
an object of class "factor"'), so I then tried

   res = h5write(unclass(f), fl, write.attributes=TRUE, name="f")

which doesn't fail, but the attributes don't seem to survive the round trip:

 > dput(h5read(fl, "f", read.attributes=TRUE))
structure(c(2L, 1L), .Dim = 2L)
 > dput(unclass(f))
structure(c(2L, 1L), .Label = c("F", "M"))

I initially went down this line thinking that, since a factor (like many other 
R entities) is just a basic type plus attributes, it would be easy to support 
serializing a broad range of R data types. (read/write.attributes=TRUE would be 
a better default if the objective were to provide a transparent way to use hdf5 
as a storage back-end, which I think would be cool.) But, getting back to the 
original poster's question, maybe there's no intention to support this kind of 
high-level functionality in this package? Or maybe there's scope for an elegant 
(because one just has to recurse through an R object to save it) additional 
package that extends rhdf5?
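
To make the "basic type + attributes" point concrete, here is a rough sketch
in base R only. The helpers decompose() and recompose() are made-up names for
illustration; they are not part of rhdf5, and nothing here touches an HDF5
file. A real implementation would still need to store each piece and recurse
into list-like objects.

```r
# Hypothetical helpers, not part of rhdf5: split an R object into its
# underlying data and its attribute list, then put the two back together.
decompose <- function(x) {
    attrs <- attributes(x)
    payload <- x
    attributes(payload) <- NULL   # leaves the bare underlying vector
    list(data = payload, attrs = attrs)
}

recompose <- function(parts) {
    x <- parts$data
    attributes(x) <- parts$attrs  # restores levels, class, etc.
    x
}

f <- factor(c("M", "F"))
parts <- decompose(f)
str(parts)                        # integer codes plus levels and class
g <- recompose(parts)
identical(f, g)
```

The two list elements correspond to what one would want written as a dataset
and as its attributes, respectively.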

Martin


>
> Best,
>
> Bernd
>
>
>
> --
> Bernd Fischer
> EMBL Heidelberg
> Meyerhofstraße 1
> 69117 Heidelberg
> Tel: +49 [0] 6221 387-8131
> E-Mail: bernd.fischer at embl.de
> Homepage: http://www-huber.embl.de/users/befische/
>
> On 23.01.2013, at 16:05, Moritz Emanuel Beber <moritz.beber at gmail.com> wrote:
>
>> Dear all,
>>
>> I sent a message to Bernd Fischer the maintainer of rhdf5 directly but got no response from him. My qualm lies with the writing and re-reading of factor vectors using rhdf5. In the current release they are simply written as integers and upon reading the HDF5 files the levels are obviously forgotten.
>>
>> Of course, I could convert the factors to character vectors before writing but I wanted to ask whether there is a plan to implement better factor support or if it's feasible to contribute code to facilitate such support.
>>
>> TIA,
>> Moritz
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


