[BioC] rhdf5, dataframes, and variable length strings
Bernd Fischer
bernd.fischer at embl.de
Wed Nov 13 13:21:38 CET 2013
Dear John!
Thank you very much for reporting this bug. I can reproduce it on my computer,
but it will take some time to fix. I will let you know once it is fixed.
Best,
Bernd
On 28.10.2013, at 22:14, John at embl-heidelberg.de wrote:
>
> Hi all.
>
> I am working with large data frames in R that contain a mix of numbers and variable-length strings. I've tried using the rhdf5 package to write and then read these, but I haven't been able to figure out how to use the package correctly. Below is a toy data frame that causes R to segfault, at least on my machine. I would greatly appreciate either some pointers about what I'm doing wrong or another way to store my data.
>
> rndString <- function(n=1){
>   rndString <- c(1:n)
>   for(i in 1:n){
>     rndString[i] <- paste(sample(c(0:9,letters,LETTERS),sample(c(3:20),1),replace=TRUE),collapse="")
>   }
>   return(rndString)
> }
> library(rhdf5)
> n <- 1000000
> d <- data.frame(id=seq(n),name=rndString(n),val=rnorm(n),stringsAsFactors=FALSE)
> h5createFile("test.h5")
> h5write(d,file="test.h5",name="d")
> dd <- h5read("test.h5",name="d")
>
> John Estrada
>
>
>
> -- output of sessionInfo():
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rhdf5_2.6.0
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.8.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.