[R] long to wide on larger data set

Mon Jul 12 15:19:04 CEST 2010

Hi Jim,

Thanks for responding. Here is the information I should have included before.
I should have access to 4 GB of memory.

> str(myData)
'data.frame':   53860857 obs. of  4 variables:
 $ V1: chr  "200003" "200006" "200047" "200050" ...
 $ V2: chr  "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr  "A" "A" "A" "B" ...
 $ V4: chr  "B" "B" "A" "B" ...
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
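The str() output shows that all four columns are character, not numeric. One rough way to see what that costs, and how much a factor column would save, is to measure a smaller synthetic column with object.size() and scale it up to the row count above. The synthetic "cv" IDs, the 1,000 distinct values and the 1e6 sample size are illustrative assumptions, not measurements of myData. If factors turn out to help, read.table() also accepts "factor" in colClasses.

```r
# Illustrative sketch only: a synthetic column standing in for myData$V2
set.seed(1)
n   <- 1e6
ids <- sprintf("cv%04d", sample(1000, n, replace = TRUE))  # hypothetical sample IDs

chr_bytes <- as.numeric(object.size(ids))          # character: one pointer per element
fac_bytes <- as.numeric(object.size(factor(ids)))  # factor: one integer code per element

# Rough linear scaling to the 53,860,857 rows reported by str(myData);
# it ignores the fixed cost of the distinct strings, so treat it as approximate.
rows <- 53860857
c(character_GB = chr_bytes / n * rows / 1e9,
  factor_GB    = fac_bytes / n * rows / 1e9)
```

On a 64-bit build each element of a character vector is a pointer to a cached string, so a character column is not much cheaper than a numeric one.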

On Mon, Jul 12, 2010 at 7:54 AM, jim holtman <jholtman at gmail.com> wrote:
> What is the configuration you are running on (OS, memory, etc.)?  What
> does your object consist of?  Is it numeric, factors, etc.?  Provide a
> 'str' of it.  If it is numeric, then the size of the object is
> probably about 1.8GB.  Doing the long to wide you will probably need
> at least that much additional memory to hold the copy, if not more.
> This would be impossible on a 32-bit version of R.
>
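The 1.8GB figure can be checked with back-of-the-envelope arithmetic, assuming four double columns at 8 bytes per value and ignoring per-object overhead. Adding one full copy, as suggested above for the reshape step, gives a sense of how close this gets to 4 GB.

```r
# Back-of-the-envelope size, assuming 4 numeric (double) columns
rows  <- 53860857          # obs. reported by str(myData)
cols  <- 4
bytes <- rows * cols * 8   # 8 bytes per double: 1,723,547,424 bytes
bytes / 1e9                # about 1.72 (decimal GB)
bytes / 2^30               # about 1.6 (GiB)
2 * bytes / 1e9            # original plus one full copy: about 3.45 GB
```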
> On Mon, Jul 12, 2010 at 1:25 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>> I have a data set that has 4 columns and 53860858 rows. I was able to
>> read this into R with:
>>
>> cc <- rep("character",4)
>> myData <- read.table("myData.csv", header=FALSE, skip=1, colClasses=cc, nrows=53860858, sep=",")
>>
>>
>> I need to reshape this data from long to wide. The following lines work
>> on a small data set, but on the real data set they did not finish even
>> when I restricted the loop to two samples (two rows in the new data). I
>> didn't receive an error; I stopped it because it was taking too long. Any
>> suggestions for improvements? Thanks.
>>
>> # start example
>> # I have commented out the write.table statement below
>>
>> testData <- read.table(textConnection("rs9999853,cv0084,A,A
>> rs999986,cv0084,C,B
>> rs9999883,cv0084,E,F
>> rs9999853,cv0085,G,H
>> rs999986,cv0085,I,J
>> rs9999883,cv0085,K,L"),header=FALSE,sep=",")
>> closeAllConnections()
>>
>> mysamples <- unique(testData$V2)
>>
>> for (one_ind in mysamples) {
>>   one_sample <- testData[testData$V2==one_ind,]
>>   mywide <- reshape(one_sample, timevar = "V1", idvar = "V2", direction = "wide")
>>   # write.table(mywide, file = "newdata.txt", append = TRUE, row.names = FALSE, col.names = FALSE, quote = FALSE)
>> }
>>
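A likely reason the loop is slow even for two samples: reshape(direction = "wide") appears to build the wide columns one distinct timevar value (here, every distinct V1) at a time, and the outer loop also rescans every row once per sample via testData$V2 == one_ind. A sketch of a loop-free alternative, assuming each (V1, V2) pair occurs at most once, is to turn V1 and V2 into integer positions with match() and fill a preallocated character matrix in one vectorised assignment per value column. The object names below (samples, snps, wide) are hypothetical. On the full data the result holds roughly as many string pointers as the input, so it still needs memory on the order of the original object.

```r
# Sketch: long-to-wide without reshape(), assuming unique (V1, V2) pairs
testData <- read.table(text = "rs9999853,cv0084,A,A
rs999986,cv0084,C,B
rs9999883,cv0084,E,F
rs9999853,cv0085,G,H
rs999986,cv0085,I,J
rs9999883,cv0085,K,L",
  header = FALSE, sep = ",", colClasses = "character")

samples <- unique(testData$V2)          # one output row per sample (V2)
snps    <- unique(testData$V1)          # one pair of output columns per V1 value
row_idx <- match(testData$V2, samples)  # integer row position of each input row
col_idx <- match(testData$V1, snps)     # integer SNP position of each input row

# Columns V3.<snp>, V4.<snp> interleaved, the same column order that
# reshape(direction = "wide") should produce for this data
wide <- matrix(NA_character_, nrow = length(samples), ncol = 2 * length(snps),
               dimnames = list(samples,
                               paste(rep(c("V3", "V4"), length(snps)),
                                     rep(snps, each = 2), sep = ".")))

# One matrix-indexed assignment per value column; no loop over samples
wide[cbind(row_idx, 2 * col_idx - 1)] <- testData$V3
wide[cbind(row_idx, 2 * col_idx)]     <- testData$V4
wide

# The result could then be written once instead of appended per sample:
# write.table(wide, file = "newdata.txt", quote = FALSE, col.names = NA)
```

If V1 and V2 are read as factors, as.integer() on them gives the same kind of positions as match() without a separate lookup.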
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>