[R] long to wide on larger data set
Juliet Hannah
juliet.hannah at gmail.com
Mon Jul 12 15:19:04 CEST 2010
Hi Jim,
Thanks for responding. Here is the information I should have included before.
I should be able to access 4 GB of memory.
> str(myData)
'data.frame':   53860857 obs. of  4 variables:
 $ V1: chr  "200003" "200006" "200047" "200050" ...
 $ V2: chr  "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr  "A" "A" "A" "B" ...
 $ V4: chr  "B" "B" "A" "B" ...
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
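
For reference, one possible way to avoid calling reshape() once per sample is to fill a character matrix by (row, column) index pairs. The sketch below is only an illustration on the toy data from the quoted example. The helper name pivot_col is hypothetical, it assumes each (V2, V1) pair occurs at most once, and it has not been timed on the full 53 million rows.

```r
# Toy data from the quoted example, kept as character columns.
testData <- data.frame(
  V1 = c("rs9999853", "rs999986", "rs9999883",
         "rs9999853", "rs999986", "rs9999883"),
  V2 = c("cv0084", "cv0084", "cv0084", "cv0085", "cv0085", "cv0085"),
  V3 = c("A", "C", "E", "G", "I", "K"),
  V4 = c("A", "B", "F", "H", "J", "L"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): spread one value column into a
# samples-by-markers character matrix.
pivot_col <- function(d, value) {
  markers <- unique(d$V1)   # become the columns
  samples <- unique(d$V2)   # become the rows
  wide <- matrix(NA_character_,
                 nrow = length(samples), ncol = length(markers),
                 dimnames = list(samples, markers))
  # A two-column matrix of (row, column) positions fills all cells
  # in a single vectorized assignment.
  idx <- cbind(match(d$V2, samples), match(d$V1, markers))
  wide[idx] <- d[[value]]
  wide
}

wide_V3 <- pivot_col(testData, "V3")
wide_V4 <- pivot_col(testData, "V4")
```

Each result has one row per sample and one column per marker, so the two matrices together carry the same information as the V3.* and V4.* columns that reshape() produces. Pairs that are missing from the long data stay NA.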
On Mon, Jul 12, 2010 at 7:54 AM, jim holtman <jholtman at gmail.com> wrote:
> What is the configuration you are running on (OS, memory, etc.)? What
> does your object consist of? Is it numeric, factors, etc.? Provide a
> 'str' of it. If it is numeric, then the size of the object is
> probably about 1.8GB. Doing the long to wide you will probably need
> at least that much additional memory to hold the copy, if not more.
> This would be impossible on a 32-bit version of R.
>
> On Mon, Jul 12, 2010 at 1:25 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>> I have a data set that has 4 columns and 53860858 rows. I was able to
>> read this into R with:
>>
>> cc <- rep("character",4)
>> myData <- read.table("myData.csv",header=FALSE,skip=1,colClasses=cc,nrow=53860858,sep=",")
>>
>>
>> I need to reshape this data from long to wide. On a small data set the
>> following lines work. But on the real data set, the loop didn't finish even
>> when I restricted it to two samples (that is, two rows in the new data).
>> I didn't receive an error; I just stopped it because it was taking too
>> long. Any suggestions for improvements? Thanks.
>>
>> # start example
>> # i have commented out the write.table statement below
>>
>> testData <- read.table(textConnection("rs9999853,cv0084,A,A
>> rs999986,cv0084,C,B
>> rs9999883,cv0084,E,F
>> rs9999853,cv0085,G,H
>> rs999986,cv0085,I,J
>> rs9999883,cv0085,K,L"),header=FALSE,sep=",")
>> closeAllConnections()
>>
>> mysamples <- unique(testData$V2)
>>
>> for (one_ind in mysamples) {
>> one_sample <- testData[testData$V2==one_ind,]
>>   mywide <- reshape(one_sample, timevar = "V1", idvar = "V2", direction = "wide")
>>   # write.table(mywide, file = "newdata.txt", append = TRUE, row.names = FALSE, col.names = FALSE, quote = FALSE)
>> }
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>