[R] More efficient option to append()?

Daniel Nordlund djnordlund at frontier.com
Thu Aug 18 01:35:48 CEST 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Alex Ruiz Euler
> Sent: Wednesday, August 17, 2011 3:54 PM
> To: r-help at r-project.org
> Subject: [R] More efficient option to append()?
> 
> 
> Dear R community,
> 
> I have a 2 million by 2 matrix that looks like this:
> 
> x<-sample(1:15,2000000, replace=T)
> y<-sample(1:10*1000, 2000000, replace=T)
>       x     y
> [1,] 10  4000
> [2,]  3  1000
> [3,]  3  4000
> [4,]  8  6000
> [5,]  2  9000
> [6,]  3  8000
> [7,]  2 10000
> (...)
> 
> 
> The first column is a population expansion factor for the number in the
> second column (household income). I want to expand the second column
> with the first so that I end up with a vector beginning with 10
> observations of 4000, then 3 observations of 1000 and so on. In my mind
> the natural approach would be to create a NULL vector and append the
> expansions:
> 
> myvar<-NULL
> myvar<-append(myvar, replicate(x[1],y[1]), 1)
> 
> for (i in 2:length(x)) {
> myvar<-append(myvar,replicate(x[i],y[i]),sum(x[1:i])+1)
> }
> 
> to end with a vector of sum(x), which in my real database corresponds
> to 22 million observations.
> 
> This works fine if I only run it for the first, say, 1000
> observations. If I try to perform this on all 2 million observations
> it takes far too long to be useful (I left it running for
> 11 hours yesterday to no avail).
> 
> 
> I know R performs well with operations on relatively large vectors. Why
> is this so inefficient? And what would be the smart way to do this?
> 
> Thanks in advance.
> Alex
> 

Alex, 

Does the following do what you want?

myvar <- rep(y,x)
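
As for why the loop is slow: each call to append() builds a new, longer copy of
the whole vector, and sum(x[1:i]) is recomputed from scratch on every pass, so
the total work grows roughly with the square of the number of rows. rep() with
a vector of counts does the expansion in a single vectorised call. Here is a
minimal sketch on a handful of made-up rows (the values are illustrative only,
not your real data; "slow" is just a name chosen for the comparison):

```r
# Small stand-in for the real data (illustrative values only)
x <- c(10, 3, 3, 8, 2)                # expansion factors
y <- c(4000, 1000, 4000, 6000, 9000)  # household incomes

# Vectorised expansion: y[i] is repeated x[i] times, in order
myvar <- rep(y, times = x)

# Loop version for comparison; every append() copies the growing vector
slow <- NULL
for (i in seq_along(x)) {
  slow <- append(slow, rep(y[i], x[i]))
}

# Check that the expanded length matches and both approaches agree
length(myvar) == sum(x)
identical(myvar, slow)
```

On the full 2 million rows only the rep() line should be needed; the loop is
there just to confirm that the two give the same vector on a small case.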

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA


