[R] Filling out a data frame row by row.... slow!
ilai
keren at math.montana.edu
Wed Feb 15 20:41:57 CET 2012
First, in R there is no need to declare the dimensions of your objects
before they are populated, so couldn't you save some run time by
skipping the double data.frame step?
> df<- data.frame()
> df
data frame with 0 columns and 0 rows
> for(i in 1:100) for(j in 1:3) df[i,j]<- runif(1)
> str(df)
'data.frame': 100 obs. of 3 variables:
...
Second, about populating an environment: ?assign might work better for you.
> e<- new.env()
> system.time(for(i in 1:10000) e$a[i]<- rnorm(1,i))
user system elapsed
0.97 0.00 0.96
> rm(e)
> e<- new.env()
> system.time(for(i in 1:10000) assign('a',rnorm(1,i),envir=e))
user system elapsed
0.17 0.00 0.17
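
Note that the assign() loop above replaces 'a' on every iteration instead of filling element i, so the two timings do not measure quite the same work. A minimal sketch of a like-for-like alternative, assuming the column name is only known at run time (colname is a hypothetical variable here, not something from the original post), is to fill a preallocated local vector and store it in the environment once:

```r
n <- 10000
colname <- "a"                    # hypothetical: a name read off the file at run time
e <- new.env()

x <- numeric(n)                   # preallocate the full vector once
for (i in seq_len(n)) x[i] <- rnorm(1, i)

assign(colname, x, envir = e)     # a single assignment under the computed name

length(get(colname, envir = e))
```

This keeps the element-by-element work on an ordinary local vector and touches the environment only once per column.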
Third, how are you reading in the file? And what does "not
knowing in advance..." mean? Bill's suggestion to not populate the
data.frame line by line is probably the "real" solution to your
problem; otherwise it's a little like kicking a turtle to make it
go faster... try to find a rabbit instead.
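
One way to read the "don't populate the data.frame line by line" advice, sketched under the assumption that all values are numeric and the final number of rows is known (n and p below are made-up sizes), is to fill a plain matrix and convert it to a data frame once at the end:

```r
n <- 100                                    # assumed number of rows
p <- 3                                      # assumed number of columns

m <- matrix(NA_real_, nrow = n, ncol = p)   # plain numeric matrix, allocated once
for (i in seq_len(n)) {
  for (j in seq_len(p)) m[i, j] <- runif(1) # element-wise fill, no data.frame involved
}

df <- as.data.frame(m)                      # convert once; columns are named V1, V2, V3
str(df)
```

If the columns have mixed types, a list of preallocated vectors converted with as.data.frame() at the end would play the same role.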
Posting a minimal example of your file format would have really
helped. Often, using ?scan to read the whole file (or big chunks of it)
into R, followed by a customized formatting function that
uses ?grep and ?strsplit to reconstruct the data you want in
columns, removes the NEED to populate a data frame line by line.
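
Since the actual file format was not posted, the following is only a sketch on an invented format: one "name value" pair per line, with comment lines starting with "#". The txt vector stands in for the file, and the column names are discovered from the data rather than known in advance:

```r
# hypothetical file contents, invented for illustration
txt <- c("# made-up header", "x 1.1", "y 2.5", "x 2.1", "y 3.5", "x 3.1", "y 4.5")

# read every line in one go; for a real file, pass its path as the first argument
# instead of text = txt
lines <- scan(text = txt, what = "", sep = "\n", quiet = TRUE)
lines <- lines[!grepl("^#", lines)]          # drop comment lines

parts <- strsplit(lines, " +")               # split each line into name and value
nms   <- vapply(parts, `[`, "", 1)           # column names, read off the data
vals  <- as.numeric(vapply(parts, `[`, "", 2))

df <- as.data.frame(split(vals, nms))        # one column per distinct name, built once
df
```

The formatting step (grepl, strsplit, split) is what would need to be adapted to the real file layout.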
Hope this helps
Elai
> One complication is I don't know the names of the columns I'm assigning to
> before I read them off the file. And crazily, if I change this:
> data$x[i] <- i + 0.1
>
> where data is an environment and x a primitive vector, to use a computed
> name instead:
>
> data[[colname]][i] <- i + 0.1
>
> Then I get back to way-superlinear performance. Eventually I found I could
> work around it like:
>
> eval(substitute(var[ix] <- data,
> list(var=as.name(colname), ix=i, data = i+0.1)),
> envir = data)
>
> but... as workarounds go that seems to be on the crazy nuts end of the
> scale. Why does [[]] impose a performance penalty?
>
> Peter