[R] Loops and dataframes
Liaw, Andy
andy_liaw at merck.com
Fri Feb 25 12:28:44 CET 2005
You are discovering part of the overhead of using a data frame. The way you
specify the subset of the data frame to replace matters somewhat:
> st <- rep(1,1e4)
> ed <- rep(2,1e4)
> df <- data.frame(start=st, end=ed)
> system.time(for (i in 1:dim(df)[1]) df[i,1] <- df[i,2], gcFirst=TRUE)
[1] 35.96 0.10 36.37 NA NA
> df <- data.frame(start=st, end=ed)
> system.time(for (i in 1:dim(df)[1]) df[[1]][i] <- df[[2]][i],
+             gcFirst=TRUE)
[1] 22.63 0.17 22.88 NA NA
> df <- data.frame(start=st, end=ed)
> system.time(for (i in 1:dim(df)[1]) df$start[i] <- df$end[i],
+             gcFirst=TRUE)
[1] 19.29 0.13 19.46 NA NA
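All three forms above still go through a data frame replacement on every iteration. A minimal sketch of one way around that, assuming the loop body only needs the column values: pull the columns out as plain vectors once, loop over those, and write the result back in a single assignment. The names `n`, `start` and `end` are illustrative and not from the original post, and no timing is claimed here.

```r
n <- 1e4
df <- data.frame(start = rep(1, n), end = rep(2, n))

# Extract the columns once; these are ordinary numeric vectors.
start <- df$start
end <- df$end

# The loop now does plain vector replacement, like the
# "works fine" loop over st and ed in the original question.
for (i in seq_len(n)) {
  start[i] <- end[i]
}

# One data frame assignment at the end instead of one per row.
df$start <- start
```

This keeps the data frame as the container while the per-row work happens on vectors.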
If you have all numeric data, you might as well use a matrix instead of a data
frame:
> m <- cbind(start=st, end=ed)
> str(m)
num [1:10000, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "start" "end"
> system.time(for (i in 1:nrow(m)) m[i,1] <- m[i,2], gcFirst=TRUE)
[1] 0.06 0.00 0.08 NA NA
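If the data already live in a data frame, a rough sketch of the same idea is to convert to a matrix for the loop and convert back afterwards. This assumes every column is numeric (otherwise `as.matrix` would produce a character matrix); `df2` is just an illustrative name.

```r
df <- data.frame(start = rep(1, 1e4), end = rep(2, 1e4))

# All-numeric columns give a numeric matrix with the same column names.
m <- as.matrix(df)

# Row-wise work is done on the matrix, indexing columns by name.
for (i in seq_len(nrow(m))) {
  m[i, "start"] <- m[i, "end"]
}

# Convert back once if a data frame is needed downstream.
df2 <- as.data.frame(m)
```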
Andy
> From: Firas Swidan
>
> Hi,
> I am experiencing a long delay when using dataframes inside loops and was
> wondering if this is a bug or not.
> Example code:
>
> > st <- rep(1,100000)
> > ed <- rep(2,100000)
> > for(i in 1:length(st)) st[i] <- ed[i] # works fine
> > df <- data.frame(start=st,end=ed)
> > for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] #takes for ever
>
> R: R 2.0.0 (2004-10-04)
> OS: Linux, Fedora Core 2
> kernel: 2.6.10-1.14_FC2
> cpu: AMD Athlon XP 1600.
> mem: 500MB.
>
> The example above is only to illustrate the problem. I need loops to apply
> some functions on pairs (not necessarily successive) of rows in a dataframe.
>
> Thankful for any advice,
> Firas.