[R] which(df$name=="A") takes ~1 second! (df is very large), but can it be speeded up?
Emmanuel Levy
emmanuel.levy at gmail.com
Wed Aug 13 01:35:22 CEST 2008
Dear All,
I have a large data frame (2,700,000 rows and 14 columns), and I would like to
extract information from it in the particular way illustrated below.
Given a data frame "df":
> col1=sample(c(0,1),10, rep=T)
> names = factor(c(rep("A",5),rep("B",5)))
> df = data.frame(names,col1)
> df
   names col1
1      A    1
2      A    0
3      A    1
4      A    0
5      A    1
6      B    0
7      B    0
8      B    1
9      B    0
10     B    0
I would like to transform it into the form:
> index = c("A","B")
> col1 = list()   # col1 must be a list to hold one vector per name
> col1[[1]] = df$col1[which(df$name=="A")]
> col1[[2]] = df$col1[which(df$name=="B")]
My problem is that the command: *** which(df$name=="A") ***
takes about 1 second because df is so big.
I was thinking that the rows belonging to a given factor "level" could perhaps
be accessed directly, but I am not sure how to do it.
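To make the idea concrete, here is a minimal sketch of grouping by factor
level in a single call, using base R's split(). It rebuilds the small example
from above rather than the real data. The name "by_name" and the set.seed()
call are my own additions for illustration, and the column is written out in
full as df$names (df$name only works through partial matching). I have not
timed this on the 2,700,000-row data frame, so whether it is actually faster
there is an assumption and not a measured result.

```r
# Toy data mirroring the example above (set.seed is added only for repeatability)
set.seed(1)
names <- factor(c(rep("A", 5), rep("B", 5)))
col1  <- sample(c(0, 1), 10, replace = TRUE)
df    <- data.frame(names, col1)

# split() partitions col1 by the levels of the factor in one call,
# returning a list with one element per level ("A" and "B" here)
by_name <- split(df$col1, df$names)

# Each element holds the same values as the corresponding which() lookup
by_name[["A"]]
by_name[["B"]]
```

The elements are then available by name, e.g. by_name[["A"]], without
calling which() once per level.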
I would be very grateful for any advice that would allow me to speed this up.
Best wishes,
Emmanuel