[R] Trying to avoid the loop while merging two data frames
Dimitri Liakhovitski
dimitri.liakhovitski at gmail.com
Tue Dec 22 18:27:57 CET 2015
Hello!
I have a loop-based solution for my task, but it is too slow for my
real-life problem, which is much larger in scope. I cannot use
merge(). Any advice on how to do it faster?
Thanks a lot for any hint on how to speed it up!
# I have 'mydata' data frame:
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
head(mydata)
table(mydata$version)
# I have 'myinfo' data frame that contains information for each 'version':
set.seed(12)
# (each rnorm(60) vector is recycled 10 times to fill the 600 rows)
myinfo <- data.frame(version = sort(rep(1:20, 30)),
                     a = rnorm(60), b = rnorm(60),
                     c = rnorm(60), d = rnorm(60))
head(myinfo, 40)
### MY SOLUTION WITH A LOOP:
### Looping through each id of mydata and grabbing
### all columns from 'myinfo' for the corresponding 'version':
# 1. Creating placeholder list for the results:
result <- split(mydata[c("myid", "version")], f = list(mydata$myid))
length(result)
result[1:3]
# 2. Looping through each element of 'result':
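for (i in seq_along(result)) {
  id <- result[[i]]$myid
  # All 'myinfo' rows for this id's version:
  result[[i]] <- myinfo[myinfo$version == result[[i]]$version, ]
  result[[i]]$myid <- id
  # Put 'myid' first, followed by all 'myinfo' columns
  # (c(5, 1:4) would drop 'myid', which is the 6th column here):
  result[[i]] <- result[[i]][c("myid", names(myinfo))]
}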
result <- do.call(rbind, result)
head(result) # This is the desired result
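
One possible loop-free sketch, not using merge(): index the row numbers of 'myinfo' by version once, then pull all the needed rows in a single subsetting step. The names idx_by_version, rows and fast are only illustrative, and the sketch assumes that every version in 'mydata' also occurs in 'myinfo' (ids without a match would simply be dropped). The rows come out in the order of 'mydata', so the row names differ from those of the loop result.

```r
# Recreate the example data from the post
set.seed(123)
mydata <- data.frame(myid = 1001:1100,
                     version = sample(1:20, 100, replace = TRUE))
set.seed(12)
myinfo <- data.frame(version = sort(rep(1:20, 30)),
                     a = rnorm(60), b = rnorm(60),
                     c = rnorm(60), d = rnorm(60))

# Row numbers of 'myinfo', grouped by version (list names are "1", ..., "20")
idx_by_version <- split(seq_len(nrow(myinfo)), myinfo$version)

# For each row of 'mydata', look up the matching 'myinfo' row numbers by name
rows <- idx_by_version[as.character(mydata$version)]

# Repeat each myid once per matched row and fetch all rows in one subset
fast <- cbind(myid = rep(mydata$myid, lengths(rows)),
              myinfo[unlist(rows, use.names = FALSE), ])
rownames(fast) <- NULL
head(fast)
```

The comparison myinfo$version == ... is no longer repeated for every id, and the one-data-frame-per-id rbind step disappears, which is where most of the time in the loop version is likely to go.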
--
Dimitri Liakhovitski