[R] Yet another set of codes to optimize

Daren Tan daren76 at hotmail.com
Fri Dec 5 03:41:23 CET 2008

I have problems converting my dataset from long to wide format. Previous attempts using the reshape package and the aggregate function were unsuccessful because they took too long, and my own simplified solution turned out to be just as slow. The goal is one row per unique (Name, Type) pair and one column per Predictor value, with each cell holding the number of times that combination occurs.
My complete code is given below. With sample.size = 10000 the execution takes about 20 seconds, but with sample.size = 100000 it seems to take forever. My actual sample size is 15000000, i.e. 15 million.
sample.size <- 10000

m <- data.frame(Name = sample(1:100000, sample.size, TRUE),
                Type = sample(1:1000, sample.size, TRUE),
                Predictor = sample(LETTERS[1:10], sample.size, TRUE))

res <- function(m) {
    # unique (Name, Type) pairs, sorted by Name and then Type: the output rows
    m.12.unique <- unique(m[, 1:2])
    m.12.unique <- m.12.unique[order(m.12.unique[, 1], m.12.unique[, 2]), ]
    v1 <- paste(m.12.unique[, 1], m.12.unique[, 2], sep = ".")
    # sorted Predictor values as character (not factor codes): the output columns
    v2 <- sort(unique(as.character(m[, 3])))
    res <- matrix(0, nrow = length(v1), ncol = length(v2), dimnames = list(v1, v2))
    m.ids <- paste(m[, 1], m[, 2], sep = ".")
    # count each (Name.Type, Predictor) combination, one input row at a time
    for (i in 1:nrow(m)) {
        x <- m.ids[i]
        y <- as.character(m[i, 3])
        res[x, y] <- res[x, y] + 1
    }
    res <- data.frame(m.12.unique[, 1], m.12.unique[, 2], res, row.names = NULL)
    colnames(res) <- c("Name", "Type", v2)
    res
}

system.time(out <- res(m))
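One possible way around the per-row loop is to let table() do all of the counting in a single vectorised call. The sketch below keeps the output layout of res() (rows sorted numerically by Name and then Type, one column per sorted Predictor value). `res.table` and the small `toy` data frame are hypothetical names used only for illustration, the sketch assumes the column names Name, Type and Predictor from the post, and it has not been benchmarked on the 15-million-row data. Giving `key` explicit factor levels keeps the rows in numeric order instead of the string order table() would otherwise use ("1.10" sorting before "1.3").

```r
# res.table: hypothetical vectorised variant of res(); same output layout.
res.table <- function(m) {
    # unique (Name, Type) pairs, sorted numerically by Name and then Type
    pairs <- unique(m[, c("Name", "Type")])
    pairs <- pairs[order(pairs$Name, pairs$Type), ]
    v1 <- paste(pairs$Name, pairs$Type, sep = ".")
    # explicit levels force the table rows into the same order as 'pairs'
    key <- factor(paste(m$Name, m$Type, sep = "."), levels = v1)
    # columns: sorted Predictor values (works for character or factor input)
    pred <- factor(as.character(m$Predictor))
    counts <- table(key, pred)
    # unclass() turns the table into a plain integer matrix; passing the table
    # itself to data.frame() would give long format (one row per cell) instead
    out <- data.frame(Name = pairs$Name, Type = pairs$Type, unclass(counts),
                      check.names = FALSE)
    rownames(out) <- NULL
    out
}

# usage on a tiny, made-up data set
toy <- data.frame(Name = c(2, 1, 1, 2, 1), Type = c(5, 3, 3, 5, 10),
                  Predictor = c("B", "A", "B", "B", "A"))
wide <- res.table(toy)
print(wide)
```

Like res(), this still builds a dense matrix with one row per unique pair, so memory use grows with the number of distinct (Name, Type) combinations.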
> sessionInfo()
R version 2.8.0 (2008-10-20) 
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
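Regarding the jump from 10,000 to 100,000 rows and the 15-million-row target: a likely reason the loop scales so badly is that `res[x, y]` with character subscripts must match `x` against the row names on every iteration, so the total work grows with both the number of input rows and the number of unique (Name, Type) pairs. A sketch that avoids this computes each observation's integer row and column position once (via match() and factor codes), combines them into a single cell index, and counts every cell with one tabulate() call. `res.tabulate` and `toy` are hypothetical names; the sketch assumes the column names Name, Type and Predictor from the post, and its speed on the real data is untested.

```r
# res.tabulate: hypothetical integer-indexed variant; same output layout as res().
res.tabulate <- function(m) {
    # sorted unique (Name, Type) pairs define the output rows
    pairs <- unique(m[, c("Name", "Type")])
    pairs <- pairs[order(pairs$Name, pairs$Type), ]
    n.rows <- nrow(pairs)
    # integer row position of every observation, looked up once with match()
    row.id <- match(paste(m$Name, m$Type, sep = "."),
                    paste(pairs$Name, pairs$Type, sep = "."))
    # integer column position: factor codes, levels = sorted Predictor values
    pred <- factor(as.character(m$Predictor))
    col.id <- as.integer(pred)
    # single cell index in column-major order (how R lays out a matrix)
    cell <- row.id + (col.id - 1L) * n.rows
    # tabulate() counts how often each index 1..(n.rows * nlevels) occurs
    counts <- matrix(tabulate(cell, nbins = n.rows * nlevels(pred)),
                     nrow = n.rows, dimnames = list(NULL, levels(pred)))
    out <- data.frame(Name = pairs$Name, Type = pairs$Type, counts,
                      check.names = FALSE)
    rownames(out) <- NULL
    out
}

# usage on a tiny, made-up data set
toy <- data.frame(Name = c(2, 1, 1, 2, 1), Type = c(5, 3, 3, 5, 10),
                  Predictor = c("B", "A", "B", "B", "A"))
wide <- res.tabulate(toy)
print(wide)
```

With at most about 15 million unique pairs and 10 Predictor values, the largest cell index is roughly 1.5e8, well below R's integer limit of 2^31 - 1.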
