[R] efficiency in merging two data frames

Guojun Zhu shmilylemon at yahoo.com
Mon May 1 09:34:54 CEST 2006


I have two data sets of stock and fiscal data for a
large number of companies.  One is monthly, with about
144,000 lines, and the other is quarterly, with about
56,000.  Each data set uses a different company code,
and I need to merge the two.  I read both in as csv,
along with a third file that maps one firm code to the
other.  So I now have three data frames: return (keyed
by return$PERMNO), account (keyed by account$GVKEY),
and id, which holds the correspondence and has both
id$PERMNO and id$GVKEY.  I also need to convert the
month in return into a quarter before finally merging
the two data frames (return and account).  I ended up
writing a short program for this, but it runs very
slowly: 15+ minutes.  Is there a quicker way to do it?
Here is my original code.



# fiscal year-end month (FYR) for each firm in id,
# taken from the first matching row of account
id$fy=rep(0,length(id$PERMNO))
for (i in 1:length(id$PERMNO))
    id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]]
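
The loop above rescans account$GVKEY once for every row of id.  A
vectorized sketch of the same lookup could use match(), which finds
the first matching position for all keys in one call.  The toy data
frames below are invented for illustration; only the column names
come from the post.  One difference to keep in mind: match() returns
NA for a GVKEY with no row in account, where the loop would stop
with an error.

```r
# Toy stand-ins for the real tables (values are invented).
id <- data.frame(PERMNO = c(10001, 10002, 10003),
                 GVKEY  = c(2001, 2002, 2003))
account <- data.frame(GVKEY = c(2002, 2001, 2001, 2003),
                      FYR   = c(6, 12, 12, 9))

# For each id$GVKEY, match() gives the index of its first occurrence
# in account$GVKEY, which is the row the [[1]] in the loop selects.
id$fy <- account$FYR[match(id$GVKEY, account$GVKEY)]
print(id)
```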

return$GVKEY=rep(0,length(return$PERMNO))
return$fyy=rep(0,length(return$PERMNO))
return$fyq=rep(0,length(return$PERMNO))
for (i in 1:length(return$PERMNO)) {
    # row of id for this firm, and its fiscal year-end month
    temp<-id$PERMNO==return$PERMNO[[i]];
    tempmon<-id$fy[temp][[1]];
    if (return$month[[i]]<=tempmon) {
	# fiscal year ends in this calendar year
	return$fyy[[i]]<-return$year[[i]];
	return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3;
	}
      else{
	# month falls in the fiscal year ending next calendar year;
	# the +1 makes quarters run 1 to 4 as in the branch above
	return$fyy[[i]]<-return$year[[i]]+1;
	return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3+1;
	}
    return$GVKEY[[i]]<-id$GVKEY[temp][[1]];
}
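
Most of the running time probably goes into this loop, since every
one of the ~144,000 rows triggers a full scan of id$PERMNO plus
several single-element assignments into a data frame.  A vectorized
sketch of the same computation is below.  It assumes the test is
meant to be "month <= fiscal year-end month" and that quarters are
numbered 1 to 4.  The toy data are invented, and the data frame is
called ret here (a hypothetical name) so the sketch does not mask
base R's return().

```r
# Toy stand-ins (invented values); `ret` plays the role of `return`.
id  <- data.frame(PERMNO = c(10001, 10002),
                  GVKEY  = c(2001, 2002),
                  fy     = c(12, 6))   # fiscal year-end month per firm
ret <- data.frame(PERMNO = c(10001, 10001, 10002, 10002, 10002),
                  year   = c(2005, 2005, 2005, 2005, 2005),
                  month  = c(1, 12, 6, 7, 10))

# One lookup for all rows: position of each row's firm in id.
pos       <- match(ret$PERMNO, id$PERMNO)
ret$GVKEY <- id$GVKEY[pos]
fy_end    <- id$fy[pos]

# Assumption: months up to and including the fiscal year-end month
# belong to the fiscal year ending in the same calendar year.
same_year <- ret$month <= fy_end
ret$fyy   <- ifelse(same_year, ret$year, ret$year + 1)

# Assumption: quarters are numbered 1 to 4 within the fiscal year.
ret$fyq   <- ifelse(same_year,
                    4 - (fy_end - ret$month) %/% 3,
                    (ret$month - fy_end - 1) %/% 3 + 1)
print(ret)
```

With GVKEY, fyy and fyq filled in this way, the merge() on those
three columns is the only step left.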
   
returnnew=merge(return,account,by.x=c("GVKEY","fyy","fyq"),by.y=c("GVKEY","fyy","fyq"))



