[R] Match strings across two differently sized dataframes and copy corresponding row to dataframe

Chris Beeley chris.beeley at gmail.com
Thu Jun 30 15:35:47 CEST 2011


Hello-

Sorry, this is a bit of a noob question, but I can't seem to progress
it any further.

I have two dataframes which contain a series of strings which exactly
match. The problem is one has more rows than the other (more cases
have been added) and they have been sorted so that they are not in the
same order. The smaller dataframe, though, contains another column
which has codes classifying the strings.

So, for every row of the larger dataframe, I want to look up the
string in the smaller dataframe, and then use that row number to copy
across the code for the string into the larger dataframe. Here's my
idea so far:

# comments is the smaller dataframe with the codes, mydata is the
# larger dataframe to which I would like to copy it.

# this is the match between the strings one way
commvec=charmatch(comments$ImproveOne, mydata$Improve)
# this is the match the other way
datavec=charmatch(mydata$Improve, comments$ImproveOne)

mydata$ImproveCat1=NA # produce a variable to hold the copied codes

# for all the non-missing row numbers identified in the larger dataframe,
# copy the corresponding code from the smaller dataframe (which lives
# in comments$ImproveCat)
mydata$ImproveCat1[datavec[!is.na(datavec)]]=
comments$ImproveCat[commvec[!is.na(commvec)]]

However, the last command doesn't work because the two sides of the
assignment are not the same length. They nearly are, though; I'm not
sure if that's a coincidence or shows I'm close:

length(mydata$ImproveCat1[datavec[!is.na(datavec)]]) # yields 1567

length(comments$ImproveCat[commvec[!is.na(commvec)]]) # yields 1512

I'm sorry, I did try to construct an example dataframe, but ironically
I can't make that work either! Sorry!

Any help gratefully received.

Many thanks!

Chris Beeley
Institute of Mental Health, UK
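
For reference, the lookup described above (find each string of the
larger dataframe in the smaller one, then copy that row's code across)
can be sketched with match(). This is only a minimal illustration and
assumes the strings match exactly. The toy values below are made up;
only the column names come from the post.

```r
# Hypothetical toy data; only the column names are taken from the post.
comments <- data.frame(
  ImproveOne = c("more staff", "better food", "shorter waits"),
  ImproveCat = c(1, 2, 3),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  Improve = c("shorter waits", "more staff", "quieter ward", "more staff"),
  stringsAsFactors = FALSE
)

# For each row of mydata, the row number of the identical string in
# comments (NA where the string has no entry in comments).
idx <- match(mydata$Improve, comments$ImproveOne)

# Indexing the codes by idx yields one value per row of mydata, so the
# two sides of the assignment always have the same length.
mydata$ImproveCat1 <- comments$ImproveCat[idx]
```

match() looks for exact matches only and returns the first one, whereas
charmatch() also allows partial matches and returns 0 when a match is
ambiguous, so the two may behave differently on real data with
duplicated or similar strings.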


