[R] merging dataframes with an unequal number of variables

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 7 15:41:36 CEST 2009


See ?rbind.fill in the plyr package.
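A minimal sketch of that suggestion on data shaped like the example in the question. It assumes the plyr package is installed. `rbind.fill()` stacks the data frames and creates any column an input lacks, filling it with NA.

```r
# Sketch: stack data frames whose columns differ, using plyr's rbind.fill().
# Assumes plyr is installed (install.packages("plyr")).
library(plyr)

df1 <- data.frame(var1 = c(1, 2, 3, 4, 5, 6),
                  var2 = c("a", "b", "a", "b", "a", "a"),
                  var3 = c(1, NA, NA, 2, 3, NA),
                  var4 = c(100, 200, 300, 100, 200, 300))
df2 <- df1[, c("var1", "var2", "var4")]   # lacks var3
df3 <- df1[, c("var2", "var3", "var4")]   # lacks var1
df4 <- df1                                # has everything

# Columns missing from an input are added to its rows as NA
combined <- rbind.fill(df1, df2, df3, df4)
dim(combined)   # 24 rows, 4 columns
```

The rows of `df2` get NA in `var3`, and the rows of `df3` get NA in `var1`. No up-front list of variable names is needed, because the union of all column names is used.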

On Wed, Oct 7, 2009 at 9:32 AM, christiaan pauw <cjpauw at gmail.com> wrote:
> Hello everyone,
> I have the kind of problem that one should never have, because one should
> always plan well and communicate with one's team. But I haven't, so here
> is my problem.
>
> I have data coming in on a daily basis from surveys in 10 towns. The
> questionnaire has 62 variables, but some of the regions have used older
> versions of the questionnaire that have a few fewer variables. I want to
> combine everything into a single data frame on a daily basis. The problem is
> that I cannot rbind() the data because the numbers of variables do not
> match. I have found that I can first subset all data sets to keep just
> the variables that they all have in common, but that is very unsatisfactory.
> What I want to do is to use a complete list of variable names, look at
> each data frame, create the variables that are missing and fill them
> with NAs. At least then I can merge the data and use the data that I have.
>
> Short example:
>
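> # Make a data frame with 4 variables
> var1 <- c(1, 2, 3, 4, 5, 6)
> var2 <- c("a", "b", "a", "b", "a", "a")
> var3 <- c(1, NA, NA, 2, 3, NA)
> var4 <- c(100, 200, 300, 100, 200, 300)
> # Call data.frame() directly: data.frame(cbind(...)) would first build a
> # character matrix, so the numeric columns would lose their type
> df1 <- data.frame(var1, var2, var3, var4)
>
> # Data frames 2 and 3 each have three of the four variables; df4 has everything
> df2 <- df1[, c(1, 2, 4)]
> df3 <- df1[, c(2, 3, 4)]
> df4 <- df1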
>
>
> # I wanted to do this, but it produces an error because the numbers of
> # variables differ
> df <- rbind(df1, df2, df3, df4)
>
> # I have figured out how to print the names of the variables that do match
> # the 'master' list (in this case df1), for example with df3:
> names(df3[, na.omit(match(names(df1), names(df3)))])
>
> # What I need is the names of the variables that each specific data frame
> # does NOT contain. Something like this, but it gives an error:
> names(df1[-names(df3[, na.omit(match(names(df1), names(df3)))])])
>
> Thanks in advance,
>
> Christiaan
>
>
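For comparison, here is a base-R sketch of the approach described in the question, without the plyr dependency. It takes the complete list of variable names, adds each missing variable as an NA column, and then calls rbind(). `setdiff()` also answers the last question: it returns the names a data frame does NOT contain. The helper `pad_columns()` is hypothetical and was written only for this illustration.

```r
# Base-R sketch: pad each data frame with the variables it lacks, then rbind().
# pad_columns() is a hypothetical helper written for this example.
pad_columns <- function(df, all_names) {
  missing <- setdiff(all_names, names(df))   # variables df does NOT contain
  for (nm in missing) df[[nm]] <- NA         # new column, NA in every row
  df[all_names]                              # same column order for every frame
}

df1 <- data.frame(var1 = c(1, 2, 3, 4, 5, 6),
                  var2 = c("a", "b", "a", "b", "a", "a"),
                  var3 = c(1, NA, NA, 2, 3, NA),
                  var4 = c(100, 200, 300, 100, 200, 300))
df2 <- df1[, c(1, 2, 4)]
df3 <- df1[, c(2, 3, 4)]
df4 <- df1

# Names in the master list (df1) that df3 lacks
setdiff(names(df1), names(df3))   # "var1"

all_names <- names(df1)
padded <- lapply(list(df1, df2, df3, df4), pad_columns, all_names = all_names)
combined <- do.call(rbind, padded)
```

The NA columns start out logical. rbind() coerces them to the type of the matching column in the other data frames, so `var1` and `var3` stay numeric in the result.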



