[R] problem merging two data sets ( one with a header and one without)

Don MacQueen macq at llnl.gov
Thu Aug 21 22:35:33 CEST 2008


merge() has by.x and by.y arguments. If you use them, you can merge 
data frames that have different column names. You can specify columns 
by name or by number. This is mentioned in the help for merge.

Try

    merge(Data1, Data2, by.x=1, by.y=2)

which will keep all of the columns in Data2, or

    merge(Data1, Data2[, c(2,5:30)], by=1)

if you must remove columns 1, 3, and 4 from Data2.
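
To make the by.x/by.y idea concrete, here is a small sketch on made-up data. The data frames d1 and d2, their column names, and their values are hypothetical stand-ins, not your actual Data1 and Data2.

```r
# Hypothetical toy data: d1 has its key in column 1, d2 has its key in column 2.
d1 <- data.frame(id = c("a", "b", "c"), x = 1:3, stringsAsFactors = FALSE)
d2 <- data.frame(V1 = c(10, 20, 30),
                 V2 = c("b", "c", "d"),
                 V3 = c(0.5, 1.5, 2.5),
                 stringsAsFactors = FALSE)

# Match column 1 of d1 against column 2 of d2, even though the names differ.
m <- merge(d1, d2, by.x = 1, by.y = 2)
print(m)
```

Only the keys present in both data frames ("b" and "c") survive, because merge() does an inner join unless you ask for all=TRUE. The key column in the result takes its name from the first data frame.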


Alternatively, since merge() by default works on common variable names, all 
you have to do is make sure that the single column you want to use for 
the merge has the same name in both data frames, and that it is the only 
column with the same name in both. Thus, another way to do the merge 
would be

   names(Data2)[2] <- 'V1'
   merge( Data1, Data2)

or

   names(Data2)[2] <- 'V1'
   merge( Data1, Data2[ , c(2,5:30)] )
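
A small sketch of the common-name approach, again with hypothetical toy data frames (d1, d2, and their columns are invented for illustration). Checking intersect() of the names first shows which columns merge() will use when no by argument is given.

```r
# Hypothetical toy data: the key is called V1 in d1 but 'key' in d2.
d1 <- data.frame(V1 = c("a", "b", "c"), x = 1:3, stringsAsFactors = FALSE)
d2 <- data.frame(key = c("b", "c", "d"), y = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Give the key column the same name in both data frames.
names(d2)[1] <- "V1"

# By default merge() uses the columns whose names appear in both.
common <- intersect(names(d1), names(d2))
print(common)

m <- merge(d1, d2)
print(m)
```

One thing to watch: if Data2 was read with header=FALSE it already has a column called V1, so renaming column 2 leaves two columns with that name in the full Data2. The version that subsets to c(2,5:30) drops the original column 1 and so avoids the duplicate.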


While I'm at it, using cbind() is unnecessary. You can replace

   P1<-cbind(Data2[,2])
   P2<-cbind(Data2[,5:30])
   FinalData<-cbind(P1,P2)

with

    FinalData <- Data2[, c(2,5:30)]

But even more unnecessary is the cbind() in

   P1<-cbind(Data2[,2])

since all that is needed is

   P1<- Data2[,2]
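
If it helps to see the difference, here is a sketch on a hypothetical four-column data frame (d2 below is made up, not your Data2). It shows that indexing with a vector of column numbers already returns a data frame, and that wrapping a single column in cbind() only turns it into a one-column matrix.

```r
# Hypothetical toy data standing in for Data2.
d2 <- data.frame(V1 = 1:3, V2 = c("a", "b", "c"), V3 = 4:6, V4 = 7:9,
                 stringsAsFactors = FALSE)

# Several columns at once: the result is already a data frame,
# and the original column names are kept.
sub <- d2[, c(2, 4)]
print(class(sub))
print(names(sub))

# A single column: plain indexing gives a vector ...
p1_vec <- d2[, 2]
# ... while cbind() on that vector gives a one-column matrix.
p1_mat <- cbind(d2[, 2])
print(dim(p1_mat))
```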


I personally think you're better off if you do not change the names 
of FinalData, but if you do, it's easier this way:

    names(FinalData) <- paste('V',1:27,sep='')

Or more generally

    names(FinalData) <- paste('V', seq(ncol(FinalData)), sep='')
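
A quick sketch of the general form on a hypothetical three-column data frame (fd is invented for illustration). It uses seq_len() rather than seq(), which is my substitution and not required; it just behaves sensibly if the data frame happens to have zero columns.

```r
# Hypothetical toy data frame with arbitrary column names.
fd <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Rename the columns V1, V2, ... however many there are.
names(fd) <- paste("V", seq_len(ncol(fd)), sep = "")
print(names(fd))   # "V1" "V2" "V3"
```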


By the way, although your text file data2.txt does not have a header, 
your dataframe Data2 does have a "header". That is, it has column 
names V1, V2, and so on.
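
You can see those default names with a small sketch that reads headerless, tab-separated text from a string instead of a file (the text and the name d are made up for illustration; the text= argument of read.table() stands in for your data2.txt).

```r
# Two rows of tab-separated data with no header line (hypothetical content).
txt <- "a\t1\t2.5\nb\t2\t3.5\n"

d <- read.table(text = txt, sep = "\t", header = FALSE,
                stringsAsFactors = FALSE)

# read.table() supplies column names when the file has none.
print(names(d))   # "V1" "V2" "V3"
```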

-Don

At 7:59 AM -0700 8/21/08, kayj wrote:
>I have two sets of data, Data1 and Data2. Data1 has a header and Data2 does
>not. I would like to merge the two data sets after removing some columns
>from Data2.
>
>I am having a problem merging, so I had to write and read the final data and
>specify "header=F" so the merge can be done by "V1". Is there a way to
>avoid this step? The problem is that when I use cbind, FinalData has different
>column names.
>
>
>
>Data1<-read.table("data1.txt", sep='\t', header=F, stringsAsFactors=F)
>
>Data2<-read.table("data2.txt", sep='\t', header=T, stringsAsFactors=F)
>
>P1<-cbind(Data2[,2])
>P2<-cbind(Data2[,5:30])
>FinalData<-cbind(P1,P2)
>write.table(FinalData ,file="FinalData.txt", sep='\t', quote=F, col.names=F,
>row.names=F)
>
>Data3<-read.table("FinalData.txt", sep='\t', header=F, stringsAsFactors=F)
>m<-merge(Data1,Data3, by="V1")


-- 
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov


