[R] problem merging two data sets ( one with a header and one without)
Don MacQueen
macq at llnl.gov
Thu Aug 21 22:35:33 CEST 2008
merge() has by.x and by.y arguments. If you use them, you can merge
data frames that have different column names. You can specify columns
by name or by number. This is mentioned in the help for merge.

Try

    merge(Data1, Data2, by.x=1, by.y=2)

which will keep all of the columns in Data2, or

    merge(Data1, Data2[, c(2,5:30)], by=1)

if you must remove columns 1, 3, and 4 from Data2.
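
To make the by.x/by.y idea concrete, here is a minimal sketch on made-up data. The data frames below are small hypothetical stand-ins for Data1 and Data2 (the column names id, x, V1, V2, V3 and the values are assumptions for illustration, not the poster's files).

```r
# Hypothetical stand-ins for Data1 and Data2; not the data from the original post
Data1 <- data.frame(id = c("a", "b", "c"),
                    x  = 1:3,
                    stringsAsFactors = FALSE)
Data2 <- data.frame(V1 = c(10, 20, 30),
                    V2 = c("b", "c", "d"),
                    V3 = c(0.1, 0.2, 0.3),
                    stringsAsFactors = FALSE)

# The key is column 1 of Data1 and column 2 of Data2, even though
# the two columns have different names
m <- merge(Data1, Data2, by.x = 1, by.y = 2)
print(m)
```

Only the keys present in both data frames ("b" and "c") survive, and the key column takes its name from the first data frame.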

Alternatively, since merge() works on common variable names, all you
have to do is make sure that the single column you want to use for
the merge has the same name in both of them, and that it is the only
column with the same name in both. Thus, another way to do the merge
would be

    names(Data2)[2] <- 'V1'
    merge(Data1, Data2)

or

    names(Data2)[2] <- 'V1'
    merge(Data1, Data2[, c(2,5:30)])
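
A small sketch of the rename-then-merge route, again on hypothetical data (the column names score, A, B, C are assumptions chosen so that nothing collides). It also checks with intersect() that exactly one name is shared, since merge() would otherwise join on every common name. Note that if Data2 already has a first column called V1, renaming column 2 creates a duplicate name, so dropping or renaming the old V1 first is the safer order.

```r
# Hypothetical stand-ins; Data1's key column is assumed to be called V1
Data1 <- data.frame(V1 = c("a", "b", "c"),
                    score = 1:3,
                    stringsAsFactors = FALSE)
Data2 <- data.frame(A = c(10, 20, 30),
                    B = c("b", "c", "d"),
                    C = c(0.1, 0.2, 0.3),
                    stringsAsFactors = FALSE)

# Give the key column in Data2 the same name as the key in Data1
names(Data2)[2] <- "V1"

# merge() joins on all shared names, so confirm there is only one
common <- intersect(names(Data1), names(Data2))
print(common)

m <- merge(Data1, Data2)
print(m)
```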

While I'm at it, using cbind() is unnecessary. You can replace

    P1<-cbind(Data2[,2])
    P2<-cbind(Data2[,5:30])
    FinalData<-cbind(P1,P2)

with

    FinalData <- Data2[, c(2,5:30)]

Even more unnecessary is the cbind() in

    P1<-cbind(Data2[,2])

All that is needed is

    P1 <- Data2[,2]
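
To see the difference, here is a scaled-down sketch using a hypothetical 4-column data frame in place of the 30-column Data2 (columns 2 and 3:4 play the role of 2 and 5:30). It suggests why the cbind() route loses the original column name while direct subsetting keeps it.

```r
# Hypothetical 3-row, 4-column stand-in for Data2; columns are named V1..V4
Data2 <- as.data.frame(matrix(1:12, nrow = 3))

# The roundabout route from the original post, scaled down
P1 <- cbind(Data2[, 2])        # a one-column matrix; the name V2 is lost
P2 <- cbind(Data2[, 3:4])      # still a data frame; cbind() changed nothing
viaCbind <- cbind(P1, P2)

# Direct subsetting gives the same values and keeps the names V2, V3, V4
direct <- Data2[, c(2, 3:4)]

print(is.matrix(P1))
print(names(viaCbind))
print(names(direct))
```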

I personally think you're better off if you do not change the names
of FinalData, but if you do, it's easier this way:

    names(FinalData) <- paste('V',1:27,sep='')

Or, more generally:

    names(FinalData) <- paste('V', seq(ncol(FinalData)), sep='')

By the way, although your text file data2.txt does not have a header,
your dataframe Data2 does have a "header". That is, it has column
names V1, V2, and so on.
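
A quick way to check this, using read.table()'s text= argument on a made-up two-line, tab-separated string instead of the real data2.txt (the contents are hypothetical):

```r
# Hypothetical tab-separated content with no header line
txt <- "10\tb\t0.1\n20\tc\t0.2\n"

Data2 <- read.table(text = txt, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE)

# read.table() supplies default column names when header = FALSE
print(names(Data2))
```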
-Don
At 7:59 AM -0700 8/21/08, kayj wrote:
>I have two sets of data, Data1 and Data2. Data1 has a header and Data2 does
>not. I would like to merge the two data sets after removing some columns
>from Data2.
>
>I am having a problem merging, so I had to write and re-read the final data and
>specify "header=F" so the merge can be done by "V1". Is there a way to
>avoid this step? The problem is that when I use cbind, FinalData has different
>column names.
>
>
>
>Data1<-read.table("data1.txt", sep='\t', header=F, stringsAsFactors=F)
>
>Data2<-read.table("data2.txt", sep='\t', header=T, stringsAsFactors=F)
>
>P1<-cbind(Data2[,2])
>P2<-cbind(Data2[,5:30])
>FinalData<-cbind(P1,P2)
>write.table(FinalData ,file="FinalData.txt", sep='\t', quote=F, col.names=F,
>row.names=F)
>
>Data3<-read.table("FinalData.txt", sep='\t', header=F, stringsAsFactors=F)
>m<-merge(Data1,Data3, by="V1")
>
>
--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov