[R] UNIX diff function

Sarah Goslee sarah.goslee at gmail.com
Wed Jul 13 19:23:57 CEST 2011


Hi Dennis,

It still uses paste(), but this isn't so bad:
SET2[!(do.call(paste, SET2) %in% do.call(paste, SET1)),]
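
One caveat with the paste() approach: with the default separator (a single space), two different rows can collapse to the same string if a field itself contains a space. A minimal sketch of the problem and one possible workaround follows; row_key() is a hypothetical helper, and "\r" is only an assumed separator that should not occur in the data.

```r
# Two different rows that paste() to the same string with the default sep = " "
a <- data.frame(x = "a b", y = "c", stringsAsFactors = FALSE)
b <- data.frame(x = "a", y = "b c", stringsAsFactors = FALSE)
do.call(paste, a) == do.call(paste, b)   # TRUE: both rows become "a b c"

# row_key() is a hypothetical helper: build one key per row using a
# separator that is assumed never to appear in any field
row_key <- function(df, sep = "\r") do.call(paste, c(df, sep = sep))
row_key(a) == row_key(b)                 # FALSE: the keys now differ
```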

You could even turn it into a function. This one returns the rows of
whichever data frame has more rows that have no match in the other one,
but you could take out the conditional if you want the order of the
arguments to matter.

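rowdiff <- function(df1, df2) {
   # Compare rows as pasted strings; return the unmatched rows of the
   # larger data frame (df1 when both have the same number of rows).
   # drop = FALSE keeps the result a data frame even with one column.
   if (nrow(df1) >= nrow(df2)) {
      df1[!(do.call(paste, df1) %in% do.call(paste, df2)), , drop = FALSE]
   } else {
      df2[!(do.call(paste, df2) %in% do.call(paste, df1)), , drop = FALSE]
   }
}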

> rowdiff(SET1, SET2)
  LETTERS NUMBERS
5       E       5
> rowdiff(SET2, SET1)
  LETTERS NUMBERS
5       E       5
> rowdiff(SET1, SET1)
[1] LETTERS NUMBERS
<0 rows> (or 0-length row.names)
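
If you want something closer to what UNIX diff reports, with additions and removals listed separately, the same idea can be applied in both directions. This is only a sketch under assumptions: rowdiff2() and its argument names are hypothetical, it relies on the same pasted-row comparison as above, and it still assumes that no rows are duplicated and that the "\r" separator does not occur in the data.

```r
# rowdiff2() is a hypothetical name; it reports unmatched rows in both directions
rowdiff2 <- function(old, new, sep = "\r") {
  # One string key per row, joined with a separator assumed absent from the data
  key_old <- do.call(paste, c(old, sep = sep))
  key_new <- do.call(paste, c(new, sep = sep))
  list(
    removed = old[!(key_old %in% key_new), , drop = FALSE],  # rows only in old
    added   = new[!(key_new %in% key_old), , drop = FALSE]   # rows only in new
  )
}

# Same example data as in the original question
SET1 <- data.frame(LETTERS = LETTERS[c(1:4, 6:10)], NUMBERS = c(1:4, 6:10))
SET2 <- data.frame(LETTERS = LETTERS[1:10], NUMBERS = 1:10)

res <- rowdiff2(SET1, SET2)
res$added     # the single inserted row (E, 5)
res$removed   # a data frame with zero rows
```

Unlike the version with the conditional, the order of the arguments matters here: the first argument is treated as the older set.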


Sarah

On Wed, Jul 13, 2011 at 1:14 PM, Dennis Fisher <fisher at plessthan.com> wrote:
> Colleagues,
>
> (R: 2.13.0; OS X)
>
> I often receive sequential datasets in which there are new rows interposed between existing rows.  For example:
>        SET1 <- data.frame(list(LETTERS=LETTERS[c(1:4, 6:10)], NUMBERS=c(1:4, 6:10)))
>        SET2 <- data.frame(list(LETTERS=LETTERS[1:10], NUMBERS=1:10))
>
>> SET1
>  LETTERS NUMBERS
> 1       A       1
> 2       B       2
> 3       C       3
> 4       D       4
> 5       F       6
> 6       G       7
> 7       H       8
> 8       I       9
> 9       J      10
>
>> SET2
>   LETTERS NUMBERS
> 1        A       1
> 2        B       2
> 3        C       3
> 4        D       4
> 5        E       5
> 6        F       6
> 7        G       7
> 8        H       8
> 9        I       9
> 10       J      10
>
> As you can see, the row containing E and 5 was inserted into the second set.  The UNIX diff command identifies the differences quite readily.  Obviously, the R diff function does not do this.  However, one kluge that I use is to paste together all the entries in each row, then perform a setdiff on the two resulting vectors.  Assuming that no rows are duplicated (which would be true in my data), my approach works, but it is cumbersome.
>
> I suspect that someone on this board has thought of a more clever approach to this (or perhaps some function already exists).  Any help would be appreciated.
>
> Thanks.
>
> Dennis
>
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


