[R] UNIX diff function
Dennis Fisher
fisher at plessthan.com
Wed Jul 13 19:14:57 CEST 2011
Colleagues,
(R: 2.13.0; OS X)
I often receive sequential datasets in which there are new rows interposed between existing rows. For example:
SET1 <- data.frame(list(LETTERS=LETTERS[c(1:4, 6:10)], NUMBERS=c(1:4, 6:10)))
SET2 <- data.frame(list(LETTERS=LETTERS[1:10], NUMBERS=1:10))
> SET1
  LETTERS NUMBERS
1       A       1
2       B       2
3       C       3
4       D       4
5       F       6
6       G       7
7       H       8
8       I       9
9       J      10
> SET2
   LETTERS NUMBERS
1        A       1
2        B       2
3        C       3
4        D       4
5        E       5
6        F       6
7        G       7
8        H       8
9        I       9
10       J      10
As you can see, the row containing E and 5 was inserted into the second set. The UNIX diff command identifies such differences readily, but R's diff() function does not: it computes lagged differences within a vector rather than comparing two datasets. One kluge that I use is to paste together all the entries in each row, then perform a setdiff() on the two resulting vectors. Assuming that no rows are duplicated (which is true in my data), this approach works, but it is cumbersome.
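A minimal sketch of the paste-and-setdiff kluge described above, in R. The helper name row_keys() is hypothetical, and the "\r" separator is an assumption: it should be any string that never occurs inside the data. The last step uses which() and %in% to report where the new row sits in SET2, which is closer to what UNIX diff reports.

```r
# Recreate the two example datasets from the post
SET1 <- data.frame(LETTERS = LETTERS[c(1:4, 6:10)], NUMBERS = c(1:4, 6:10))
SET2 <- data.frame(LETTERS = LETTERS[1:10], NUMBERS = 1:10)

# Hypothetical helper: collapse each row into one string key.
# Assumes "\r" never appears inside the data, so keys cannot collide.
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

key1 <- row_keys(SET1)
key2 <- row_keys(SET2)

# Keys present in SET2 but not in SET1 (assumes no duplicated rows)
setdiff(key2, key1)

# Positions of the new rows in SET2, then the rows themselves
new_rows <- which(!key2 %in% key1)
SET2[new_rows, ]
```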
I suspect that someone on this board has thought of a more clever approach to this (or perhaps some function already exists). Any help would be appreciated.
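One possible base-R alternative that avoids building the keys by hand, offered as a sketch rather than a definitive answer: stack the two data frames with rbind() and let duplicated(), which has a data-frame method that compares whole rows, flag the rows of SET2 that already occur in SET1. Like the kluge, it assumes no duplicated rows within SET2, and it only detects inserted rows. Swapping the roles of SET1 and SET2 would find deleted rows instead.

```r
# Recreate the two example datasets from the post
SET1 <- data.frame(LETTERS = LETTERS[c(1:4, 6:10)], NUMBERS = c(1:4, 6:10))
SET2 <- data.frame(LETTERS = LETTERS[1:10], NUMBERS = 1:10)

# Put the old rows first; duplicated() then marks every SET2 row
# that matches an earlier (i.e. SET1) row as TRUE
combined <- rbind(SET1, SET2)
dup <- duplicated(combined)

# Keep only the flags for SET2's rows; FALSE means "not found in SET1"
in_set2 <- seq_len(nrow(SET2)) + nrow(SET1)
added <- SET2[!dup[in_set2], ]
added
```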
Thanks.
Dennis
Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com