[R] Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified
G. Jay Kerns
gkerns at ysu.edu
Fri May 29 22:21:45 CEST 2009
Dear Jason,
On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrupert at yahoo.com> wrote:
>
> I think I am using the improved version of setdiff(...) that handles data.frames, so I think some odd behavior was expected but this one is escaping me.
>
> It appears that the addition of duplicate entries is not caught by the setdiff(...). Is this expected behavior?
[snip]
> Thanks in advance for any feedback.
>
> Test1_DF<-data.frame(HouseSize=c(1:100))
> Test2_DF<-rbind(Test1_DF, Test1_DF)
> setdiff(Test1_DF, Test2_DF)
> integer(0)
> setdiff(Test2_DF, Test1_DF)
> integer(0)
>
> However,
> Test3_DF<-data.frame(HouseSize=c(1:25))
> setdiff(Test1_DF, Test3_DF)
> [1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
> [17] 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
> [33] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
> [49] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
> [65] 90 91 92 93 94 95 96 97 98 99 100
>
> setdiff(Test3_DF, Test1_DF)
> integer(0)
You didn't say explicitly which "improved version" of setdiff() you
are using, so I can only presume that it is the setdiff.data.frame
in the prob package.
The behaviour you are observing is expected: setdiff() treats its
arguments as sets, so repeated entries are ignored. This matches the
base::setdiff behaviour in the case of vectors; cf.
x1 <- c(1:100)
x2 <- c(x1,x1)
setdiff(x1, x2) # integer(0)
setdiff(x2, x1) # integer(0)
x3 <- c(1:25)
setdiff(x1, x3) # 26:100
setdiff(x3, x1) # integer(0)
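To see the same thing from another angle, here is a minimal sketch in base R (an illustration only, not anything from the prob package). Since setdiff() compares distinct values, the repeats have to be found with something like duplicated() or table(). The names extra and counts are made up for the example.

```r
x1 <- 1:100
x2 <- c(x1, x1)

# setdiff() only sees distinct values, so the repeats in x2 are invisible to it
length(setdiff(x2, x1))   # 0

# duplicated() flags the second and later occurrences of each value
extra <- x2[duplicated(x2)]

# table() counts how many times each distinct value occurs
counts <- table(x2)
```

Here extra holds the values that were added a second time, and counts shows that every value in x2 occurs twice.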
>
> If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
>
The R-help archives are chock full of every possible variant of
questions (and answers) about this, and you haven't said _exactly_
what you are looking for. In the absence of an already posted
solution, please specify exactly what you want and I'll wager an R
Ninja could dispatch it in moments.
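Purely as a guess at what you might be after, and assuming the data frames from your example, a rough base R sketch could look like the following. The names stacked, dup_rows, and common are hypothetical, and whether either result is what you want depends on how you answer the question above.

```r
Test1_DF <- data.frame(HouseSize = 1:100)
Test3_DF <- data.frame(HouseSize = 1:25)

# Guess 1: stack the two data frames and keep the rows that repeat an earlier row
stacked  <- rbind(Test1_DF, Test3_DF)
dup_rows <- stacked[duplicated(stacked), , drop = FALSE]

# Guess 2: distinct rows that occur in both data frames
common <- merge(unique(Test1_DF), unique(Test3_DF))
```

duplicated() has a data.frame method that compares whole rows, and merge() with no by argument joins on the columns the two data frames share (here, HouseSize).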
Regards,
Jay
***************************************************
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
                 -3302 Department
                 -3170 FAX
E-mail: gkerns at ysu.edu
http://www.cc.ysu.edu/~gjkerns/