[R] Odd Behavior Out of setdiff(...) - addition of duplicate entries is not identified

Jason Rupert jasonkrupert at yahoo.com
Fri May 29 23:58:50 CEST 2009


Jay, 


Thanks much for the reply. I think you are right about the prob package. Unfortunately, I was not able to find the old emails discussing this more powerful setdiff, which essentially builds on the base R setdiff functionality and extends it to work with data.frames instead of just simple vectors of values. I love this functionality.

However, for the following example, 
> Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"))
> Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"), Price = c("Low"))
> Test2_DF<-rbind(Test1_DF, Test1_DF)
> setdiff(Test1_DF, Test2_DF)
[1] HouseSize    LandLocation Price       
<0 rows> (or 0-length row.names)
> setdiff(Test2_DF, Test1_DF)
[1] HouseSize    LandLocation Price       
<0 rows> (or 0-length row.names)

I was hoping that, for this example, one of the setdiff calls would have returned essentially Test1_DF, since its rows are duplicated in Test2_DF and that is the difference between the two data frames.

So, I guess I am trying to figure out a way to truly diff the data frames, i.e. determine when two data.frames differ from one another and get the differing rows as output.

Does this capability exist in a function within a current R package, or is there a commonly used pattern that provides it?
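
For illustration, one pattern that would give the result hoped for above is a multiset (bag) difference, where each row of the second data frame cancels at most one matching row of the first. The following is only a minimal sketch in base R, assuming both data frames have the same columns. multiset_diff is a hypothetical helper name, not a function from base R or the prob package, and the row keys assume that no value contains the "\r" separator character.

```r
# Hypothetical helper (not part of base R or the prob package):
# returns the rows of x that appear more often in x than in y,
# keeping one row per surplus copy (a multiset difference).
multiset_diff <- function(x, y) {
  stopifnot(identical(names(x), names(y)))
  # One key string per row; rows are compared by their character
  # representation, and "\r" is assumed not to occur in the data.
  key_x <- do.call(paste, c(x, sep = "\r"))
  key_y <- do.call(paste, c(y, sep = "\r"))
  # For each row of x: is it the 1st, 2nd, ... copy of its key?
  copy_x <- ave(seq_along(key_x), key_x, FUN = seq_along)
  # For each row of x: how many copies of its key does y hold?
  keys <- unique(c(key_x, key_y))
  count_y <- tabulate(match(key_y, keys), nbins = length(keys))
  avail <- count_y[match(key_x, keys)]
  # Keep only the copies that y cannot account for.
  x[copy_x > avail, , drop = FALSE]
}

Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)
nrow(multiset_diff(Test2_DF, Test1_DF))  # 100: the surplus copies
nrow(multiset_diff(Test1_DF, Test2_DF))  # 0: nothing in Test1_DF is unmatched
```

If the goal is only to spot repeated rows within a single data frame, base R's duplicated(Test2_DF) flags the second and later copies of each row.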

Thanks again for any feedback you can provide. 
 

Also, I tried to determine my Session Info and the packages I have loaded, but I received the following:
> sessionInfo()
Error in x$Priority : $ operator is invalid for atomic vectors
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'prob' is missing or broken
2: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'ggplot2' is missing or broken
3: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'reshape' is missing or broken
4: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'RColorBrewer' is missing or broken
5: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'proto' is missing or broken
6: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'plyr' is missing or broken
7: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'nortest' is missing or broken
8: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'fBasics' is missing or broken
9: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'timeSeries' is missing or broken
10: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'timeDate' is missing or broken
11: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'vcd' is missing or broken
12: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer",  ... :
  DESCRIPTION file of package 'colorspace' is missing or broken


However, I typically load the following ones:
library(colorspace, lib.loc=RLibraryPathLocation)
library(vcd, lib.loc=RLibraryPathLocation)
library(timeDate, lib.loc=RLibraryPathLocation)
library(timeSeries, lib.loc=RLibraryPathLocation)
library(fBasics, lib.loc=RLibraryPathLocation)
library(nortest, lib.loc=RLibraryPathLocation)
library(plyr, lib.loc=RLibraryPathLocation)
library(proto, lib.loc=RLibraryPathLocation)
library(RColorBrewer, lib.loc=RLibraryPathLocation)
library(reshape, lib.loc=RLibraryPathLocation)
library(ggplot2, lib.loc=RLibraryPathLocation)
library(prob, lib.loc=RLibraryPathLocation)
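
Since the warnings above all complain about a DESCRIPTION file that is "missing or broken", a first check could be whether that file exists for each package in the library directory. The following is only a sketch: find_missing_description is a hypothetical helper name, lib_path stands in for RLibraryPathLocation, and the check covers only a missing file, not a broken one.

```r
# Hypothetical helper: report which packages have no DESCRIPTION file
# under the given library directory. It does not detect a file that
# exists but is malformed.
find_missing_description <- function(pkgs, lib_path) {
  desc_files <- file.path(lib_path, pkgs, "DESCRIPTION")
  pkgs[!file.exists(desc_files)]
}

pkgs <- c("colorspace", "vcd", "timeDate", "timeSeries", "fBasics", "nortest",
          "plyr", "proto", "RColorBrewer", "reshape", "ggplot2", "prob")
# Assumption: the first entry of .libPaths() is used here only as a
# placeholder; substitute the directory held in RLibraryPathLocation.
find_missing_description(pkgs, .libPaths()[1])
```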


--- On Fri, 5/29/09, G. Jay Kerns <gkerns at ysu.edu> wrote:

> From: G. Jay Kerns <gkerns at ysu.edu>
> Subject: Re: [R] Odd Behavior Out of setdiff(...) - addition of duplicate  entries is not identified
> To: "Jason Rupert" <jasonkrupert at yahoo.com>
> Cc: R-help at r-project.org
> Date: Friday, May 29, 2009, 3:21 PM
> Dear Jason,
> 
> On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrupert at yahoo.com>
> wrote:
> >
> > I think I am using the improved version of setdiff(...) that handles
> > data.frames, so I think some odd behavior was expected but this one
> > is escaping me.
> >
> > It appears that the addition of duplicate entries is not caught by
> > the setdiff(...).  Is this expected behavior?
> 
> [snip]
> 
> > Thanks in advance for any feedback.
> >
> > Test1_DF<-data.frame(HouseSize=c(1:100))
> > Test2_DF<-rbind(Test1_DF, Test1_DF)
> > setdiff(Test1_DF, Test2_DF)
> > integer(0)
> > setdiff(Test2_DF, Test1_DF)
> > integer(0)
> >
> > However,
> > Test3_DF<-data.frame(HouseSize=c(1:25))
> > setdiff(Test1_DF, Test3_DF)
> >  [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
> > [17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
> > [33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
> > [49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
> > [65]  90  91  92  93  94  95  96  97  98  99 100
> >
> > setdiff(Test3_DF, Test1_DF)
> > integer(0)
> 
> 
> You didn't explicitly say which "improved version" of setdiff() that
> you are using, so I can only presume that you are using the
> setdiff.data.frame in the prob package.
> 
> The behaviour you are observing is expected and matches the
> base:::setdiff behaviour in the case of vectors;  cf.
> 
> x1 <- c(1:100)
> x2 <- c(x1,x1)
> 
> setdiff(x1, x2)  # integer(0)
> setdiff(x2, x1)  # integer(0)
> 
> x3 <- c(1:25)
> setdiff(x1, x3)  # 26:100
> setdiff(x3, x1)  # integer(0)
> 
> 
> >
> > If so, is there another method or approach that should be used to
> > identify duplicate row entries between two different data frames?
> >
> 
> The R-help archives are chock full of every possible variant of
> questions (and answers) about this, and you haven't said _exactly_
> what you are looking for. In the absence of an already posted
> solution, please specify exactly what you want and I'll wager an R
> Ninja could dispatch it in moments.
> 
> Regards,
> Jay
> 
> ***************************************************
> G. Jay Kerns, Ph.D.
> Associate Professor
> Department of Mathematics & Statistics
> Youngstown State University
> Youngstown, OH 44555-0002 USA
> Office: 1035 Cushwa Hall
> Phone: (330) 941-3310 Office (voice mail)
> -3302 Department
> -3170 FAX
> E-mail: gkerns at ysu.edu
> http://www.cc.ysu.edu/~gjkerns/
> 






