[R] How to locate the differences between two data frames
David Winsemius
dwinsemius at comcast.net
Thu Apr 8 22:20:00 CEST 2010
On Apr 8, 2010, at 4:03 PM, Jun Shen wrote:
> David,
>
> all.equal() only tells me how many mismatches there are, including
> missing values, but it doesn't tell me the location of each mismatch.
Yes, I noticed that after further testing. I agree that Charles' solution
is more informative, and I wonder whether it could be added to the
functionality of all.equal (which purports to tell the user where
objects differ).
>
> For example, if I have one NA mismatch and three mismatched values,
>
> all.equal(a,b) gives
> [1] "Component 2: 'is.NA' value mismatch: 1 in current 0 in target"
> [2] "Component 3: 3 string mismatches"
> This only tells me that the missing-value mismatch is in the second
> column (component) and that there are 3 mismatches in the third
> column, but it gives no row information.
>
> which(mapply(identical,unlist(a),unlist(b))==FALSE) gives
> TIME5 DV1 DV2 DV17
> 85 161 162 177
> It tells me exactly which columns and rows have the mismatches.
> In this case they are column "TIME", row 5, and column "DV", rows 1, 2,
> and 17. You can ignore the numbers underneath, which are positions in
> the unlisted vector.
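A minimal sketch of Jun's approach on two small made-up data frames (the column names and values are hypothetical, chosen only to mirror the thread). Because unlist() stacks the columns one after another, a position p in the unlisted vector can be mapped back to a column and a row with integer division and remainder by nrow(a).

```r
# Hypothetical example data; DV is stored as character here
a <- data.frame(ID = 1:4, TIME = c(0, 1, 2, 4),
                DV = c("5.1", "3.2", "2.0", "1.1"), stringsAsFactors = FALSE)
b <- data.frame(ID = 1:4, TIME = c(0, 1, NA, 4),
                DV = c("5.1", "3.9", "2.0", "1.5"), stringsAsFactors = FALSE)

# Element-wise identical() over the stacked columns; unlike "==", it
# returns FALSE (not NA) when only one side is missing
diff_pos <- which(!mapply(identical, unlist(a), unlist(b)))
diff_pos  # named positions TIME3, DV2, DV4

# Map each position back to its column name and row number
n <- nrow(a)
loc <- data.frame(column = names(a)[(diff_pos - 1) %/% n + 1],
                  row = unname((diff_pos - 1) %% n + 1),
                  stringsAsFactors = FALSE)
loc
```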
>
> Jun
>
> On Thu, Apr 8, 2010 at 1:58 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Apr 8, 2010, at 1:34 PM, Jun Shen wrote:
>
> David,
>
> Thanks for the suggestion. Now I have worked out a general solution.
>
> Assume "a" and "b" are two data frames with the same dimensions:
>
> 1. Call identical(a, b) to get an overall assessment.
> 2. If it returns FALSE, call
>    which(mapply(identical, unlist(a), unlist(b)) == FALSE)
>    and you will get a result like
>    TIME5
>       85
>    which means that row 5 of the column named "TIME" differs.
>
> This also works for missing values. Thanks, everyone.
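Jun's two steps can be wrapped in a small helper. The name diff_cells() is hypothetical (it is not part of base R or of the thread), and the sketch assumes a and b have the same dimensions. One caveat: identical() also compares attributes such as row names, so step 1 can report FALSE even when step 2 finds no differing cells.

```r
# Hypothetical helper wrapping the two steps described above
diff_cells <- function(a, b) {
  # Step 1: overall check; nothing to locate if the frames are identical
  if (identical(a, b)) return(integer(0))
  # Step 2: element-wise identical() on the stacked columns
  which(!mapply(identical, unlist(a), unlist(b)))
}

a <- data.frame(TIME = c(0, 1, 2), DV = c(5, 3, 2))
b <- data.frame(TIME = c(0, NA, 2), DV = c(5, 3, 9))
diff_cells(a, b)  # flags TIME2 (the NA) and DV3

# Same cell values, different row names: identical() is FALSE,
# but no individual cell differs
b2 <- a
rownames(b2) <- c("r1", "r2", "r3")
identical(a, b2)
diff_cells(a, b2)
```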
>
> It looks as though all.equal is already set up to provide such a service:
>
> > all.equal(df1,df2)
> [1] "Component 1: 'is.NA' value mismatch: 1 in current 0 in target"
>
> I was under the misimpression that all.equal was only for approximate
> equality of numeric values, but that appears to be just one part of
> its design.
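A short illustration of the two sides of all.equal() mentioned here, using made-up vectors: on numeric data it tests near-equality within a tolerance, while on other data it returns character descriptions of the differences. Because all.equal() returns either TRUE or a character vector, isTRUE() is the usual way to use it in a condition.

```r
x <- c(1, 2, 3)
y <- x + 1e-10                # tiny floating-point perturbation

identical(x, y)               # FALSE: exact comparison
isTRUE(all.equal(x, y))       # TRUE: within the default tolerance (about 1.5e-8)

# Non-numeric input: all.equal() describes the mismatch instead of returning TRUE
msg <- all.equal(c("a", "b"), c("a", "c"))
msg
is.character(msg)             # TRUE
```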
>
> --
> David.
>
>
>
> Jun Shen from Millipore
>
> On Thu, Apr 8, 2010 at 9:08 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Apr 8, 2010, at 9:47 AM, Jun Shen wrote:
>
> Dear David, Erik and Charles,
>
> Thank you for your input. Both mapply() and which() can do the job,
> with one exception. If there is a missing value (NA) in data frame "a"
> and a data point (either numeric or character) in the corresponding
> position of "b", then mapply() returns NA for that position rather
> than FALSE, and which() cannot pick up that position either. Thanks
> again.
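The NA problem described here comes from "==" and "!=" propagating NA, combined with which() treating NA as FALSE. A minimal sketch of one NA-aware alternative (not a method proposed in the thread; the data are hypothetical): treat a cell as different when exactly one side is NA, or when both sides are present and unequal. Cells that are NA in both frames count as equal, matching identical().

```r
a <- data.frame(TIME = c(0, 1, 2), DV = c("5.1", "3.2", "2.0"),
                stringsAsFactors = FALSE)
b <- data.frame(TIME = c(0, NA, 2), DV = c("5.1", "3.2", "9.9"),
                stringsAsFactors = FALSE)

# "!=" gives NA where either side is NA, and which() ignores NA,
# so the TIME mismatch in row 2 is missed here
which(a != b, arr.ind = TRUE)

# NA-aware mismatch matrix: TRUE when exactly one side is NA, or when
# both sides are present and differ; both-NA cells count as equal
mism <- (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
idx <- which(mism, arr.ind = TRUE)
idx  # row 2 / column 1 (TIME) and row 3 / column 2 (DV)
```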
>
>
> You seem to have changed the programming challenge from
> identification to replicating identical(). If so, you can get
> closer by wrapping isTRUE(all(.)) around the
> mapply("==", attributes(...), ...) step, and likewise around the
> "==" call:
>
> > isTRUE(all(mapply("==", df1, df2)))
> [1] FALSE
>
> This is FALSE because all(c(NA, TRUE, TRUE)) is NA and isTRUE(NA) is
> FALSE.
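A quick check of the NA logic described above, using hypothetical df1 and df2. One consequence worth noting: a cell that is NA in both frames also makes all() return NA, so isTRUE(all(mapply("==", ...))) reports FALSE even for a frame compared with itself, which identical() does not.

```r
all(c(NA, TRUE, TRUE))    # NA: the missing value might be FALSE
all(c(NA, FALSE, TRUE))   # FALSE: one definite FALSE settles it
isTRUE(NA)                # FALSE

df1 <- data.frame(x = c(1, NA, 3), y = c(4, 5, 6))
df2 <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
isTRUE(all(mapply("==", df1, df2)))   # FALSE, because all() returned NA

# A frame compared with itself: identical() says TRUE, the "==" route does not
identical(df1, df1)                   # TRUE
isTRUE(all(mapply("==", df1, df1)))   # FALSE
```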
>
> --
> David.
>
>
>
>
> Jun
>
> On Wed, Apr 7, 2010 at 10:46 PM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
>
> On Wed, 7 Apr 2010, Jun Shen wrote:
>
> Dear all,
>
> I understand that identical(a, b) will tell me whether a and b are
> exactly the same or not. But if they are different, is there any way
> to tell which element(s) are different? Thanks.
>
>
> which(a != b, arr.ind = TRUE)
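Chuck's one-liner works because "!=" applied to two data frames of the same shape returns a logical matrix, and arr.ind = TRUE makes which() report row and column indices instead of linear positions. A sketch with made-up, NA-free numeric data (with NAs present, the affected cells are dropped, as Jun reports in his follow-up):

```r
a <- data.frame(TIME = c(0, 1, 2, 4), DV = c(5.1, 3.2, 2.0, 1.1))
b <- data.frame(TIME = c(0, 1, 3, 4), DV = c(5.1, 3.9, 2.0, 1.1))

# Logical matrix of cell-wise differences, then (row, col) pairs of the TRUEs
idx <- which(a != b, arr.ind = TRUE)
idx                       # row 3 of column 1 (TIME) and row 2 of column 2 (DV)
names(a)[idx[, 2]]        # column names of the differing cells
```

If the columns are factors whose level sets differ (as read.table() produced by default before R 4.0.0), the comparison stops with an error instead; converting such columns with as.character() first avoids that.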
>
> HTH,
>
> Chuck
>
>
> Jun
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> Charles C. Berry                            (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu            UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>
>
>
> David Winsemius, MD
> West Hartford, CT
>
>
>
>
>