David L Carlson
dcarlson at tamu.edu
Tue Apr 14 21:53:37 CEST 2015
Try all.equal(df[1,3], df[2,3])
This relates to how decimal numbers are stored in computers. It is not an R-only issue, but it is described in the R-FAQ:
From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html
7.31 Why doesn't R think these numbers are equal?
The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. For example
R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16
The function all.equal() compares two objects using a numeric tolerance of .Machine$double.eps ^ 0.5. If you want much greater accuracy than this you will need to consider error propagation carefully.
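A minimal sketch of the difference between exact and tolerance-based comparison, reusing the sqrt(2) example above. The variable names are illustrative only. Wrapping all.equal() in isTRUE() is the usual idiom, because all.equal() returns a character description of the difference, not FALSE, when the values are not nearly equal.

```r
a <- sqrt(2)
x <- a * a

# Exact comparisons: both see the tiny rounding error
exact <- x == 2
same  <- identical(x, 2)

# Tolerance-based comparison (default tolerance is about 1.5e-8)
approx <- isTRUE(all.equal(x, 2))

print(c(exact = exact, identical = same, all.equal = approx))
```

Here the two exact tests give FALSE and the tolerance-based test gives TRUE.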
For more information, see e.g. David Goldberg (1991), "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, 23/1, 5-48, also available via http://www.validlab.com/goldberg/paper.pdf.
To quote from "The Elements of Programming Style" by Kernighan and Plauger:
10.0 times 0.1 is hardly ever 1.0.
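Applied to the question below, one possible workaround is to round the column before calling duplicated(), so that values differing only in their last bits collapse to the same key. This is a sketch under assumptions: the data frame here is a small hypothetical stand-in for the poster's data, and 7 decimal places is an assumed precision chosen to match what head(df) prints.

```r
# Hypothetical data: rows 1 and 2 print the same ABSDIFF (0.3)
# but are stored as slightly different doubles.
df <- data.frame(
  ABSDIFF   = c(0.1 + 0.2, 0.3, 0.5),
  row.names = c("L0005.01", "L0005.03", "L0008.02")
)

# Exact matching finds no duplicates, so every row is kept
n_exact <- sum(!duplicated(df$ABSDIFF))

# Rounding first (assumed precision: 7 decimal places) merges the near-equal pair.
# drop = FALSE keeps the result a data frame even with a single column.
df_subset <- df[!duplicated(round(df$ABSDIFF, 7)), , drop = FALSE]

print(n_exact)
print(df_subset)
```

The number of digits should be chosen from the precision the data actually supports. Rounding too coarsely would merge rows that are genuinely different.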
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Vikram Chhatre
Sent: Tuesday, April 14, 2015 2:40 PM
To: r-help
Subject: [R] Extracting unique entries by a column
I have a data frame of dim 3x600. There are pairs of rows which have the
exact same value in column 3.
head(df)
               POP1        POP2   ABSDIFF
L0005.01 0.98484848 0.688118812 0.2967297
L0005.03 0.01515152 0.311881188 0.2967297
L0008.02 0.97727273 0.004424779 0.9728479
L0008.04 0.02272727 0.995575221 0.9728479
L0012.03 0.98684211 0.004385965 0.9824561
L0012.01 0.01315789 0.995614035 0.9824561
I want to unique sort on df$ABSDIFF so that only one row per pair remains
in the subset.
>df_subset <- df[!duplicated(df$ABSDIFF), ]
This does not work. So I literally checked:
>identical(df[1,3], df[2,3])
FALSE
How is 0.2967297 different from 0.2967297? I am puzzled.
Thanks for any insight.
Vikram
