[R] Why daisy() in cluster library failed to exclude NA when computing dissimilarity
Martin Maechler
maechler at stat.math.ethz.ch
Mon Dec 9 11:36:04 CET 2013
>>>>> Gundala Viswanath <gundalav at gmail.com>
>>>>> on Sun, 8 Dec 2013 16:11:12 +0900 writes:
> Hi, According to the documentation of the daisy function in the
> cluster package, it can compute dissimilarities when NA
> (missing) values are present.
> http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html
> But when I tried this code:
> library(cluster)
> x <- c(1.115,NA,NA,0.971,NA)
> y <- c(NA,1.006,NA,NA,0.645)
> df <- as.data.frame(rbind(x,y))
> daisy(df,metric="gower")
> It gave this message:
> Dissimilarities :
>    x
> y NA
> Metric : mixed ; Types = I, I, I, I, I
> Number of objects : 2
> Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
> I welcome alternatives to gower.
> I expected the dissimilarity output to be a non-NA value, e.g. 0. What's
> the right way to do it?
Thank you, Gundala, for using a simple reproducible example.
Reading the documentation about Gower's distance a bit more,
you'd have found that it works by basically giving weight zero
to every *pair* of variable values in which at least one of the
two values is missing.
In your example, *every* pair has at least one missing value,
so there is no way to get a non-NA distance.
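To make the weighting idea concrete, here is a minimal base-R sketch.
The helper gower_pair() is hypothetical: it is not part of 'cluster' and
not how daisy() is implemented. It only mimics the idea for all-numeric
data with unit weights: a variable contributes only when both values are
observed, and each contribution is scaled by the range of that column.

```r
# Hypothetical helper, for illustration only (not daisy()'s actual code):
# Gower-style dissimilarity between rows i and j of a numeric matrix,
# with unit weights, using only variables observed in BOTH rows.
gower_pair <- function(m, i, j) {
  # TRUE for the columns where neither of the two values is missing
  use <- !is.na(m[i, ]) & !is.na(m[j, ])
  # every pair has a missing value: all weights are zero, so return NA
  if (!any(use)) return(NA_real_)
  # range of each usable column over its observed values
  rng <- apply(m[, use, drop = FALSE], 2,
               function(col) diff(range(col, na.rm = TRUE)))
  # range-scaled absolute differences, one per usable column
  d <- abs(m[i, use] - m[j, use]) / rng
  d[rng == 0] <- 0   # constant column: define the contribution as 0
  mean(d)            # average over the usable columns only
}

x <- c(1.115, NA, NA, 0.971, NA)
y <- c(NA, 1.006, NA, NA, 0.645)
res_na <- gower_pair(rbind(x, y), 1, 2)    # no column observed in both rows

y2 <- c(1.300, 1.006, NA, NA, 0.645)       # made-up value in column 1
res_ok <- gower_pair(rbind(x, y2), 1, 2)   # column 1 is now usable

print(res_na)   # NA
print(res_ok)   # 1
```

With only two rows, a usable column that differs always contributes 1,
because the column range equals the difference itself. The point is
only that a non-NA result needs at least one variable observed in both
rows.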
*AND* the documentation already contains this, at the very end
of the section 'Details' :
If all weights w_k delta(ij;k) are zero, the dissimilarity is set to ‘NA’.
I.e., we have
> install.packages("fortunes")
> library("fortunes")
> fortune("WTFM")
This is all documented in TFM. Those who WTFM don't want to have to WTFM again
on the mailing list. RTFM.
-- Barry Rowlingson
R-help (October 2003)
... which I now did in spite of Barry's excellent point
... let's say it's because of approaching Christmas !
Martin Maechler,
ETH Zurich