[R] pam() with more general dissimilarity / distance
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Apr 8 13:55:15 CEST 2022
I was asked in private, but reply in public,
so others can also find this answer in the future:
On Fri, Apr 8, 2022 at 1:11 PM ..... wrote :
> Hello
> dear Dr. Maechler
> I have a question about "pam" function in the cluster package. In this
> function, we choose one of the euclidean or manhattan distances to
> calculate dissimilarity but in the mixed typed data sets the true index may
> be jaccard or other indicators.
> How can we allocate the "true" metric for each variable?
> Best regards
>
Yes, you can use pam() in two ways; see this part of the help page:
Arguments:

       x: data matrix or data frame, or dissimilarity matrix or object,
          depending on the value of the ‘diss’ argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable.  All
          variables must be numeric.  Missing values (NAs) _are_
          allowed, as long as every pair of observations has at least
          one case not missing.

          In case of a dissimilarity matrix, ‘x’ is typically the
          output of daisy or dist.  Also a vector of length
          n*(n-1)/2 is allowed (where n is the number of observations),
          and will be interpreted in the same way as the output of the
          above-mentioned functions.  Missing values (NAs) are _not_
          allowed.

So, you can first compute dx <- daisy(x, ...), choosing the appropriate
distance between your observational units.
After that, you can use the computed distance / dissimilarity matrix
(the `dx`) in your call to pam():
px <- pam(dx, k=., ....)
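
For illustration, a minimal sketch of these two steps might look like the
following. The data frame and its column names are made up here, and
metric = "gower" with one column declared as asymmetric binary (which is
compared Jaccard-style) is only one possible choice, not a recommendation
for your data; k = 2 is likewise just an assumption.

```r
library(cluster)

# Hypothetical mixed-type data, invented purely for illustration
set.seed(1)
x <- data.frame(
  height = c(rnorm(10, 170, 5), rnorm(10, 185, 5)),  # numeric variable
  group  = factor(rep(c("a", "b"), each = 10)),      # nominal variable
  flag   = rep(c(0, 1), times = 10)                  # binary 0/1 variable
)

# Step 1: compute the dissimilarities. Gower's coefficient copes with
# mixed variable types; 'type' declares 'flag' as asymmetric binary.
dx <- daisy(x, metric = "gower", type = list(asymm = "flag"))

# Step 2: cluster on the dissimilarities. 'dx' inherits from class "dist",
# so pam() treats it as a dissimilarity object without an explicit 'diss'.
px <- pam(dx, k = 2)

px$clustering  # cluster membership of each observation
px$id.med      # row indices of the medoid observations
```

Which variables to list under 'type' (e.g. "asymm", "symm", "ordratio")
depends on what each column means; see ?daisy for the details.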
I hope this helps you.
With best regards,
Martin
--
Martin Maechler
ETH Zurich