[R] simprof test using jaccard distance
Bill.Venables at csiro.au
Bill.Venables at csiro.au
Wed May 18 01:56:01 CEST 2011
The documentation for simprof says, with respect to the method.distance argument, "This value can also be any function which returns a "dist" object."
So you should be able to use the Jaccard index by setting up your own function to compute it. e.g.
Jaccard <- function(X) vegan::vegdist(X, method = "jaccard")
tst <- simprof(.... , method.distance = Jaccard, .....)
by rights, ought to do the job.
Bill Venables.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Maia.Berman at csiro.au
Sent: Tuesday, 17 May 2011 6:03 PM
To: r-help at r-project.org
Subject: [ExternalEmail] [R] simprof test using jaccard distance
Dear All,
I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function:
> simprof
function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average",
method.distance = "euclidean", method.transform = "identity",
alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE,
increment = 100)
{
if (!is.matrix(data))
data <- as.matrix(data)
rawdata <- data
if (sample.orientation == "column")
rawdata <- t(rawdata)
if (is.function(method.distance))
rawdata.dist <- method.distance(rawdata)
else if (method.distance == "braycurtis") {
rawdata.dist <- braycurtis(rawdata, const)
if (!is.null(rownames(rawdata)))
attr(rawdata.dist, "Labels") <- rownames(rawdata)
}
else rawdata.dist <- dist(rawdata, method = method.distance)
if (!method.transform == "identity")
rawdata <- trans(rawdata, method.transform)
hclust.results <- hclust(rawdata.dist, method = method.cluster)
pMatrix <- cbind(hclust.results$merge, matrix(data = NA,
nrow = nrow(hclust.results$merge), ncol = 1))
simprof.results <- simprof.body(rawdata = rawdata, num.expected = num.expected,
num.simulated = num.simulated, method.cluster = method.cluster,
method.distance = method.distance, originaldata = rawdata,
alpha = alpha, clust.order = hclust.results$merge, startrow = nrow(hclust.results$merge),
pMatrix = pMatrix, const = const, silent = silent, increment = increment)
results <- list()
results[["numgroups"]] <- length(simprof.results$samples)
results[["pval"]] <- simprof.results$pval
results[["hclust"]] <- hclust.results
if (!is.null(rownames(rawdata))) {
for (i in 1:length(simprof.results$samples)) {
for (j in 1:length(simprof.results$samples[[i]])) {
simprof.results$samples[[i]][j] <- rownames(rawdata)[as.numeric(simprof.results$samples[[i]][j])]
}
}
}
results[["significantclusters"]] <- simprof.results$samples
return(results)
}
<environment: namespace:clustsig>
I tried to trick the function by bypassing the input of a raw community matrix (sites x species) by inputing instead a distance matrix already calculated with vegdist (vegan package) (Jaccard distance is available in vegdist, but not in the function dist used in simprof...)
I therefore modified the function as follow:
simprof2=function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average",
alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE,
increment = 100)
{ hclust.results <- hclust(data, method = method.cluster)
pMatrix <- cbind(hclust.results$merge, matrix(data = NA,
nrow = nrow(hclust.results$merge), ncol = 1))
simprof.results <- simprof.body(data = data, num.expected = num.expected,
num.simulated = num.simulated, method.cluster = method.cluster,originaldata = data,
alpha = alpha, clust.order = hclust.results$merge, startrow = nrow(hclust.results$merge),
pMatrix = pMatrix, const = const, silent = silent, increment = increment)
results <- list()
results[["numgroups"]] <- length(simprof.results$samples)
results[["pval"]] <- simprof.results$pval
results[["hclust"]] <- hclust.results
if (!is.null(rownames(data))) {
for (i in 1:length(simprof.results$samples)) {
for (j in 1:length(simprof.results$samples[[i]])) {
simprof.results$samples[[i]][j] <- rownames(data)[as.numeric(simprof.results$samples[[i]][j])]
}
}
}
results[["significantclusters"]] <- simprof.results$samples
return(results)
}
But upon running it on my vegdist object, I get the following error:
> simprof2(G3_jaccard)
Error in simprof2(G3_jaccard) : could not find function "simprof.body"
I guess this function is part of the initial environment in which simprof runs, but how do I access it in order to call it when I run my own function outside of the initial environment?
Another possible but maybe more complex way would be to define the jaccard distance (as J= 2B/(1+B) B beeing BrayCurtis distance already used in the initial function)
Many thanks in advance for your help!
--
Maïa Berman, MSc
PhD candidate
[[alternative HTML version deleted]]
More information about the R-help
mailing list