[R] Estimate region of highest probabilty density
Ort Christoph
Christoph.Ort at eawag.ch
Wed Jun 14 13:24:11 CEST 2006
Estimate region of highest probabilty density
Dear R-community
I have data consisting of x and y. To each pair (x,y) a z value (weight) is assigned. With kde2d I can estimate the densities on a regular grid and based on this make a contour plot (not considering the z-values). According to an earlier post in the list I adjusted the kde2d to kde2d.weighted (see code below) to estimate the densities of x and y considering their weights z. There's also a piece of code with artificial data.
Now my question: Does a function exist which calculates the value of the level corresponding to a certain percentage of the volume (i.e. above this level e.g. 95% of the total volume lie, the (parameter) region of highest probability density)?
And secondly: How is it in the case of n parameters, when I don't want to plot it anymore but estimate quantiles for each parameter considering also the weight z (to each set of parameters c(v,w,x,y) a weight z is assigned)?
Many thanks in advance, I am very grateful for any hint
Chris
# MASS::kde2d copied and modified
# ===============================
library(MASS)
kde2d.weighted <- function (x, y, w, h, n = n, lims = c(range(x), range(y))) {
nx <- length(x)
if (length(y) != nx)
stop("data vectors must be the same length")
gx <- seq(lims[1], lims[2], length = n) # gridpoints x
gy <- seq(lims[3], lims[4], length = n) # gridpoints y
if (missing(h))
h <- c(bandwidth.nrd(x), bandwidth.nrd(y));
if (missing(w))
w <- numeric(nx)+1;
h <- h/4
ax <- outer(gx, x, "-")/h[1] # distance of each point to each grid point in x-direction
ay <- outer(gy, y, "-")/h[2] # distance of each point to each grid point in y-direction
z <- (matrix(rep(w,n), nrow=n, ncol=nx, byrow=TRUE)*matrix(dnorm(ax), n, nx)) %*% t(matrix(dnorm(ay), n, nx))/(sum(w) * h[1] * h[2]) # z is the density
return(list(x = gx, y = gy, z = z))
}
# Generate artificial data
# ========================
x <- runif(20000,-5,5)
y <- runif(20000,-5,5)
z <- dnorm(x, mean=0, sd=1)*dnorm(y, mean=0, sd=1)
# plot data
# =========
library(Rcmdr)
scatter3d(x=x,z=y,y=z,surface=FALSE,xlab="x",ylab="z",zlab="y",bg.col="black")
temp0 <- kde2d(x=x, y=y, n = 100, lims=c(range(x),range(y)))
contour(x=temp0$x, y=temp0$y, z=temp0$z, xlab="x", ylab="y", main="z")
temp <- kde2d.weighted(x=x, y=y, w=z, n = 100, lims=c(range(x),range(y)))
contour(x=temp$x, y=temp$y, z=temp$z, xlab="x", ylab="y", main="z", col="red", add=TRUE)
°°°
Christoph Ort
Eawag - aquatic research
Environmental Engineering
Überlandstrasse 133
CH-8600 Dübendorf
phone: +41-44-823-5041
fax: +41-44-823-5389
cell: +41-79-218-9242
mailto:christoph.ort at eawag.ch
http://www.eawag.ch/~ortchris/
http://www.eawag.ch
°°°
Christoph Ort
Eawag - aquatic research
Environmental Engineering
Überlandstrasse 133
CH-8600 Dübendorf
phone: +41-44-823-5041
fax: +41-44-823-5389
cell: +41-79-218-9242
mailto:christoph.ort at eawag.ch
http://www.eawag.ch/~ortchris/
http://www.eawag.ch
More information about the R-help
mailing list