[R] Bivariate kernel density bandwidth selection
Gaj Vidmar
gaj.vidmar at mf.uni-lj.si
Thu Dec 9 22:26:59 CET 2010
I'm far from knowledgeable about R or kernel density estimation
(or many other statistical things, for that matter),
but allow me a simplistic suggestion:
I remember that in a package I co-authored quite some time ago
(chplot -- maybe have a look at what it does)
we use bkde2D from the KernSmooth package,
and it works fast (even for a sample size of 10000).
Of course, the bkde2D function and the KernSmooth package
have far fewer capabilities and options than the functions
from the ks package, which you say are a good choice for your data;
but maybe ...
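For illustration, here is a rough sketch of that kind of approach (not the
actual chplot code; I'm assuming xy is a two-column matrix of coordinates,
with per-axis plug-in bandwidths from dpik):

library(KernSmooth)

## illustrative data -- two columns of coordinates
set.seed(1)
xy <- cbind(rnorm(10000), rnorm(10000))

## per-axis plug-in bandwidths (dpik is univariate)
bw <- c(dpik(xy[, 1]), dpik(xy[, 2]))

## binned 2D kernel density estimate on a 101 x 101 grid
est <- bkde2D(xy, bandwidth = bw, gridsize = c(101, 101))

## quick look at the result
contour(est$x1, est$x2, est$fhat)

The bandwidths here are diagonal (one per axis), so this is not a substitute
for the full smoothed cross-validation matrix that Hscv gives, but it may be
fast enough to try on samples of your size.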
Sorry if you've already considered this and my suggestion is silly.
Regards,
Assist. Prof. Gaj Vidmar, PhD
Univ. of Ljubljana, Fac. of Medicine, Inst. for Biostatistics and Medical Informatics
"Glen Sargeant" <gsargeant at usgs.gov> wrote in message
news:1291920190435-3080753.post at n4.nabble.com...
>
> I've been trying to implement bivariate kernel density estimation. For data
> like mine, function "kde" from package "ks" with a bandwidth matrix derived
> by function "Hscv" seems like a very good choice. Unfortunately, Hscv seems
> unmanageably slow except for very small sample sizes (up to a few hundred)
> and my sample sizes are quite large (up to a few thousand). I've reviewed
> help files, vignettes, previous postings on this list, and the JSS paper
> describing ks and haven't found much mention of constraints on sample size
> other than using k-fold cross-validation to speed calculation; unfortunately,
> that option is listed but not enabled for Hscv.
>
> An example illustrates my problem. Each of the following expressions
> returns the time elapsed to estimate a bandwidth matrix. The first is for a
> sample of 100 x and y coordinates, the second is for a sample of 200 x and y
> coordinates.
>
>> system.time(Hscv(x=xy.100))
>    user  system elapsed
>    1.97    0.03    2.00
>
>> system.time(Hscv(x=xy.200))
>    user  system elapsed
>    6.03    0.17    6.22
>
> I have to do this many, many times and each run will involve up to several
> thousand records, so you can see my problem.
>
> I should think that others must surely have encountered and overcome this
> challenge. If anyone can kindly point me in a productive direction, I will
> be most grateful.
>
>
> -----
> Glen Sargeant
> Research Wildlife Biologist