[R] density

Bert Gunter gunter.berton at gene.com
Fri Jul 27 06:43:56 CEST 2012


Inline.

-- Bert

On Thu, Jul 26, 2012 at 8:12 PM, li li <hannah.hlx at gmail.com> wrote:
> Thank you for the reply. I do have another question.
>
> I also want to estimate the derivatives of a density function using the
> derivatives of a kernel density estimator.
>
> It is easy to write out the estimator, for example, for a Gaussian kernel.
> The difficulty is finding the appropriate bandwidth.

Ah -- "There be whales here." (google it!)

There is a vast and mind-numbing statistical literature on finding the
"appropriate bandwidth" for kernel density estimators. Which means, of
course, that there is no overall good way to do it. It depends on the
subject matter context and nature of the data. There is also a vast
literature on other sorts of density estimators, as kde's are rather
old-fashioned these days, I believe. And tons of algorithms, many in
R, no doubt.

The situation for derivatives is even worse, of course -- if the value
of the density is uncertain, that of the derivative is even more so.
Unless you have a "lot" of data (don't ask), a real need to do this,
and some understanding of the underlying math involved (you're really
in function spaces, I think), you might do well to consider an
alternative approach to whatever it is you're trying to do.
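
For what it's worth, writing out the estimator itself is indeed the easy
part. Here is a minimal sketch for the Gaussian kernel, not a
recommendation: the function names kde_gauss and kde_gauss_deriv are made
up for illustration, and the bandwidth h is simply passed in. Using
bw.SJ() below is only a placeholder, since that selector targets the
density and not its derivative.

```r
# Gaussian-kernel density estimate evaluated at the points in `grid`.
# (Hypothetical helper, written out directly rather than via density().)
kde_gauss <- function(x, grid, h) {
  sapply(grid, function(g) mean(dnorm((g - x) / h)) / h)
}

# First derivative of the same estimate. Differentiating the kernel gives
# K'(u) = -u * dnorm(u), and the chain rule adds one more factor of 1/h.
kde_gauss_deriv <- function(x, grid, h) {
  sapply(grid, function(g) {
    u <- (g - x) / h
    mean(-u * dnorm(u)) / h^2
  })
}

set.seed(1)                       # illustrative data only
x    <- rnorm(1000)
grid <- seq(-3, 3, length.out = 61)
h    <- bw.SJ(x)                  # placeholder bandwidth, see note above
d1   <- kde_gauss_deriv(x, grid, h)
```

All of the difficulty is still hidden in the choice of h.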

Further posts should probably be on a statistical list, as this
fundamentally has little to do with R.

Cheers,
Bert


> Is there a function in R that gives the
> bandwidth for a derivative kernel estimator for a set of observations?
>
> I looked at the function "drvkde". However, it does not seem to return
> a bandwidth value.
>
> Thank you.
>
>
>
> 2012/7/26 David L Carlson <dcarlson at tamu.edu>
>
>> If you want a recommendation, why not use the one that comes with the
>> manual page for density():
>>
>> ?density
>>
>> Under bw
>>
>> "The default, "nrd0", has remained the default for historical and
>> compatibility reasons, rather than as a general recommendation, where e.g.,
>> "SJ" would rather fit, see also V&R (2002)."
>>
>> Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S.
>> New York: Springer.
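
A small sketch of what that recommendation looks like in practice,
assuming simulated data (the rnorm sample and the seed are just for
illustration). The bw argument of density() accepts the selector name as
a string, and the bandwidth actually used comes back in the bw component
of the result.

```r
set.seed(42)                        # illustrative data only
x <- rnorm(1000)

d_default <- density(x)             # bw = "nrd0", the historical default
d_sj      <- density(x, bw = "SJ")  # Sheather-Jones selector

# The bandwidth each fit actually used
c(nrd0 = d_default$bw, SJ = d_sj$bw)

# Passing the selector name is equivalent to computing the bandwidth first
d_sj2 <- density(x, bw = bw.SJ(x))
```
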
>>
>> ----------------------------------------------
>> David L Carlson
>> Associate Professor of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at
>> > r-project.org] On Behalf Of Michael Young
>> > Sent: Wednesday, July 25, 2012 9:53 PM
>> > To: li li
>> > Cc: r-help
>> > Subject: Re: [R] density
>> >
>> > I can't help you decide which bandwidth method to use, but here's how
>> > you view the density source code...
>> >
>> > methods("density")
>> > density.default
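
To spell that out a little: methods() only lists the S3 methods, and it
is printing the method object that shows the source. A hedged sketch
follows; getS3method() and getAnywhere() come from the utils package and
should find the method whether or not it is exported.

```r
# List the S3 methods defined for the generic density()
methods("density")

# Retrieve the default method and print its R source
f <- getS3method("density", "default")
print(f)

# getAnywhere() also reports which namespace the method lives in
getAnywhere("density.default")

# The bandwidth selectors are ordinary functions; typing a name prints its source
bw.nrd0
```
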
>> >
>> > On Wed, Jul 25, 2012 at 5:56 PM, li li <hannah.hlx at gmail.com> wrote:
>> > >
>> > > Hi all,
>> > >   I have a question regarding the density function which gives the
>> > > kernel density estimator.
>> > >   I want to decide the bandwidth when using a Gaussian kernel, given a
>> > > set of observations. I am not familiar with the different methods for
>> > > bandwidth determination. Below are the different ways of choosing the
>> > > bandwidth in R. Can anyone give an idea of which ones are preferred?
>> > >   Also, how can I take a look at the source code for the density
>> > > function?
>> > >   Thank you very much.
>> > >         Hannah
>> > >
>> > >
>> > > > x <- rnorm(1000)
>> > > > bw.nrd(x)
>> > > [1] 0.2688588
>> > > > bw.nrd0(x)
>> > > [1] 0.2282763
>> > > > bw.ucv(x)
>> > > [1] 0.2112366
>> > > > bw.bcv(x)
>> > > [1] 0.2890085
>> > > Warning message:
>> > > In bw.bcv(x) : minimum occurred at one end of the range
>> > > > bw.SJ(x)
>> > > [1] 0.2716242
>> > > > density(x, give.Rkern=T, kernel="gaussian")
>> > > [1] 0.2820948
>> > > > density(x, kernel="gaussian")
>> > >
>> > > Call:
>> > > density.default(x = x, kernel = "gaussian")
>> > >
>> > > Data: x (1000 obs.); Bandwidth 'bw' = 0.2283
>> > >
>> > >        x                   y
>> > >  Min.   :-3.974672   Min.   :0.0000199
>> > >  1st Qu.:-1.987712   1st Qu.:0.0076405
>> > >  Median :-0.000752   Median :0.0529498
>> > >  Mean   :-0.000752   Mean   :0.1256971
>> > >  3rd Qu.: 1.986208   3rd Qu.:0.2552411
>> > >  Max.   : 3.973168   Max.   :0.3883532
>> > >
>> > > ______________________________________________
>> > > R-help at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


