[R] Statistical distribution not fitting

Wed Jul 22 21:56:58 CEST 2015

Amelia Marsh <amelia_marsh08 <at> yahoo.com> writes:

> Hello!  (I don't know if I can raise this query on this forum,
> but I had already raised it on the finance forum and have not received
> any suggestions, so I am now raising it on this list. Sorry for that. The
> query is about what to do if no statistical distribution fits
> the data).

> I am into risk management and deal with operational risk. As a part
> of the Basel II guidelines, we need to arrive at the capital charge the
> banks must set aside to counter any operational risk, if it
> happens. As a part of the Loss Distribution Approach (LDA), we need to
> collate past loss events and use these loss amounts. The usual
> process as practised in the industry is as follows -

> Using these historical loss amounts and various
> statistical tests like the KS test, AD test, PP plot, QQ plot etc., we
> try to identify the best statistical (continuous) distribution fitting
> this historical loss data. Then, using the estimated parameters
> of that statistical distribution, we simulate say 1 million loss
> amounts and then, taking an appropriate percentile (say 99.9%), we
> arrive at the capital charge.

> However, many times the loss data are such that fitting a
> distribution to them is not possible. Maybe the loss data are
> multimodal or have significant variability, making the fitting of a
> distribution impossible.  Can someone guide me on how to deal with such
> data and what can be done to simulate losses using this historical
> loss data in R?

A skew-(log)-normal fit doesn't look too bad ... (whenever you
have positive data that are this strongly skewed, log-transforming
is a good step)

## mydat: numeric vector of (positive) loss amounts
## histogram on the log10 scale, Freedman-Diaconis bin widths
hist(log10(mydat),col="gray",breaks="FD",freq=FALSE)
## default breaks are much coarser:
## hist(log10(mydat),col="gray",breaks="Sturges",freq=FALSE)
## overlay a kernel density estimate
lines(density(log10(mydat)),col=2,lwd=2)
library(fGarch)
## skew-normal fit on the log10 scale
ss <- snormFit(log10(mydat))
xvec <- seq(2,6.5,length.out=101)
lines(xvec,do.call(dsnorm,c(list(x=xvec),as.list(ss$par))),
      col="blue",lwd=2)
## or try a skew-Student-t: not very different:
ss2 <- sstdFit(log10(mydat))
lines(xvec,do.call(dsstd,c(list(x=xvec),as.list(ss2$estimate))),
      col="purple",lwd=2)
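
If none of the parametric fits looks acceptable, one possible fallback
for the "simulate 1 million losses and take the 99.9% percentile" step
is to draw from the kernel density estimate itself (a smoothed
bootstrap). This is only a rough sketch in base R: the data below are a
made-up placeholder for mydat (the real data are not shown here), and
the default Gaussian kernel and bandwidth from density() are assumed.

```r
set.seed(101)
## placeholder (hypothetical) loss data; substitute the real 'mydat'
mydat <- 10^rnorm(500, mean=4, sd=0.7)

logdat <- log10(mydat)
dd <- density(logdat)   ## Gaussian kernel, default bandwidth
nsim <- 1e6
## a draw from a Gaussian KDE = a resampled observation
## plus normal noise with sd equal to the bandwidth
simlog <- sample(logdat, nsim, replace=TRUE) + rnorm(nsim, sd=dd$bw)
simloss <- 10^simlog    ## back to the original (loss) scale
cap <- quantile(simloss, 0.999)
cap
```

Keep in mind that a kernel estimate cannot say much beyond the largest
observed losses, so the 99.9% point will be driven by the few biggest
values in the data; with a small sample it is probably worth comparing
it against the quantile from a parametric fit.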

There are more flexible distributional families (Johnson,
log-spline ...)

Multimodal data are a different can of worms -- consider
fitting a finite mixture model ...
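
As a rough illustration of the mixture idea, here is a hand-rolled EM
fit of a two-component normal mixture on the log10 scale, followed by
simulation from the fitted mixture. Everything here is an assumption
for the sake of the example: the bimodal data are fabricated, the
function em2() is a hypothetical helper written for this sketch, and
two components are simply taken as given. Packages such as mixtools or
flexmix provide more complete implementations.

```r
set.seed(101)
## placeholder (hypothetical) bimodal data, already on the log10 scale
logdat <- c(rnorm(300, mean=3, sd=0.3), rnorm(200, mean=5, sd=0.4))

## EM for a two-component normal mixture (hypothetical helper)
em2 <- function(x, maxit=500, tol=1e-8) {
    ## crude starting values: split the data at the median
    lo <- x[x <= median(x)]; hi <- x[x > median(x)]
    p <- 0.5; mu <- c(mean(lo), mean(hi)); s <- c(sd(lo), sd(hi))
    ll_old <- -Inf
    for (i in seq_len(maxit)) {
        d1 <- p*dnorm(x, mu[1], s[1])
        d2 <- (1-p)*dnorm(x, mu[2], s[2])
        ll <- sum(log(d1+d2))      ## log-likelihood at current parameters
        if (ll - ll_old < tol) break
        ll_old <- ll
        ## E step: probability that each point belongs to component 1
        w <- d1/(d1+d2)
        ## M step: update mixing proportion, weighted means and sds
        p <- mean(w)
        mu <- c(weighted.mean(x, w), weighted.mean(x, 1-w))
        s <- c(sqrt(sum(w*(x-mu[1])^2)/sum(w)),
               sqrt(sum((1-w)*(x-mu[2])^2)/sum(1-w)))
    }
    list(p=p, mu=mu, sd=s, loglik=ll)
}
fit <- em2(logdat)

## simulate: choose a component for each draw, then sample from it
nsim <- 1e5
comp <- ifelse(runif(nsim) < fit$p, 1, 2)
simloss <- 10^rnorm(nsim, fit$mu[comp], fit$sd[comp])
quantile(simloss, 0.999)
```

EM can get stuck in local optima, so in practice it is worth trying
several starting values and checking the fitted density against the
histogram before trusting the tail quantile.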