[R] Statistical distribution not fitting
Amelia Marsh
amelia_marsh08 at yahoo.com
Thu Jul 23 08:05:56 CEST 2015
Dear Sir,
Thank you for your great guidance. It made me realize that I need to think outside the box.
As regards the low losses, the Basel guidelines do say to discard such low losses, which only add noise when analysing the losses caused by operational loss events.
It is the right-tail events that matter, as they represent the low-frequency, high-magnitude losses.
But my client is adamant: although we have shown him research papers on the threshold limits that must be applied to arrive at a meaningful analysis, he insists that we include these low losses too and fit some distribution.
Lastly, regarding the command

rsnorm(10000, mean = m, sd = s, xi = x)

where m, s and x are the parameters estimated from the loss data: the usual procedure is to sort these simulated values and select the observation at a high percentile (say 99.9%); this is the Value at Risk (VaR), say 'p'.
My understanding is that, because the parameters were estimated on the log10-transformed losses, I need to apply the transformation 10^p to this value to bring it back to the scale of my original loss data. Am I right?
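In code, the full pipeline I have in mind would be something like the following sketch (the vector name 'loss' is a placeholder for my raw loss amounts; everything up to the last line happens on the log10 scale):

library(fGarch)

## fit a skew-normal to the log10-transformed losses
fit <- snormFit(log10(loss))
pars <- as.list(fit$par)                     # mean, sd, xi

## simulate 10,000 losses on the log10 scale
sim <- rsnorm(10000, mean = pars$mean, sd = pars$sd, xi = pars$xi)

## 99.9th percentile on the log10 scale ...
p <- quantile(sim, 0.999)

## ... back-transformed to the original currency scale
VaR <- 10^p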
Thanks again, sir, for your great help. I now have a way forward.
Regards
Amelia
_____________________________________________________________________________
On Thursday, 23 July 2015 2:20 AM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
So - as you can see, your data can be modelled.
Now the interesting question is: what do you do with that knowledge? I know nearly nothing about your domain, but given that the data look log-normal, I am curious about the following:
- Most of the events are in the small-loss category, but most of the damage is done by the rare large losses. Is it even meaningful to guard against a single 1/1000 event? Shouldn't you be saying: my contingency funds need to be large enough to allow survival of, say, a fiscal year with 99.9% probability? That is a very different question (see the sketch after these points).
- If a loss occurs, in what time do the funds need to be replenished? Do you need to take series of events into account?
- The model assumes that the data are independent. This is probably a poor (and dangerous) assumption.
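(To make the first point concrete, a rough sketch of the aggregate annual view; the Poisson frequency and the event rate lambda are assumptions, and m, s, x are the skew-normal severity parameters fitted on the log10 scale:)

library(fGarch)
nyears <- 10000
lambda <- 50                       # hypothetical mean number of loss events per year
annual <- replicate(nyears, {
    n <- rpois(1, lambda)          # number of events in one simulated year
    sum(10^rsnorm(n, mean = m, sd = s, xi = x))   # total annual loss
})
quantile(annual, 0.999)            # funds needed to survive a year with 99.9% probability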
Cheers,
B.
On Jul 22, 2015, at 3:56 PM, Ben Bolker <bbolker at gmail.com> wrote:
> Amelia Marsh <amelia_marsh08 <at> yahoo.com> writes:
>
>
>> Hello! (I don't know if I can raise this query here on this forum,
>> but I had already raised it on the finance forum and have not received
>> any suggestion, so I am now raising it on this list. Sorry for the
>> duplication. The query is about what to do if no statistical
>> distribution fits the data.)
>
>> I work in risk management and deal with operational risk. As part
>> of the Basel II guidelines, we need to arrive at the capital charge
>> that banks must set aside to cover operational risk, should it
>> materialize. As part of the Loss Distribution Approach (LDA), we
>> collate past loss events and use these loss amounts. The usual
>> process as practised in the industry is as follows -
>
>> Using these historical loss amounts and the various statistical
>> tests (KS test, AD test, PP plot, QQ plot, etc.), we try to identify
>> the best-fitting continuous statistical distribution for the
>> historical loss data. Then, using the estimated parameters of that
>> distribution, we simulate say 1 million loss amounts, and by taking
>> an appropriate percentile (say 99.9%) we arrive at the capital
>> charge.
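>> (In code, the procedure amounts to something like this sketch, e.g.
>> with the fitdistrplus package; 'loss' stands for the historical
>> loss vector and the lognormal is just one candidate:)
>>
>> library(fitdistrplus)
>> f <- fitdist(loss, "lnorm")      # fit a candidate distribution
>> gofstat(f)                       # KS, AD, Cramer-von Mises statistics
>> plot(f)                          # density, CDF, QQ and PP plots
>> sim <- rlnorm(1e6, f$estimate["meanlog"], f$estimate["sdlog"])
>> quantile(sim, 0.999)             # 99.9th percentile -> capital charge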
>
>> However, many times the loss data are such that fitting a
>> distribution is not possible. Maybe the loss data are multimodal or
>> have significant variability, making the fit impossible. Can someone
>> guide me on how to deal with such data, and what can be done to
>> simulate losses from this historical loss data in R?
>
> A skew-(log)-normal fit doesn't look too bad ... (whenever you
> have positive data that are this strongly skewed, log-transforming
> is a good step)
>
> hist(log10(mydat), col = "gray", breaks = "FD", freq = FALSE)
> ## default breaks are much coarser:
> ## hist(log10(mydat), col = "gray", breaks = "Sturges", freq = FALSE)
> lines(density(log10(mydat)), col = 2, lwd = 2)
> library(fGarch)
> ## fit a skew-normal on the log10 scale and overlay its density
> ss <- snormFit(log10(mydat))
> xvec <- seq(2, 6.5, length = 101)
> lines(xvec, do.call(dsnorm, c(list(x = xvec), as.list(ss$par))),
>       col = "blue", lwd = 2)
> ## or try a skew-Student-t: not very different:
> ss2 <- sstdFit(log10(mydat))
> lines(xvec, do.call(dsstd, c(list(x = xvec), as.list(ss2$estimate))),
>       col = "purple", lwd = 2)
>
> There are more flexible distributional families (Johnson,
> log-spline ...)
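> (For the log-spline option, a minimal sketch, assuming the logspline
> package and the xvec defined above:)
>
> library(logspline)
> lfit <- logspline(log10(mydat))   # nonparametric log-spline density fit
> lines(xvec, dlogspline(xvec, lfit), col = "darkgreen", lwd = 2)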
>
> Multimodal data are a different can of worms -- consider
> fitting a finite mixture model ...
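> (A sketch of a finite mixture on the log10 scale, assuming the
> mixtools package and two components:)
>
> library(mixtools)
> mix <- normalmixEM(log10(mydat), k = 2)   # two-component normal mixture via EM
> ## overlay the fitted mixture density:
> dmix <- function(x) {
>     rowSums(sapply(1:2, function(i)
>         mix$lambda[i] * dnorm(x, mix$mu[i], mix$sigma[i])))
> }
> lines(xvec, dmix(xvec), col = "orange", lwd = 2)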
>