[R] Best practices for handling very small numbers?
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Oct 19 18:24:15 CEST 2011
On 19/10/2011 12:02 PM, Seref Arikan wrote:
> Thanks Duncan,
> Indeed, I have found a problem in my likelihood, and I'll work on the log
> approach again. Regarding your comment: do you think it would make sense to
> tackle a data set of a few million observations for parameter estimation
> using MCMC? In my case we are talking about Griddy Gibbs sampling.
>
> One relevant idea is to break the observations into smaller chunks and use
> sequential updating, taking the posterior from one step as the prior for the
> next. However, I don't feel confident that this would work, since it is not
> the same situation as with conjugate priors, where you have an analytic form
> of the posterior.
I think you would do better to estimate the posterior directly. It may
be slow, but that just means that you need more computing power.
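
One way to do that while avoiding tiny numbers entirely is to run the
sampler on the log scale, since Metropolis-type acceptance only needs
differences of log posteriors. A minimal sketch (illustrative only; the
names here are mine, not from any package):

# one-parameter random-walk Metropolis working entirely on the log scale;
# log_post is any unnormalized log-posterior function
metropolis <- function(log_post, init, n_iter = 10000, step_sd = 0.5) {
  out <- numeric(n_iter)
  cur <- init
  cur_lp <- log_post(cur)
  for (i in seq_len(n_iter)) {
    prop <- cur + rnorm(1, 0, step_sd)
    prop_lp <- log_post(prop)
    # accept with probability exp(prop_lp - cur_lp), compared on the log
    # scale so the raw (possibly underflowing) posterior is never formed
    if (log(runif(1)) < prop_lp - cur_lp) {
      cur <- prop
      cur_lp <- prop_lp
    }
    out[i] <- cur
  }
  out
}

# e.g., for data x: draws <- metropolis(function(m) sum(dnorm(x, m, 3, log = TRUE)), init = 0)
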
Duncan Murdoch
> Any pointers (papers, books) towards large-scale data processing with MCMC,
> and Gibbs sampling in particular, would be much appreciated.
>
> Kind regards
> Seref
>
>
> On Tue, Oct 18, 2011 at 11:12 AM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>
> > On 11-10-18 4:30 AM, Seref Arikan wrote:
> >
> >> Hi Dan,
> >> I've tried the log likelihood, but it reaches zero again if I work with,
> >> say, 1000 samples.
> >> I need an approach that would scale to quite large sample sizes. Surely I
> >> can't be the first one to encounter this problem, and I'm sure I'm missing
> >> an option that is embarrassingly obvious.
> >>
> >
> > I think you are calculating the log likelihood incorrectly. Don't
> > calculate the likelihood and take the log; work out the formula for the log
> > of the likelihood, and calculate that. (If the likelihood contains a sum of
> > terms, as in a mixture model, this takes some thinking, but it is still
> > worthwhile.)
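> >
> > For the normal model in the original post, a direct log version might
> > look like this (a minimal sketch reusing the poster's argument names):
> >
> > # -n*log(sigma) - 0.5*sum((x - mean)^2)/sigma^2: the log of the
> > # unnormalized likelihood, with no tiny intermediate products
> > joint_log_likelihood <- function(observations, mean, sigma) {
> >   -length(observations) * log(sigma) -
> >     0.5 * sum((observations - mean)^2) / sigma^2
> > }
> >
> > # equivalently, up to the additive constant -(n/2)*log(2*pi):
> > # sum(dnorm(observations, mean, sigma, log = TRUE))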
> >
> > With most models, it is just about impossible to cause the log likelihood
> > to underflow if it is calculated carefully.
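> >
> > (For the sum-of-terms case the usual trick is log-sum-exp: subtract the
> > largest log term before exponentiating. A sketch:)
> >
> > # log(sum(exp(lp))) computed stably for a vector of log terms lp
> > log_sum_exp <- function(lp) {
> >   m <- max(lp)
> >   m + log(sum(exp(lp - m)))
> > }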
> >
> > Duncan Murdoch
> >
> >
> >> Regards
> >> Seref
> >>
> >> On Mon, Oct 17, 2011 at 6:15 PM, Nordlund, Dan (DSHS/RDA) <
> >> NordlDJ at dshs.wa.gov> wrote:
> >>
> >>> -----Original Message-----
> >>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> >>>> project.org] On Behalf Of Seref Arikan
> >>>> Sent: Monday, October 17, 2011 9:11 AM
> >>>> To: r-help at r-project.org
> >>>> Subject: [R] Best practices for handling very small numbers?
> >>>>
> >>>> Greetings
> >>>> I have been experimenting with sampling from posterior distributions
> >>>> using R. Assume that I have the following observations from a normal
> >>>> distribution, and an unscaled joint likelihood function:
> >>>>
> >>>> normsamples <- rnorm(1000, 8, 3)
> >>>>
> >>>> # unnormalized likelihood: sigma^(-n) * exp(-0.5 * sum((x - mean)^2) / sigma^2)
> >>>> joint_likelihood <- function(observations, mean, sigma) {
> >>>>   sigma^(-length(observations)) *
> >>>>     exp(-0.5 * sum((observations - mean)^2) / sigma^2)
> >>>> }
> >>>>
> >>>> The joint likelihood omits the constant (2*pi)^(-n/2), which is what I
> >>>> want, because I've been experimenting with some crude sampling methods.
> >>>> The problem is that, with the samples above, the joint likelihood
> >>>> becomes 0 very quickly.
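> >>>>
> >>>> For example, evaluating it at the true parameters already returns
> >>>> exactly 0:
> >>>>
> >>>> joint_likelihood(normsamples, 8, 3)
> >>>> # [1] 0: sigma^(-1000) alone is about 1e-477, far below the smallest
> >>>> # positive double (about 2.2e-308), so the product underflows to 0
> >>>>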
> >>>> I wanted to experiment with tens of thousands of observations, but
> >>>> without a transformation that can handle very small values, my sampling
> >>>> algorithms would not work.
> >>>>
> >>>> This is an attempt to use some sampling algorithms for Bayesian
> >>>> parameter estimation. I do not want to resort to conjugacy, since I am
> >>>> developing this to handle non-conjugate scenarios; I just wanted to test
> >>>> it on a conjugate scenario, but I quickly realized that I'm in trouble
> >>>> due to the likelihood underflowing to 0.
> >>>>
> >>>> Your feedback would be appreciated. It makes me wonder how JAGS/BUGS
> >>>> handles this problem.
> >>>>
> >>>> Best regards
> >>>> Seref
> >>>>
> >>>>
> >>> Maybe you should work with the log-likelihood?
> >>>
> >>>
> >>> Hope this is helpful,
> >>>
> >>> Dan
> >>>
> >>> Daniel J. Nordlund
> >>> Washington State Department of Social and Health Services
> >>> Planning, Performance, and Accountability
> >>> Research and Data Analysis Division
> >>> Olympia, WA 98504-5204
> >>>