[R] random truncation
Spencer Graves
spencer.graves sending from effectivedefense.org
Sat Jul 13 14:47:29 CEST 2019
On 2019-07-12 22:31, Abby Spurdle wrote:
> > The distribution of the randomly truncated variable has thus four
> > parameters: a, b, mu and sigma. I was able to write down the likelihood
> > and attempted to maximise it
>
> I read the Wikipedia article more carefully.
> The formula is relatively simple, and is based on the application of
> Bayes Theorem.
> If one doesn't want to work out the integral, numerical methods can be
> used.
>
> However, the problem needs to be defined *precisely* first.
>
Correct: In my case, I confess I hadn't thought this through completely
before posting. I tried Rseek, as Bert Gunter suggested. That led me
to the "truncreg" and "DTDA" packages, neither of which seemed to be
what I wanted; thanks to Bert, Rolf, and Abby for your comments.
I'm observing Y[i] = (x[i]'b+e) given Y[i] > (z[i]'c+f), where e and
f are normally distributed with standard deviations s and t,
respectively, for i = 1:n. I want the total of all the Y's, including
those that are truncated (and therefore not observed).
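For concreteness, data of that form might be simulated in R roughly as
follows; the sample size, design matrices, and all parameter values are
assumptions chosen only to illustrate the truncation mechanism (I write
"c." and "t." to avoid masking the base functions c() and t()):

## purely illustrative simulation of randomly truncated data
set.seed(1)
n  <- 1000
x  <- cbind(1, rnorm(n))        # design matrix for Y
z  <- cbind(1, rnorm(n))        # design matrix for the truncation point
b  <- c(2, 1)                   # "b"
c. <- c(0, 0.5)                 # "c"
s  <- 1                         # sd of e
t. <- 0.7                       # sd of f
y.all <- drop(x %*% b)  + rnorm(n, sd = s)
cut   <- drop(z %*% c.) + rnorm(n, sd = t.)
obs <- (y.all > cut)            # only these Y's are observed
y <- y.all[obs]
X <- x[obs, , drop = FALSE]
Z <- z[obs, , drop = FALSE]
mean(obs)                       # fraction observed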
I think the likelihood is the product over i of the density of Y[i]
given x[i], conditional on Y[i] actually being observed. The condition
for observation can be rewritten as follows:
(x[i]'b+e) > (z[i]'c+f)
iff
(x[i]'b - z[i]'c) > (f-e).
Since (f-e) is normal with mean 0 and standard deviation
sqrt(s^2+t^2), the probability that Y[i] is observed is
Pr{Y[i] observed} = Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2)),
where Phi is the cdf of the standard normal.
And then the likelihood contribution for observation i can be written as
follows:
f(y[i]|x[i], z[i], b, c, s, t) = phi((y[i]-x[i]'b)/s) /
Phi((x[i]'b-z[i]'c)/sqrt(s^2+t^2)).
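A minimal R sketch of that per-observation likelihood, summed and
negated so it can be handed to a minimizer such as optim(), might look
like this; the parameterization par = c(b, log(s), log(t)) is just one
possible choice, and dnorm(y, mean, sd, log = TRUE) supplies the usual
1/s factor of the normal density:

## negative log-likelihood for the observed y's; "c" (cc) held fixed
negllik <- function(par, y, X, Z, cc) {
  k <- ncol(X)
  b <- par[1:k]
  s <- exp(par[k + 1])
  t <- exp(par[k + 2])
  mu  <- drop(X %*% b)                  # x[i]'b
  cut <- drop(Z %*% cc)                 # z[i]'c
  lognum <- dnorm(y, mean = mu, sd = s, log = TRUE)
  logden <- pnorm((mu - cut) / sqrt(s^2 + t^2), log.p = TRUE)
  -sum(lognum - logden)
}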
We may not be able to estimate "c" in this, because if one of the
z[i]'s is nonzero, we can pick "c" so that z[i]'c is arbitrarily large.
That drives the denominator toward 0 and the likelihood toward Inf. (If
all the z[i]'s are zero, we still cannot estimate "c".) However, if "b"
is estimable ignoring the truncation, then we can estimate "b", "s" and
"t" given "c".
And then the desired total of all the Y's, observed and
unobserved, would be the sum over the observed i of y[i] divided by
Pr{Y[i] observed}.
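In R, that weighted total might be computed roughly as follows, assuming
fitted values b.hat, s.hat, t.hat and an assumed value c.assumed (these
names are placeholders for illustration, not objects from any package):

## each observed y[i] weighted by 1 / Pr{Y[i] observed}
p.obs <- pnorm((drop(X %*% b.hat) - drop(Z %*% c.assumed)) /
               sqrt(s.hat^2 + t.hat^2))
total.hat <- sum(y / p.obs)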
This likelihood is simple enough that it can easily be programmed in
R and maximized over variations in "b", "s" and "t" given "c". I can
get starting values for "b" and "s" from "lm", ignoring the truncation.
And I can first fit the model assuming t = s, then test whether it's
different using a likelihood ratio test (a sketch of this sequence
follows below). And I can try to estimate "c", but I should probably use
values I can estimate from other sources until I'm comfortable with the
estimates I get for "b", "s" and "t" given an assumed value for "c".
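One possible fitting sequence along those lines, again only a sketch
building on the hypothetical negllik() and simulated objects above
(optim() minimizes by default, which is why negllik returns the
negative log-likelihood):

## starting values from lm, ignoring the truncation
fit0  <- lm(y ~ X - 1)
start <- c(coef(fit0), log(summary(fit0)$sigma), log(summary(fit0)$sigma))

c.assumed <- c(0, 0.5)           # "c" held fixed at an assumed value

## unrestricted fit: b, s, t free
fit1 <- optim(start, negllik, y = y, X = X, Z = Z, cc = c.assumed,
              method = "BFGS")

## restricted fit: t = s (one common log-sd)
negllik.eq <- function(par2, ...) {
  k <- length(par2) - 1
  negllik(c(par2[1:k], par2[k + 1], par2[k + 1]), ...)
}
fit2 <- optim(start[-length(start)], negllik.eq,
              y = y, X = X, Z = Z, cc = c.assumed, method = "BFGS")

## likelihood ratio test of t = s (1 degree of freedom)
LR <- 2 * (fit2$value - fit1$value)
pchisq(LR, df = 1, lower.tail = FALSE)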
Comments?
Thanks so much.
Spencer Graves