[R] Unable to reproduce Stata Heckman sample selection estimates
Yuan Yuan
y.yuan at vt.edu
Fri Nov 25 22:37:53 CET 2011
Hi Arne,
I believe I figured out why the Stata coefficient estimates differed
from R's: in my case, the outcome response variable is binary, so
the outcome equation is a probit model. From my reading of the
sampleSelection paper, it seems that the Tobit-2 model has a
continuous outcome response variable. The Stata command used was
heckprob, which assumes both the outcome and the selection equations
are probit models. When I compared the Stata heckman command with
the R results, I found the estimates were the same.
Sorry for not picking up on that difference earlier.
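To make the difference concrete, here is a minimal base-R sketch of the per-observation log-likelihood in the two models, as I understand them. The function names (pbivnorm_sketch, loglik_probit_sel, loglik_tobit2) and the parametrisation are my own and purely illustrative; this is not the code used by sampleSelection or Stata, and the bivariate normal CDF is computed by plain numerical integration rather than a dedicated routine.

```r
# Bivariate standard normal CDF P(X <= a, Y <= b) with correlation rho,
# obtained by integrating the conditional CDF of Y given X = t.
pbivnorm_sketch <- function(a, b, rho) {
  integrand <- function(t) dnorm(t) * pnorm((b - rho * t) / sqrt(1 - rho^2))
  integrate(integrand, lower = -Inf, upper = a, rel.tol = 1e-10)$value
}

# Probit outcome with sample selection (binary outcome, as in heckprob).
#   s  : 1 if the outcome is observed, 0 otherwise
#   y  : binary outcome (not used when s == 0)
#   zg : linear predictor of the selection equation
#   xb : linear predictor of the outcome equation
#   rho: correlation between the two error terms
loglik_probit_sel <- function(s, y, zg, xb, rho) {
  if (s == 0) return(pnorm(-zg, log.p = TRUE))
  if (y == 1) return(log(pbivnorm_sketch(xb, zg, rho)))
  log(pbivnorm_sketch(-xb, zg, -rho))
}

# Tobit-2 (continuous outcome, as in heckman); sigma is the standard
# deviation of the outcome error.
loglik_tobit2 <- function(s, y, zg, xb, rho, sigma) {
  if (s == 0) return(pnorm(-zg, log.p = TRUE))
  e <- (y - xb) / sigma
  dnorm(e, log = TRUE) - log(sigma) +
    pnorm((zg + rho * e) / sqrt(1 - rho^2), log.p = TRUE)
}

# Sanity check: in the binary-outcome model the three possible cases
# (not selected; selected with y = 1; selected with y = 0) are
# exhaustive, so their probabilities should add up to (nearly) 1.
zg <- 0.3; xb <- -0.8; rho <- 0.5
p <- exp(c(loglik_probit_sel(0, NA, zg, xb, rho),
           loglik_probit_sel(1, 1, zg, xb, rho),
           loglik_probit_sel(1, 0, zg, xb, rho)))
sum(p)
```

The two functions share the term for unselected observations but differ for selected ones: a density in the Tobit-2 case, a bivariate probability in the binary case. Fitting a 0/1 outcome as if it were continuous therefore maximises a different likelihood, which would explain the differing outcome coefficients.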
So it seems that selection() is perhaps not what I'm looking for,
unless there is a way to specify a probit outcome equation. Is there
a package out there that estimates probit models with Heckman sample
selection? It looks like SemiParBIVProbit might work for me.
- Clara
On Friday, November 25, 2011 11:05:31 am Yuan Yuan wrote:
> Hi Arne,
>
> Thanks for the reply.
>
> I am using R version 2.14.0 and sampleSelection version 0.6.12.
>
> I estimate the model by the 1-step ML method. However, when I use
> the 2-step method, the standard errors are reported as NA.
>
> I use the selection() function, very basic call, something to the
> effect of: selection(selectionFormula, outcomeFormula, data =
> aDataFrame), where the formulas are very straightforward and basic
> as well, y ~ x1 + x2 + ... + xp.
>
> I have read the associated paper, which is where I got the idea to
> pass the coefficients from a selection object to the start
> argument.
>
> I will work on creating a minimal reproducible example; the dataset
> is large and confidential, the models long-ish.
>
> - Clara
>
> On Friday, November 25, 2011 04:04:52 am Arne Henningsen wrote:
> > On 25 November 2011 04:37, Yuan Yuan <y.yuan at vt.edu> wrote:
> > > Hello,
> > >
> > > I am working on reproducing someone's analysis which was done in
> > > Stata. The analysis is estimation of a standard Heckman sample
> > > selection model (Tobit-2), for which I am using the sampleSelection
> > > package and the selection() function. I have a few problems with the
> > > estimation:
> > >
> > > 1) The reported standard error for all estimates is Inf ...
> > > vcov(selectionObject) yields Inf in every cell.
> > >
> > > 2) While the selection equation coefficient estimates are almost
> > > exactly the same as the Stata results, the outcome equation
> > > coefficient estimates are quite different (different sign in one case,
> > > order of magnitude difference in some other cases).
> > >
> > > 3) I can't seem to figure out how to specify the initial values for
> > > the MLE ... whatever argument I pass to start (even of the form
> > > coef(selectionObject)), I get the following error:
> > > Error in gr[, fixed] <- NA : (subscript) logical subscript too long
> > >
> > > I have to admit I am pretty confused by #1, I feel like I must be
> > > doing something wrong, missing something obvious, but I have no idea
> > > what. I figure #2 might be because the algorithms (selection and
> > > Stata) are just finding different local maxima, but because of #3 I
> > > can't test that guess by using different initial values in selection.
> > >
> > > Let me know if I should provide any more information. Thanks in
> > > advance for any pointers in the right direction.
> >
> > Yes, please provide more information (see also the posting guide [1]),
> > e.g. which version of R and which version of the sampleSelection
> > package are you using? Do you estimate the model by the two-step
> > approach or by the 1-step maximum likelihood method? Which commands
> > did you use? Can you send us a reproducible example? Have you read the
> > paper about using the sampleSelection package [2]?
> >
> > [1] http://www.r-project.org/posting-guide.html
> > [2] http://www.jstatsoft.org/v27/i07
> >
> > Best wishes from Copenhagen,
> > Arne