[R] Tetrachoric correlation in R vs. stata
John Fox
jfox at mcmaster.ca
Sun Jun 25 08:26:03 CEST 2006
Dear Janet,
A good thing to do when different software gives different answers is
to check each against known results. I'm away from home, and don't have
all of the examples that I used to check polychor(), but I dug up the
following. The polychor() function produces output that agrees with
both of these sources. How does Stata do?
> # example from Drasgow (1988), pp. 69-74 in Kotz and Johnson,
> # Encyclopedia of statistical sciences. Vol. 7.
> tab
     [,1] [,2] [,3]
[1,]   58   52    1
[2,]   26   58    3
[3,]    8   12    9
> polychor(tab, std.err=TRUE)
Polychoric Correlation, 2-step est. = 0.42 (0.07474)
Test of bivariate normality: Chisquare = 11.55, df = 3, p = 0.009078
> polychor(tab, ML=TRUE, std.err=TRUE)
Polychoric Correlation, ML est. = 0.4191 (0.07616)
Test of bivariate normality: Chisquare = 11.54, df = 3, p = 0.009157
  Row Thresholds
  Threshold Std.Err.
1  -0.02988  0.08299
2   1.13300  0.10630

  Column Thresholds
  Threshold Std.Err.
1   -0.2422  0.08361
2    1.5940  0.13720
> tab # example from Brown (1977) Applied Statistics, 26:343-351.
     [,1] [,2]
[1,] 1562   42
[2,]  383   94
> polychor(tab)
[1] 0.595824
>
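
For a 2 x 2 table the model is saturated, so the tetrachoric correlation
can also be checked from first principles, without polychor(). What
follows is only a minimal base-R sketch of such a cross-check, not the
code that polychor() uses. The helper name tetra() is made up for
illustration. It assumes a table with no empty cells, takes the
thresholds from the marginal proportions, and solves for the correlation
at which the bivariate-normal probability of the first cell equals its
observed proportion.

```r
# Hypothetical helper (not part of polycor): tetrachoric correlation
# for a 2 x 2 table of counts with no empty cells, using base R only.
tetra <- function(tab) {
  n <- sum(tab)
  a <- qnorm(sum(tab[1, ]) / n)   # row threshold from the row margin
  b <- qnorm(sum(tab[, 1]) / n)   # column threshold from the column margin
  # P(X < a, Y < b) for a standard bivariate normal with correlation rho,
  # written as a one-dimensional integral over x
  p11 <- function(rho) {
    integrate(function(x) dnorm(x) * pnorm((b - rho * x) / sqrt(1 - rho^2)),
              lower = -Inf, upper = a)$value
  }
  # Solve for the rho that reproduces the observed proportion in cell [1,1]
  uniroot(function(rho) p11(rho) - tab[1, 1] / n,
          interval = c(-0.99, 0.99), tol = 1e-8)$root
}

brown <- matrix(c(1562, 42, 383, 94), nrow = 2, byrow = TRUE)  # Brown (1977)
janet <- matrix(c(522, 54, 34, 22), nrow = 2, byrow = TRUE)    # table quoted below
tetra(brown)
tetra(janet)
```

The two results can be compared with the polychor() values in this
thread: 0.595824 for the Brown table and 0.5172 for the 2 x 2 table in
the quoted message below.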
Regards,
John
On Fri, 23 Jun 2006 14:33:31 -0700
Janet Rosenbaum <jrosenba at rand.org> wrote:
> Peter --- Thanks for pointing out the omitted information. The hazards
> of attempting to be brief.
>
> In R, I am using polychor(vec1, vec2, std.err=T) and have used both the
> ML and 2 step estimates, which give virtually identical answers. I am
> explicitly using only the 632 complete cases in R to make sure missing
> data is handled the same way as in stata.
>
> Here's my data:
>
> 522 54
> 34 22
>
> > polychor(v1, v2, std.err=T, ML=T)
>
> Polychoric Correlation, ML est. = 0.5172 (0.08048)
> Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
>
> Row Thresholds
> Threshold Std.Err.
> 1 1.349 0.07042
>
>
> Column Thresholds
> Threshold Std.Err.
> 1 1.174 0.06458
> Warning message:
> NaNs produced in: pchisq(q, df, lower.tail, log.p)
>
> In stata, I get:
>
> . tetrachoric t1_v19a ct1_ix17
>
> Tetrachoric correlations (N=632)
>
> ----------------------------------
>     Variable |  t1_v19a  ct1_ix17
> -------------+--------------------
>      t1_v19a |        1
>     ct1_ix17 |    .6169         1
> ----------------------------------
>
> Thanks for your help.
>
> Janet
>
>
>
> Peter Dalgaard wrote:
> > Janet Rosenbaum <jrosenba at rand.org> writes:
> >
> >> I hope someone here knows the answer to this since it will save me
> >> from delving deep into documentation.
> >>
> >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> >> correlation coefficients in stata are almost uniformly higher than
> >> those in R, sometimes dramatically so (TCC=.61 in stata, .51 in R;
> >> .51 in stata, .39 in R). Stata's estimate is higher than R's in 20
> >> out of 22 computations, although the estimates always fall within
> >> the 95% CI for the TCC calculated by R.
> >>
> >> Do stata and R calculate TCC in dramatically different ways? Is the
> >> handling of missing data perhaps different? Any thoughts?
> >>
> >> Btw, I am sending this question only to the R-help list.
> >
> >
> > A bit more information seems necessary:
> >
> > - tetrachoric correlations depend on 4 numbers, so you should be
> >   able to give a direct example
> >
> > - you're not telling us how you calculate the TCC in R. This is not
> >   obvious (package polycor?).
> >