[R] Tetrachoric correlation in R vs. stata
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Jun 24 00:30:03 CEST 2006
Janet Rosenbaum <jrosenba at rand.org> writes:
> Peter --- Thanks for pointing out the omitted information. The
> hazards of attempting to be brief.
>
> In R, I am using polychor(vec1, vec2, std.err=T) and have used both
> the ML and two-step estimates, which give virtually identical answers.
> I am explicitly using only the 632 complete cases in R to make sure
> missing data is handled the same way as in Stata.
>
> Here's my data:
>
> 522 54
> 34 22
>
> > polychor(v1, v2, std.err=T, ML=T)
>
> Polychoric Correlation, ML est. = 0.5172 (0.08048)
> Test of bivariate normality: Chisquare = 8.063e-06, df = 0, p = NaN
>
> Row Thresholds
> Threshold Std.Err.
> 1 1.349 0.07042
>
>
> Column Thresholds
> Threshold Std.Err.
> 1 1.174 0.06458
> Warning message:
> NaNs produced in: pchisq(q, df, lower.tail, log.p)
>
> In Stata, I get:
>
> . tetrachoric t1_v19a ct1_ix17
>
> Tetrachoric correlations (N=632)
>
> ----------------------------------
> Variable | t1_v19a ct1_ix17
> -------------+--------------------
> t1_v19a | 1
> ct1_ix17 | .6169 1
> ----------------------------------
Well,
> pmvnorm(c(1.349,1.174),c(Inf,Inf),
+ sigma=matrix(c(1,.5172,.5172,1),2))*632
[1] 22.00511
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"
> pnorm(1.349)*632
[1] 575.9615
> pnorm(1.174)*632
[1] 556.0352
so the estimates from R appear to be consistent with the table: they
reproduce the lower-right count of 22 and the margins 576 (= 522 + 54)
and 556 (= 522 + 34). In contrast, plugging in the .6169 from Stata gives
> pmvnorm(c(1.349,1.174),c(Inf,Inf),
+ sigma=matrix(c(1,.6169,.6169,1),2))*632
[1] 26.34487
...
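The same check can be run in the other direction, estimating the correlation from the four counts. Below is a minimal base-R sketch of the two-step approach, assuming (as the pmvnorm() check suggests) that rows are v1 and the lower-right cell of 22 is the "both above threshold" cell: the thresholds come from the margins via qnorm(), and rho is chosen with uniroot() so the bivariate normal reproduces that cell. The helper upper_orthant() is hypothetical; it evaluates P(X > h, Y > k) by one-dimensional integration instead of calling mvtnorm's pmvnorm(), and it is not polycor's internal code.

```r
# Observed 2x2 table from the thread; matrix() fills column by column,
# so tab[1, ] is 522 54 and tab[2, ] is 34 22
tab <- matrix(c(522, 34, 54, 22), nrow = 2)
n <- sum(tab)  # 632 complete cases

# Step 1: thresholds from the margins (standard normal quantiles)
h <- qnorm(sum(tab[1, ]) / n)  # row threshold, about 1.349
k <- qnorm(sum(tab[, 1]) / n)  # column threshold, about 1.174

# Hypothetical helper: P(X > h, Y > k) for a standard bivariate normal with
# correlation rho, computed as integral_h^Inf dnorm(x) * P(Y > k | X = x) dx,
# using Y | X = x ~ N(rho * x, 1 - rho^2)
upper_orthant <- function(h, k, rho) {
  s <- sqrt(1 - rho^2)
  integrate(function(x) dnorm(x) * pnorm((rho * x - k) / s),
            lower = h, upper = Inf, rel.tol = 1e-8)$value
}

# Step 2: pick rho so the model reproduces the lower-right cell (22)
rho_hat <- uniroot(function(r) upper_orthant(h, k, r) - tab[2, 2] / n,
                   interval = c(-0.99, 0.99), tol = 1e-8)$root

# Expected lower-right count under each correlation estimate
expected_22 <- sapply(c(R = rho_hat, Stata = 0.6169),
                      function(r) n * upper_orthant(h, k, r))

print(round(c(h = h, k = k, rho = rho_hat), 4))
print(round(expected_22, 2))
```

A 2x2 table has three free cell proportions and this model has three parameters (two thresholds and rho), so the fit is exact. That is also why polychor reports df = 0 and p = NaN for the bivariate-normality test. The sketch should therefore land very close to polychor's 0.5172, while 0.6169 gives an expected count of roughly 26 instead of the observed 22.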
You might want to follow up on
http://www.ats.ucla.edu/stat/stata/faq/tetrac.htm
> Thanks for your help.
>
> Janet
>
> Peter Dalgaard wrote:
> > Janet Rosenbaum <jrosenba at rand.org> writes:
> >
> >> I hope someone here knows the answer to this since it will save me
> >> from delving deep into documentation.
> >>
> >> Based on 22 pairs of vectors, I have noticed that tetrachoric
> >> correlation coefficients in Stata are almost uniformly higher than
> >> those in R, sometimes dramatically so (TCC = .61 in Stata, .51 in R;
> >> .51 in Stata, .39 in R). Stata's estimate is higher than R's in 20
> >> out of 22 computations, although the estimates always fall within
> >> the 95% CI for the TCC calculated by R.
> >>
> >> Do Stata and R calculate TCC in dramatically different ways? Is
> >> the handling of missing data perhaps different? Any thoughts?
> >>
> >> Btw, I am sending this question only to the R-help list.
> > A bit more information seems necessary:
> > - tetrachoric correlations depend only on the 4 counts of the 2x2
> > table, so you should be able to give a direct example
> > - you're not telling us how you calculate the TCC in R. This is not
> > obvious (package polycor?).
> >