[R] Simulate dichotomous correlation matrix
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Jun 28 14:21:09 CEST 2006
"Bliese, Paul D LTC USAMH" <paul.bliese at us.army.mil> writes:
> Newsgroup members,
>
> Does anyone have a clever way to simulate data whose correlation matrix
> matches a given one, where each column contains a dichotomous variable
> (0,1) and each column has a different prevalence rate?
>
> For instance, I would like to simulate the following correlation matrix:
>
> > CORMAT[1:4,1:4]
>             PUREPT    PTCUT2  PHQCUT2T  ALCCUTT2
> PUREPT   1.0000000 0.5141552 0.1913139 0.1917923
> PTCUT2   0.5141552 1.0000000 0.2913552 0.2204097
> PHQCUT2T 0.1913139 0.2913552 1.0000000 0.1803987
> ALCCUTT2 0.1917923 0.2204097 0.1803987 1.0000000
>
> Where the prevalence for each variable is:
>
> > prevvals=c(0.26,0.10,0.09,0.10)
>
> I can use the mvrnorm function in MASS to create a matrix containing
> random normal variables and dichotomize these variables into 0,1;
> however, this is a less than ideal solution as my observed correlation
> matrix is downwardly biased and the amount of the bias is related to the
> prevalence of each variable.
This is related to the concept of polychoric correlations (tetrachoric,
in the dichotomous case): these are the correlations that could be
passed to mvrnorm so that dichotomizing the result at suitable
thresholds gives data with the observed distribution. The question is
whether there is a nice way to go from raw correlations and prevalences
to polychoric correlations and thresholds. The threshold bit is easy,
just take qnorm() of the prevalences (or of one minus the prevalence,
depending on which side of the threshold is coded 1), but the other bit
might not be. You could try looking into the polycor package to see
which pieces of information are used there.
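
One way the first route might be sketched, using only base R and MASS:
the joint probability P(X=1, Y=1) for each pair follows from the
correlation and the two prevalences, and the latent correlation is then
found numerically so that the bivariate normal upper-orthant probability
matches it. The helper names (joint11, upper_orthant, latent_cor), the
search interval and the sample size are illustrative assumptions, not
anything taken from polycor.

```r
library(MASS)  # for mvrnorm()

# Values copied from the question.
prevvals <- c(0.26, 0.10, 0.09, 0.10)
CORMAT <- matrix(c(1.0000000, 0.5141552, 0.1913139, 0.1917923,
                   0.5141552, 1.0000000, 0.2913552, 0.2204097,
                   0.1913139, 0.2913552, 1.0000000, 0.1803987,
                   0.1917923, 0.2204097, 0.1803987, 1.0000000), 4, 4)

# P(X = 1, Y = 1) implied by the Pearson correlation r and the prevalences.
joint11 <- function(r, p1, p2) {
  p1 * p2 + r * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
}

# P(Z1 > t1, Z2 > t2) for a standard bivariate normal with correlation rho,
# by one-dimensional numerical integration over Z1.
upper_orthant <- function(rho, t1, t2) {
  integrand <- function(z) {
    dnorm(z) * pnorm((t2 - rho * z) / sqrt(1 - rho^2), lower.tail = FALSE)
  }
  integrate(integrand, lower = t1, upper = Inf, rel.tol = 1e-8)$value
}

# Latent correlation that reproduces the observed correlation r after
# thresholding. The search interval (-0.99, 0.99) is an arbitrary choice.
latent_cor <- function(r, p1, p2) {
  t1 <- qnorm(1 - p1)
  t2 <- qnorm(1 - p2)
  target <- joint11(r, p1, p2)
  uniroot(function(rho) upper_orthant(rho, t1, t2) - target,
          interval = c(-0.99, 0.99), tol = 1e-10)$root
}

# Coding convention assumed here: X = 1 when the latent variable exceeds
# its threshold, so the threshold is qnorm(1 - prevalence).
thresholds <- qnorm(1 - prevvals)

LATENT <- diag(4)
for (i in 1:3) {
  for (j in (i + 1):4) {
    LATENT[i, j] <- LATENT[j, i] <-
      latent_cor(CORMAT[i, j], prevvals[i], prevvals[j])
  }
}

# The pairwise solutions need not form a positive definite matrix, so
# check before simulating.
if (min(eigen(LATENT, symmetric = TRUE, only.values = TRUE)$values) > 0) {
  set.seed(1)
  Z <- mvrnorm(10000, mu = rep(0, 4), Sigma = LATENT)
  X <- 1 * sweep(Z, 2, thresholds, ">")
  print(round(colMeans(X), 3))  # compare with prevvals
  print(round(cor(X), 3))       # compare with CORMAT
}
```

Solving pair by pair keeps each step one-dimensional, at the cost that
the assembled matrix is not guaranteed to be a valid correlation matrix,
hence the eigenvalue check.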
Alternatively, you could notice that what you really have is the set
of all 2x2 marginals of a 2x2x2x2 table (for each pair you can
reconstruct sum(X), sum(Y) and sum(XY) from the information given), so
you could fit a (log-linear) model for all 16 cell probabilities using
the iterative proportional scaling (IPS) algorithm.
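
A minimal sketch of the table-based route, assuming a hand-rolled scaling
loop rather than any particular package function: start from a uniform
2x2x2x2 table and repeatedly rescale it so that each pairwise margin
matches the 2x2 table implied by the correlations and prevalences. The
names pair_margin and ipf_pairwise, and the defaults for max_iter and
tol, are made up for illustration.

```r
# Values copied from the question.
prevvals <- c(0.26, 0.10, 0.09, 0.10)
CORMAT <- matrix(c(1.0000000, 0.5141552, 0.1913139, 0.1917923,
                   0.5141552, 1.0000000, 0.2913552, 0.2204097,
                   0.1913139, 0.2913552, 1.0000000, 0.1803987,
                   0.1917923, 0.2204097, 0.1803987, 1.0000000), 4, 4)

# Target 2x2 table for variables i and j; rows index x_i = 0, 1 and
# columns index x_j = 0, 1.
pair_margin <- function(i, j, prev, R) {
  p11 <- prev[i] * prev[j] +
    R[i, j] * sqrt(prev[i] * (1 - prev[i]) * prev[j] * (1 - prev[j]))
  matrix(c(1 - prev[i] - prev[j] + p11, prev[i] - p11,
           prev[j] - p11, p11), nrow = 2)
}

# Iterative proportional scaling over all pairwise margins, starting from
# a uniform table. max_iter and tol are arbitrary illustrative defaults.
ipf_pairwise <- function(prev, R, max_iter = 1000, tol = 1e-10) {
  k <- length(prev)
  P <- array(1 / 2^k, dim = rep(2, k))
  pairs <- combn(k, 2)
  for (iter in seq_len(max_iter)) {
    worst <- 0
    for (m in seq_len(ncol(pairs))) {
      ij <- pairs[, m]
      target <- pair_margin(ij[1], ij[2], prev, R)
      current <- apply(P, ij, sum)
      worst <- max(worst, abs(current - target))
      P <- sweep(P, ij, target / current, "*")
    }
    if (worst < tol) break  # all margins already matched in this sweep
  }
  P
}

P <- ipf_pairwise(prevvals, CORMAT)

# Draw n observations by sampling cells of the fitted table.
set.seed(1)
n <- 10000
cells <- sample(length(P), n, replace = TRUE, prob = as.vector(P))
X <- arrayInd(cells, dim(P)) - 1   # n x 4 matrix of 0/1 values
print(round(colMeans(X), 3))       # compare with prevvals
print(round(cor(X), 3))            # compare with CORMAT
```

Starting from a uniform table means the fit contains two-way
interactions only; other starting tables would give other joint
distributions with the same pairwise margins.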
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907