[R] r code for multilevel latent class analysis

Cristina Cametti cristina.cametti at gmail.com
Fri Jul 8 22:57:11 CEST 2016


Dear all,

Thank you very much for your suggestions! I added 1 to my variables because, if I don't, I get this message:

ALERT: some manifest variables contain values that are not
    positive integers. For poLCA to run, please recode categorical
    outcome variables to increment from 1 to the maximum number of
    outcome categories for each variable. 
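Here is a minimal base-R sketch of the check behind this alert, so the recoding can be verified before calling poLCA. The helper `is_polca_ready` is hypothetical and is not part of poLCA. The toy vector assumes the ESS items are scored 0-10, which matches the 11 response categories in the output further down.

```r
# Hypothetical helper: TRUE when every non-missing value is a whole number >= 1,
# which is the condition the poLCA alert asks for
is_polca_ready <- function(x) {
  x <- x[!is.na(x)]
  all(x >= 1) && all(x == round(x))
}

# Toy stand-in for an ESS item scored 0-10 (values made up for illustration)
ppltrst <- c(0, 3, 5, 10, NA, 7)

is_polca_ready(ppltrst)        # FALSE: 0 is not a positive integer
is_polca_ready(ppltrst + 1)    # TRUE: the scale now runs 1-11

# Recoding once in the data frame avoids arithmetic inside the formula
mydata <- data.frame(ppltrst = ppltrst)
mydata$ppltrst1 <- mydata$ppltrst + 1   # hypothetical column name
```

If `pplfair1` and `pplhlp1` are created the same way, the formula can use plain variable names, e.g. `cbind(ppltrst1, pplfair1, pplhlp1) ~ cntry`. This is the same form that worked in David's `cbind(A, B) ~ C` example.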

I read in various explanations on the web that poLCA needs the outcome variables to be coded as positive integers starting at 1. In the end, I used this code:


lca = poLCA(cbind(ppltrst=ppltrst+1,pplfair=pplfair+1,pplhlp=pplhlp+1) ~ cntry, 
             maxiter=50000, nclass=3, 
             nrep=10, data=mydata)
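The named arguments inside `cbind()` are probably what make this version work. The original error came from poLCA looking up the response columns by name in `data`, via the `match(colnames(y), colnames(data))` call shown in the error message. The base-R sketch below uses a made-up toy data frame. It shows that `cbind()` produces no column names for expressions such as `A + 1`, but keeps explicit `name = value` labels. This reading is based on the error message, not on a walk through poLCA's source.

```r
# Toy data frame (made-up values): two outcomes A, B and one covariate C
d <- data.frame(A = c(0, 1, 2, 1), B = c(2, 0, 1, 1), C = c(1, 1, 2, 2))

# Unnamed expressions: cbind() cannot derive column names from A + 1
y_unnamed <- model.response(model.frame(cbind(A + 1, B + 1) ~ C, data = d))
colnames(y_unnamed)    # NULL

# Named arguments: the columns keep labels that exist in d
y_named <- model.response(model.frame(cbind(A = A + 1, B = B + 1) ~ C, data = d))
colnames(y_named)      # "A" "B"

# The lookup from the error message: NA in the unnamed case, and
# d[, NA] then fails with "undefined columns selected"
match(colnames(y_unnamed), colnames(d))[1]   # NA
match(colnames(y_named), colnames(d))        # 1 2
```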

It works. This is the output I obtained:
Model 1: llik = -244070.8 ... best llik = -244070.8
Model 2: llik = -241832.9 ... best llik = -241832.9
Model 3: llik = -245111.9 ... best llik = -241832.9
Model 4: llik = -242490.5 ... best llik = -241832.9
Model 5: llik = -240447.7 ... best llik = -240447.7
Model 6: llik = -250882.1 ... best llik = -240447.7
Model 7: llik = -240447.7 ... best llik = -240447.7
Model 8: llik = -242547.3 ... best llik = -240447.7
Model 9: llik = -240447.7 ... best llik = -240447.7
Model 10: llik = -247340.3 ... best llik = -240447.7
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$ppltrst
           Pr(1)  Pr(2)  Pr(3)  Pr(4)  Pr(5)  Pr(6)  Pr(7)  Pr(8)  Pr(9) Pr(10) Pr(11)
class 1:  0.0071 0.0047 0.0057 0.0112 0.0163 0.1146 0.0892 0.2794 0.3215 0.0897 0.0605
class 2:  0.2375 0.1703 0.2117 0.1681 0.0564 0.1000 0.0098 0.0097 0.0150 0.0078 0.0137
class 3:  0.0189 0.0104 0.0494 0.1442 0.1687 0.3268 0.1499 0.1094 0.0216 0.0008 0.0000

$pplfair
           Pr(1)  Pr(2)  Pr(3)  Pr(4)  Pr(5)  Pr(6)  Pr(7)  Pr(8)  Pr(9) Pr(10) Pr(11)
class 1:  0.0025 0.0021 0.0054 0.0062 0.0062 0.0512 0.0670 0.2721 0.3697 0.1394 0.0782
class 2:  0.1712 0.1285 0.2048 0.1654 0.0667 0.1586 0.0151 0.0133 0.0261 0.0143 0.0361
class 3:  0.0003 0.0011 0.0186 0.0952 0.1428 0.3413 0.1756 0.1584 0.0577 0.0068 0.0023

$pplhlp
           Pr(1)  Pr(2)  Pr(3)  Pr(4)  Pr(5)  Pr(6)  Pr(7)  Pr(8)  Pr(9) Pr(10) Pr(11)
class 1:  0.0046 0.0051 0.0139 0.0369 0.0495 0.1804 0.1434 0.2374 0.2127 0.0720 0.0442
class 2:  0.2218 0.1893 0.2334 0.1412 0.0471 0.1044 0.0081 0.0098 0.0139 0.0107 0.0205
class 3:  0.0074 0.0159 0.0755 0.1779 0.1731 0.2870 0.1226 0.0984 0.0380 0.0024 0.0018

Estimated class population shares 
 0.3351 0.2014 0.4635 
 
Predicted class memberships (by modal posterior prob.) 
 0.3323 0.187 0.4807 
 
========================================================= 
Fit for 3 latent classes: 
========================================================= 
2 / 1 
            Coefficient  Std. error  t value  Pr(>|t|)
(Intercept)    -0.67015     0.07488   -8.950     0.000
cntryBE         0.39006     0.11038    3.534     0.000
cntryCH        -1.39925     0.14273   -9.804     0.000
cntryCZ         1.26804     0.12295   10.314     0.000
cntryDE         0.14062     0.10330    1.361     0.174
cntryDK        -2.90300     0.22895  -12.680     0.000
cntryES         0.67484     0.11334    5.954     0.000
cntryFI        -2.47880     0.18096  -13.698     0.000
cntryFR         0.73428     0.12339    5.951     0.000
cntryGB        -0.28234     0.11663   -2.421     0.016
cntryGR         2.69529     0.11605   23.225     0.000
cntryHU         1.71068     0.12213   14.006     0.000
cntryIE        -0.65851     0.10951   -6.013     0.000
cntryIT         1.49171     0.13915   10.720     0.000
cntryLU         0.35923     0.11635    3.088     0.002
cntryNL        -1.17781     0.12193   -9.659     0.000
cntryNO        -2.83265     0.19959  -14.192     0.000
cntryPL         2.76831     0.14608   18.950     0.000
cntryPT         1.65176     0.13278   12.440     0.000
cntrySE        -1.93606     0.15250  -12.696     0.000
cntrySI         1.52657     0.11667   13.084     0.000
========================================================= 
3 / 1 
            Coefficient  Std. error  t value  Pr(>|t|)
(Intercept)     0.40117     0.05997    6.689     0.000
cntryBE         0.37591     0.09215    4.079     0.000
cntryCH        -0.27800     0.08100   -3.432     0.001
cntryCZ         0.75694     0.11287    6.707     0.000
cntryDE         0.47418     0.08072    5.874     0.000
cntryDK        -1.92545     0.10423  -18.473     0.000
cntryES         0.55324     0.09713    5.696     0.000
cntryFI        -1.27608     0.08553  -14.920     0.000
cntryFR         0.74221     0.10457    7.098     0.000
cntryGB         0.27381     0.08555    3.201     0.001
cntryGR         1.06585     0.11440    9.317     0.000
cntryHU         1.05480     0.11392    9.259     0.000
cntryIE        -0.61390     0.08243   -7.448     0.000
cntryIT         1.12519     0.12983    8.667     0.000
cntryLU         0.23379     0.09523    2.455     0.014
cntryNL        -0.35230     0.07981   -4.414     0.000
cntryNO        -1.54373     0.08836  -17.471     0.000
cntryPL         1.68826     0.14280   11.823     0.000
cntryPT         1.26210     0.12225   10.324     0.000
cntrySE        -1.09443     0.08383  -13.055     0.000
cntrySI         0.68829     0.11137    6.180     0.000
========================================================= 
number of observations: 39254 
number of estimated parameters: 132 
residual degrees of freedom: 1198 
maximum log-likelihood: -240447.7 
 
AIC(3): 481159.3
BIC(3): 482291.6
X^2(3): 40937.31 (Chi-square goodness of fit) 
 
ALERT: estimation algorithm automatically restarted with new initial values 
 
Warning message:
In sqrt(diag(VCE.beta)) : NaNs produced
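Several summary numbers in this output can be reproduced by hand. This is a useful way to confirm that the fitted model is the intended one. The sketch below uses only values printed above: the ten log-likelihoods, N = 39254, 11 categories per item and 21 countries. It applies the standard formulas AIC = -2 logL + 2p and BIC = -2 logL + p log N. The residual-df rule is an assumption about how poLCA computes it: distinct response patterns, capped at N, minus parameters minus 1. With these inputs it gives the printed 1198.

```r
# Log-likelihoods of the ten random starts, copied from the printed output
llik <- c(-244070.8, -241832.9, -245111.9, -242490.5, -240447.7,
          -250882.1, -240447.7, -242547.3, -240447.7, -247340.3)
best <- max(llik)
# How many starts reached the best value (within the printed rounding)?
n_best <- sum(abs(llik - best) < 0.05)

# Parameters: 3 classes x 3 items x (11 - 1) free item probabilities, plus
# (3 - 1) logits x 21 coefficients (intercept + 20 country dummies)
R <- 3; J <- 3; K <- 11; n_cov <- 21
npar <- R * J * (K - 1) + (R - 1) * n_cov

N <- 39254
aic <- -2 * best + 2 * npar
bic <- -2 * best + npar * log(N)
# Assumed rule: possible response patterns (capped at N) - npar - 1
resid_df <- min(N, K^J) - npar - 1

c(n_best = n_best, npar = npar, resid_df = resid_df, aic = aic, bic = bic)
```

Three of the ten starts (models 5, 7 and 9) reach the same best log-likelihood. This is usually taken as some reassurance that the solution is not just a local maximum, and a larger `nrep` would strengthen that. The fitted object should also store these quantities (I believe as `lca$attempts`, `lca$npar` and `lca$resid.df`; please check `?poLCA`).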


Except for the final warning, I think this output makes sense.
However, I would be glad to hear any further suggestions.
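About the final warning: `sqrt()` returns NaN and warns "NaNs produced" when given a negative number. The message therefore means that at least one diagonal element of the matrix `VCE.beta` inside poLCA came out negative. In other words, the numerically estimated covariance matrix was not positive definite at that point. The base-R sketch below uses a made-up matrix to show how such entries can be located. For a real fit, the analogous check is presumably on the standard errors stored in the returned object. The component names used below (`lca$coeff.se`, and `lca$probs.se` for the item probabilities) are assumptions; please confirm them in `?poLCA`.

```r
# Made-up 3 x 3 covariance matrix whose last diagonal entry is negative,
# as can happen when a numerically estimated covariance matrix is not
# positive definite
V <- matrix(c(0.010, 0.002, 0.001,
              0.002, 0.020, 0.003,
              0.001, 0.003, -0.004), nrow = 3, byrow = TRUE)

# sqrt() of a negative number gives NaN (with the warning "NaNs produced")
se <- suppressWarnings(sqrt(diag(V)))

which(is.nan(se))    # 3: the entry with a negative variance

# A negative diagonal entry implies a negative eigenvalue, so V is not
# positive definite
min(eigen(V, symmetric = TRUE, only.values = TRUE)$values) < 0   # TRUE

# With a real poLCA fit (assumed component names, see ?poLCA):
# which(is.nan(lca$coeff.se))
# sapply(lca$probs.se, function(m) sum(is.nan(m)))
```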
Thank you very much again!

Cristina 


On 08 Jul 2016, at 01:49, David Winsemius <dwinsemius at comcast.net> wrote:

>> 
>> On Jul 7, 2016, at 3:36 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
>> 
>> Hi Cristina,
>> Try this:
>> 
>> names(mydata)
>> 
>> It may be NULL or "ppltrst" may be absent.
> 
> I've already suggested to Cristina that she make sure the variables are spelled correctly, and she reports they are all present in her dataset. So I tried a formula like hers, with 1 added to each variable, and it throws the same error with the 'values' data frame used in the examples for that package.
> 
>> data(values,package='poLCA')
>> str(values)
> 'data.frame':	216 obs. of  4 variables:
> $ A: num  2 2 2 2 2 2 2 2 2 2 ...
> $ B: num  2 2 2 2 2 2 2 2 2 2 ...
> $ C: num  2 2 2 2 2 2 2 2 2 2 ...
> $ D: num  2 2 2 2 2 2 2 2 2 2 ...
>> library(poLCA)
> Loading required package: scatterplot3d
> Loading required package: MASS
>> poLCA( cbind(A+1,B+1) ~ C, data=values)
> Error in `[.data.frame`(data, , match(colnames(y), colnames(data))[j]) : 
>  undefined columns selected
> 
> So I then tried removing those "+1"s (which didn't seem to have much justification):
> 
>> poLCA( cbind(A,B) ~ C, data=values)
> Conditional item response (column) probabilities,
> by outcome variable, for each class (row) 
> 
> $A
>           Pr(1)  Pr(2)
> class 1:  0.3428 0.6572
> class 2:  0.0307 0.9693
> 
> $B
>           Pr(1)  Pr(2)
> class 1:  0.7737 0.2263
> class 2:  0.1386 0.8614
> 
> snipped the rest of the output.
> 
> So, why add 1? It seems to disturb the function's formula-processing logic, and so far it has not been explained.
> 
> -- 
> David.
> 
>> 
>> Jim
>> 
>> 
>> On Thu, Jul 7, 2016 at 8:26 PM, Cristina Cametti
>> <cristina.cametti at gmail.com> wrote:
>>> Dear all,
>>> 
>>> I am not able to find reliable R code to run a multilevel latent class model. I need to analyze how social trust (three variables from the ESS survey) might vary between countries (21 countries in my database). I tried the poLCA package, but I am not sure whether my code is right. This is my code:
>>> lca <- cbind(ppltrst+1,pplfair+1,pplhlp+1)~cntry
>>> lc <- poLCA(lca,mydata)
>>> 
>>> However, I get an error message:
>>> Error in `[.data.frame`(data, , match(colnames(y), colnames(data))[j]) :
>>> undefined columns selected
>>> 
>>> How can I solve this? Is the code completely wrong, or did I miss some steps?
>>> Thank you very much for your help!
>>> 
>>> Cristina
>>> 
> 
> David Winsemius
> Alameda, CA, USA
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

