[R] Fitting Structural Equation Model with sem package and Summary issues
John Fox
jfox at mcmaster.ca
Fri Dec 19 17:27:40 CET 2014
Dear Guillaume,
Please see comments interspersed below:
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
> Guillaume Souchay
> Sent: Thursday, December 18, 2014 8:50 AM
> To: r-help at r-project.org
> Subject: [R] Fitting Structural Equation Model with sem package and
> Summary issues
>
> Hi all,
>
> I am trying to analyse bird data to investigate carry-over effect
> using structural equation model.
> I failed to run properly a big model with several latent variables
> with both L -> M block and M -> L block.
> Rather than trying again and again with the huge model, I am now
> looking to a subset of the model.
>
> Due to previous error message (singularity in the matrix), I scaled
> all the variables.
> Here is a subset of the data:
> > dataE[1:15,]
> Fledgling_date_t Total_Output_t Breed_nb.clutch_t.1 Breed_Egg_t.1
> Breed_Total_Output_t.1
> 1 1.09397971 1.19657515 0.4696909 -0.69784742
> 1.2558119
> 2 0.62564592 0.37786584 0.4696909 0.02046473
> -0.1762543
> 3 1.51548013 1.19657515 -1.0568046 0.89840181
> -1.8947338
> 4 0.15731212 1.60592981 -1.0568046 -1.49597204
> -1.3219073
> 5 0.48514578 -0.44084348 0.4696909 -0.69784742
> 0.6829854
> 6 1.93698054 -0.03148882 0.4696909 0.02046473
> 0.9693987
> 7 -1.66918968 -0.85019813 -1.0568046 0.97821428
> -0.4626676
> 8 0.01681198 0.78722049 0.4696909 -0.53822250
> 1.2558119
> 9 0.34464564 1.60592981 -1.0568046 -0.13916019
> -1.8947338
> 10 1.23447985 1.19657515 -1.0568046 1.93596382
> -0.7490808
> 11 -0.12368816 -0.85019813 0.4696909 1.93596382
> 0.1101589
> 12 -0.17052154 0.78722049 -1.0568046 -0.45841004
> -0.1762543
> 13 -1.52868954 -0.44084348 -1.0568046 1.37727658
> -0.4626676
> 14 0.15731212 -1.25955279 -1.0568046 1.61671397
> -0.4626676
> 15 -0.17052154 -0.85019813 -1.0568046 0.97821428
> -1.6083205
>
> Library(sem)
> # the covariance matrix for scaled data
> S.covE <-
> readMoments(diag=T,names=c("Fledgling_date_t","Total_Output_t","Breed_nb
> .clutch_t.1","Breed_Egg_t.1","Breed_Total_Output_t.1"))
> 1.0000000
> 0.350170246 1.0000000
> -0.075832501 -0.099929893 1.0000000
> -0.15439341 -0.091334987 -0.131698418 1.0000000
> -0.191457491 -0.227843749 0.510666663 -0.386711653 1.0000000
>
> # specification of the model - I also provided a diagram of the model
> in the attached PDF.
> modelE <- specifyModel()
> EndBreed -> Fledgling_date_t, lambda1, NA
> EndBreed -> Total_Output_t, lambda1, NA
> Fledgling_date_t <-> Fledgling_date_t, delta1, NA
> Total_Output_t <-> Total_Output_t, delta2, NA
> Fledgling_date_t <-> Total_Output_t, theta1, NA
> EndBreed -> BreedSucc, gamma1, NA
> EndBreed <-> EndBreed, phi1, NA
> BreedSucc -> Breed_Egg_t.1, lamdae, NA
> BreedSucc -> Breed_Total_Output_t.1, lamdae, NA
> BreedSucc -> Breed_nb.clutch_t.1, lamdae, NA
> Breed_nb.clutch_t.1 <-> Breed_nb.clutch_t.1, eps1, NA
> Breed_Egg_t.1 <-> Breed_Egg_t.1, eps2, NA
> Breed_Total_Output_t.1 <-> Breed_Total_Output_t.1, eps3, NA
> Breed_nb.clutch_t.1 <-> Breed_Egg_t.1, psie12, NA
> Breed_Egg_t.1 <-> Breed_Total_Output_t.1, psie23, NA
> Breed_nb.clutch_t.1 <-> Breed_Total_Output_t.1, psie13, NA
> BreedSucc <-> BreedSucc, zetae, NA
>
> # estimation of the model
> semE <- sem(modelE,S.covE,N=39,debug=T)
>
> To this point, everything seemed fine, the parameter were estimated
> after 129 iterations with all data.
> However, the problem arised when I asked for a summary of the model:
>
> > summary(semE)
> Error in summary.objectiveML(semE) :
> coefficient covariances cannot be computed
>
> But the model seemed to work well :
>
> > semE
>
> Model Chisquare = 0.9876903 Df = 1
>
> lambda1 delta1 delta2 theta1 gamma1 phi1
> lamdae eps1 eps2
> 0.8251654 0.3302009 0.3418300 -0.3138143 0.4122545 0.9752364
> -0.4671335 0.8020365 0.7857964
> eps3 psie12 psie23 psie13 zetae
> 0.7461566 -0.3377820 -0.6207350 0.2847632 0.8828395
>
> Iterations = 75
> > semE$convergence
> [1] TRUE
What's remarkable is that you got estimates at all. The model is underidentified because even with the equality constraints on the parameters there are no normalizing constraints setting scales for the latent variables. As well, allowing all measurement error variables to be correlated between (among) the indicators of each latent variable underidentifies the model.
>
> I also tried with using SpecifyEquations() instead of SpecifyModel() :
> # specification of the model using specifyEquations
> modelEe <- specifyEquations()
> Fledgling_date_t = lambda1*EndBreed
> Total_Output_t = lambda1*EndBreed
> c(Fledgling_date_t,Total_Output_t) = theta1
> Breed_nb.clutch_t.1 = lamdae*BreedSucc
> Breed_Egg_t.1 = lamdae*BreedSucc
> Breed_Total_Output_t.1 = lamdae*BreedSucc
> c(Breed_nb.clutch_t.1,Breed_Egg_t.1) = psi12
> c(Breed_nb.clutch_t.1,Breed_Total_Output_t.1) = psi13
> c(Breed_Egg_t.1,Breed_Total_Output_t.1) = psi23
> BreedSucc = gamma1*EndBreed
> v(EndBreed) = phi1
> v(BreedSucc) = zeta1
> v(Fledgling_date_t) = delta1
> v(Total_Output_t) = delta2
> v(Breed_nb.clutch_t.1) = eps1
> v(Breed_Egg_t.1) = eps2
> v(Breed_Total_Output_t.1) = eps3
>
> # estimation of the model
> semEe <- sem(modelEe,covE,N=39,debug=T)
>
> > semEe
>
> Model Chisquare = 0.9876903 Df = 1
>
> lambda1 theta1 lamdae psi12 psi13 psi23
> gamma1 phi1 zeta1
> 0.8220182 -0.3346606 0.5034442 -0.3550646 0.2674806 -0.6380177
> -0.3694630 1.0135693 0.8326144
> delta1 delta2 eps1 eps2 eps3
> 0.3093554 0.3209828 0.7847537 0.7685137 0.7288741
>
> Iterations = 79
> > summary(semEe)
> Error in summary.objectiveML(semEe) :
> coefficient covariances cannot be computed
>
This is the same model. That you got identical chisquares but different parameter estimates points to underidentification. Again, I'm surprised that there was an apparent solution not that the information matrix isn't positive-definite.
> I also tried to set one loading to 1 instead of setting equality among
> loadings, but the results were the same.
That addresses one of the sources of underidentification but not the other. Actually, to be consistent with your prior specification you should have set *all* of the factor loadings to 1, which would both constrain them equal and set a metric for the latent variables -- but the resulting model would still (I believe) be underidentified by virtue of the correlations among the measurement errors.
>
> Could it be possible that the low number of data (N=39 but no NA
> inside) may be the cause of the error?
> In the model, the df is 1, thus all the parameters should be
> identifiable.
No, the small N probably makes it inadvisable to fit a SEM to these data but the input covariance matrix isn't poorly conditioned. Also, positive df doesn't guarantee an identified model -- it's a necessary but not sufficient condition for identification -- and it's very easy to specify a model with positive df that's still underidentified. Indeed, you easily succeeded in doing that.
>
> Hoping you will have enough information to help a bit.
I fiddled with your model a bit and produced the following (allowing different factor loadings for different indicators and using as a normalizing constraint that the variances of the latent variables are 1):
-------------- snip -------------
> modelEe.modified <- specifyEquations()
1: Fledgling_date_t = lambda1*EndBreed
2: Total_Output_t = lambda2*EndBreed
3: Breed_nb.clutch_t.1 = lamdae1*BreedSucc
4: Breed_Egg_t.1 = lamdae2*BreedSucc
5: Breed_Total_Output_t.1 = lamdae3*BreedSucc
6: BreedSucc = gamma1*EndBreed
7: V(EndBreed) = 1
8: V(BreedSucc) = 1
9:
Read 8 items
NOTE: adding 5 variances to the model
> semEe.modified <- sem(modelEe.modified, S.covE, N=39)
> summary(semEe.modified)
Model Chisquare = 1.998071 Df = 4 Pr(>Chisq) = 0.7361137
AIC = 23.99807
BIC = -12.65618
Normalized Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.14900 -0.36600 0.00000 -0.18790 0.08115 0.23380
R-square for Endogenous Variables
Fledgling_date_t Total_Output_t BreedSucc Breed_nb.clutch_t.1 Breed_Egg_t.1 Breed_Total_Output_t.1
0.3331 0.3681 0.0554 0.0805 0.0559 2.8772
Parameter Estimates
Estimate Std Error z value Pr(>|z|)
lambda1 0.5771328 0.2495813 2.3124037 2.075545e-02 Fledgling_date_t <--- EndBreed
lambda2 0.6067412 0.2570540 2.3603648 1.825697e-02 Total_Output_t <--- EndBreed
lamdae1 0.2757486 0.2421902 1.1385623 2.548858e-01 Breed_nb.clutch_t.1 <--- BreedSucc
lamdae2 -0.2297048 0.2136600 -1.0750952 2.823321e-01 Breed_Egg_t.1 <--- BreedSucc
lamdae3 1.6485964 1.3517870 1.2195682 2.226286e-01 Breed_Total_Output_t.1 <--- BreedSucc
gamma1 -0.2421404 0.2698327 -0.8973722 3.695203e-01 BreedSucc <--- EndBreed
V[Fledgling_date_t] 0.6669177 0.2778114 2.4006126 1.636765e-02 Fledgling_date_t <--> Fledgling_date_t
V[Total_Output_t] 0.6318651 0.2944414 2.1459791 3.187465e-02 Total_Output_t <--> Total_Output_t
V[Breed_nb.clutch_t.1] 0.9195045 0.2423910 3.7934763 1.485528e-04 Breed_nb.clutch_t.1 <--> Breed_nb.clutch_t.1
V[Breed_Egg_t.1] 0.9441415 0.2306422 4.0935328 4.248499e-05 Breed_Egg_t.1 <--> Breed_Egg_t.1
V[Breed_Total_Output_t.1] -1.8772242 4.4379049 -0.4229979 6.722968e-01 Breed_Total_Output_t.1 <--> Breed_Total_Output_t.1
Iterations = 34
-------------- snip -------------
Of course, if you really *believe* that the various measurement errors are correlated then these estimates are biased.
Some more comments:
(1) You don't have to supply error-variances for variables; specifyEquations() and specifyModel() will do that automatically by default (as you can see in my input/output above).
(2) If you have the original data set, as you do, it's better to use that rather than a moment matrix as input to sem().
Best,
John
-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/
>
> Thanks in advance.
>
> Cheers,
>
> Guillaume
>
> --
>
> Guillaume SOUCHAY, Ph.D
>
> Post-doctoral fellow in population dynamics
>
> ---
>
> "There is no true model" Anderson & Burhnam 1999
>
> ---
>
> ᐧ
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list