[R] Fitting Structural Equation Model with sem package and Summary issues

Fri Dec 19 17:27:40 CET 2014

Dear Guillaume,

Please see comments interspersed below:

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
> Guillaume Souchay
> Sent: Thursday, December 18, 2014 8:50 AM
> To: r-help at r-project.org
> Subject: [R] Fitting Structural Equation Model with sem package and
> Summary issues
> 
> Hi all,
> 
> I am trying to analyse bird data to investigate carry-over effect
> using structural equation model.
> I failed to run properly a big model with several latent variables
> with both L -> M block and M -> L block.
> Rather than trying again and again with the huge model, I am now
> looking to a subset of the model.
> 
> Due to previous error message (singularity in the matrix), I scaled
> all the variables.
> Here is a subset of the data:
> > dataE[1:15,]
>    Fledgling_date_t Total_Output_t Breed_nb.clutch_t.1 Breed_Egg_t.1
> Breed_Total_Output_t.1
> 1        1.09397971     1.19657515           0.4696909   -0.69784742
>            1.2558119
> 2        0.62564592     0.37786584           0.4696909    0.02046473
>           -0.1762543
> 3        1.51548013     1.19657515          -1.0568046    0.89840181
>           -1.8947338
> 4        0.15731212     1.60592981          -1.0568046   -1.49597204
>           -1.3219073
> 5        0.48514578    -0.44084348           0.4696909   -0.69784742
>            0.6829854
> 6        1.93698054    -0.03148882           0.4696909    0.02046473
>            0.9693987
> 7       -1.66918968    -0.85019813          -1.0568046    0.97821428
>           -0.4626676
> 8        0.01681198     0.78722049           0.4696909   -0.53822250
>            1.2558119
> 9        0.34464564     1.60592981          -1.0568046   -0.13916019
>           -1.8947338
> 10       1.23447985     1.19657515          -1.0568046    1.93596382
>           -0.7490808
> 11      -0.12368816    -0.85019813           0.4696909    1.93596382
>            0.1101589
> 12      -0.17052154     0.78722049          -1.0568046   -0.45841004
>           -0.1762543
> 13      -1.52868954    -0.44084348          -1.0568046    1.37727658
>           -0.4626676
> 14       0.15731212    -1.25955279          -1.0568046    1.61671397
>           -0.4626676
> 15      -0.17052154    -0.85019813          -1.0568046    0.97821428
>           -1.6083205
> 
> Library(sem)
> # the covariance matrix for scaled data
> S.covE <-
> readMoments(diag=T,names=c("Fledgling_date_t","Total_Output_t","Breed_nb
> .clutch_t.1","Breed_Egg_t.1","Breed_Total_Output_t.1"))
> 1.0000000
> 0.350170246 1.0000000
> -0.075832501 -0.099929893 1.0000000
> -0.15439341 -0.091334987 -0.131698418 1.0000000
> -0.191457491 -0.227843749 0.510666663 -0.386711653 1.0000000
> 
> # specification of the model - I also provided a diagram of the model
> in the attached PDF.
> modelE <- specifyModel()
> EndBreed -> Fledgling_date_t,                       lambda1,   NA
> EndBreed -> Total_Output_t,                         lambda1,   NA
> Fledgling_date_t <-> Fledgling_date_t,              delta1,    NA
> Total_Output_t <-> Total_Output_t,                  delta2,    NA
> Fledgling_date_t <-> Total_Output_t,                theta1,    NA
> EndBreed -> BreedSucc,                              gamma1,    NA
> EndBreed <-> EndBreed,                              phi1,      NA
> BreedSucc -> Breed_Egg_t.1,                         lamdae,    NA
> BreedSucc -> Breed_Total_Output_t.1,                lamdae,    NA
> BreedSucc -> Breed_nb.clutch_t.1,                   lamdae,    NA
> Breed_nb.clutch_t.1 <-> Breed_nb.clutch_t.1,        eps1,      NA
> Breed_Egg_t.1 <-> Breed_Egg_t.1,                    eps2,      NA
> Breed_Total_Output_t.1 <-> Breed_Total_Output_t.1,  eps3,      NA
> Breed_nb.clutch_t.1 <-> Breed_Egg_t.1,              psie12,    NA
> Breed_Egg_t.1 <-> Breed_Total_Output_t.1,           psie23,    NA
> Breed_nb.clutch_t.1 <-> Breed_Total_Output_t.1,     psie13,    NA
> BreedSucc <-> BreedSucc,                            zetae,     NA
> 
> # estimation of the model
> semE <- sem(modelE,S.covE,N=39,debug=T)
> 
> To this point, everything seemed fine, the parameter were estimated
> after 129 iterations with all data.
> However, the problem arised when I asked for a summary of the model:
> 
> > summary(semE)
> Error in summary.objectiveML(semE) :
>   coefficient covariances cannot be computed
> 
> But the model seemed to work well :
> 
> > semE
> 
>  Model Chisquare =  0.9876903   Df =  1
> 
>    lambda1     delta1     delta2     theta1     gamma1       phi1
> lamdae       eps1       eps2
>  0.8251654  0.3302009  0.3418300 -0.3138143  0.4122545  0.9752364
> -0.4671335  0.8020365  0.7857964
>       eps3     psie12     psie23     psie13      zetae
>  0.7461566 -0.3377820 -0.6207350  0.2847632  0.8828395
> 
>  Iterations =  75
> > semE$convergence
> [1] TRUE

What's remarkable is that you got estimates at all. The model is underidentified because even with the equality constraints on the parameters there are no normalizing constraints setting scales for the latent variables. As well, allowing all measurement error variables to be correlated between (among) the indicators of each latent variable underidentifies the model.

> 
> I also tried with using SpecifyEquations() instead of SpecifyModel() :
> # specification of the model using specifyEquations
> modelEe <- specifyEquations()
> Fledgling_date_t = lambda1*EndBreed
> Total_Output_t = lambda1*EndBreed
> c(Fledgling_date_t,Total_Output_t) = theta1
> Breed_nb.clutch_t.1 = lamdae*BreedSucc
> Breed_Egg_t.1 = lamdae*BreedSucc
> Breed_Total_Output_t.1 = lamdae*BreedSucc
> c(Breed_nb.clutch_t.1,Breed_Egg_t.1) = psi12
> c(Breed_nb.clutch_t.1,Breed_Total_Output_t.1) = psi13
> c(Breed_Egg_t.1,Breed_Total_Output_t.1) = psi23
> BreedSucc = gamma1*EndBreed
> v(EndBreed) = phi1
> v(BreedSucc) = zeta1
> v(Fledgling_date_t) = delta1
> v(Total_Output_t) = delta2
> v(Breed_nb.clutch_t.1) = eps1
> v(Breed_Egg_t.1) = eps2
> v(Breed_Total_Output_t.1) = eps3
> 
> # estimation of the model
> semEe <- sem(modelEe,covE,N=39,debug=T)
> 
> > semEe
> 
>  Model Chisquare =  0.9876903   Df =  1
> 
>    lambda1     theta1     lamdae      psi12      psi13      psi23
> gamma1       phi1      zeta1
>  0.8220182 -0.3346606  0.5034442 -0.3550646  0.2674806 -0.6380177
> -0.3694630  1.0135693  0.8326144
>     delta1     delta2       eps1       eps2       eps3
>  0.3093554  0.3209828  0.7847537  0.7685137  0.7288741
> 
>  Iterations =  79
> > summary(semEe)
> Error in summary.objectiveML(semEe) :
>   coefficient covariances cannot be computed
> 

This is the same model. That you got identical chisquares but different parameter estimates points to underidentification. Again, I'm surprised that there was an apparent solution not that the information matrix isn't positive-definite.

> I also tried to set one loading to 1 instead of setting equality among
> loadings, but the results were the same.

That addresses one of the sources of underidentification but not the other. Actually, to be consistent with your prior specification you should have set *all* of the factor loadings to 1, which would both constrain them equal and set a metric for the latent variables -- but the resulting model would still (I believe) be underidentified by virtue of the correlations among the measurement errors.

> 
> Could it be possible that the low number of data (N=39 but no NA
> inside) may be the cause of the error?
> In the model, the df is 1, thus all the parameters should be
> identifiable.

No, the small N probably makes it inadvisable to fit a SEM to these data but the input covariance matrix isn't poorly conditioned. Also, positive df doesn't guarantee an identified model -- it's a necessary but not sufficient condition for identification -- and it's very easy to specify a model with positive df that's still underidentified. Indeed, you easily succeeded in doing that.

> 
> Hoping you will have enough information to help a bit.

I fiddled with your model a bit and produced the following (allowing different factor loadings for different indicators and using as a normalizing constraint that the variances of the latent variables are 1):

-------------- snip -------------

> modelEe.modified <- specifyEquations()
1: Fledgling_date_t = lambda1*EndBreed
2: Total_Output_t = lambda2*EndBreed
3: Breed_nb.clutch_t.1 = lamdae1*BreedSucc
4: Breed_Egg_t.1 = lamdae2*BreedSucc
5: Breed_Total_Output_t.1 = lamdae3*BreedSucc
6: BreedSucc = gamma1*EndBreed
7: V(EndBreed) = 1
8: V(BreedSucc) = 1
9: 
Read 8 items
NOTE: adding 5 variances to the model

> semEe.modified <- sem(modelEe.modified, S.covE, N=39)
> summary(semEe.modified)

 Model Chisquare =  1.998071   Df =  4 Pr(>Chisq) = 0.7361137
 AIC =  23.99807
 BIC =  -12.65618

 Normalized Residuals
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.14900 -0.36600  0.00000 -0.18790  0.08115  0.23380 

 R-square for Endogenous Variables
      Fledgling_date_t         Total_Output_t              BreedSucc    Breed_nb.clutch_t.1          Breed_Egg_t.1 Breed_Total_Output_t.1 
                0.3331                 0.3681                 0.0554                 0.0805                 0.0559                 2.8772 

 Parameter Estimates
                          Estimate   Std Error z value    Pr(>|z|)                                                       
lambda1                    0.5771328 0.2495813  2.3124037 2.075545e-02 Fledgling_date_t <--- EndBreed                    
lambda2                    0.6067412 0.2570540  2.3603648 1.825697e-02 Total_Output_t <--- EndBreed                      
lamdae1                    0.2757486 0.2421902  1.1385623 2.548858e-01 Breed_nb.clutch_t.1 <--- BreedSucc                
lamdae2                   -0.2297048 0.2136600 -1.0750952 2.823321e-01 Breed_Egg_t.1 <--- BreedSucc                      
lamdae3                    1.6485964 1.3517870  1.2195682 2.226286e-01 Breed_Total_Output_t.1 <--- BreedSucc             
gamma1                    -0.2421404 0.2698327 -0.8973722 3.695203e-01 BreedSucc <--- EndBreed                           
V[Fledgling_date_t]        0.6669177 0.2778114  2.4006126 1.636765e-02 Fledgling_date_t <--> Fledgling_date_t            
V[Total_Output_t]          0.6318651 0.2944414  2.1459791 3.187465e-02 Total_Output_t <--> Total_Output_t                
V[Breed_nb.clutch_t.1]     0.9195045 0.2423910  3.7934763 1.485528e-04 Breed_nb.clutch_t.1 <--> Breed_nb.clutch_t.1      
V[Breed_Egg_t.1]           0.9441415 0.2306422  4.0935328 4.248499e-05 Breed_Egg_t.1 <--> Breed_Egg_t.1                  
V[Breed_Total_Output_t.1] -1.8772242 4.4379049 -0.4229979 6.722968e-01 Breed_Total_Output_t.1 <--> Breed_Total_Output_t.1

 Iterations =  34 

-------------- snip -------------

Of course, if you really *believe* that the various measurement errors are correlated then these estimates are biased.

Some more comments:

(1) You don't have to supply error-variances for variables; specifyEquations() and specifyModel() will do that automatically by default (as you can see in my input/output above).

(2) If you have the original data set, as you do, it's better to use that rather than a moment matrix as input to sem().

Best,
 John

-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/

> 
> Thanks in advance.
> 
> Cheers,
> 
> Guillaume
> 
> --
> 
> Guillaume SOUCHAY, Ph.D
> 
> Post-doctoral fellow in population dynamics
> 
> ---
> 
> "There is no true model" Anderson & Burhnam 1999
> 
> ---
> 
> ᐧ
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.