A Series on SEM - Part 5

Date: Thu 2026-03-19

Permalink: https://www.dominic-ricottone.com/posts/2026/03/a-series-on-sem-part-5/


This is the final part of a series on structural equation modeling (SEM). Previously I came to the conclusion that replicating my reference model in lavaan will require declaring some covariances to be zero. This is essentially an inappropriate assumption that the variables are independent. There had already been some suspicion that Stata was making assumptions like this implicitly.

So if I’m to embark on a test like this, where to start with the covariance structure? How about… assume everything is independent?

mod3 <- '
Alpha =~ 1*wage_2013 + 1*wage_2014 + 1*wage_2015 + 1*wage_2016

wage_2013 ~ b1*age_2013 + b2*sq_age_2013 + b3*tenure_2013
wage_2014 ~ b1*age_2014 + b2*sq_age_2014 + b3*tenure_2014
wage_2015 ~ b1*age_2015 + b2*sq_age_2015 + b3*tenure_2015
wage_2016 ~ b1*age_2016 + b2*sq_age_2016 + b3*tenure_2016

Alpha ~ 0

# Just to document that *by definition* Alpha must be independent.
Alpha ~~
  + 0*age_2013 + 0*sq_age_2013 + 0*tenure_2013
  + 0*age_2014 + 0*sq_age_2014 + 0*tenure_2014
  + 0*age_2015 + 0*sq_age_2015 + 0*tenure_2015
  + 0*age_2016 + 0*sq_age_2016 + 0*tenure_2016

wage_2013 ~ b0*1
wage_2014 ~ b0*1
wage_2015 ~ b0*1
wage_2016 ~ b0*1

wage_2013 ~~ wage_2013
wage_2014 ~~ wage_2014
wage_2015 ~~ wage_2015
wage_2016 ~~ wage_2016
'

mod3.fit <- sem(model=mod3, data=df2, estimator="ML", meanstructure=TRUE, missing="ML")

mod3r.fit <- sem(model=mod3, data=df2, estimator="MLR", meanstructure=TRUE, missing="ML")

This however still does not yield a satisfactory result. Back in Stata, things magically got better when I switched from -sem- to -gsem-. Let’s try multilevel modeling in lavaan.

Well right away I run into an issue: lavaan cannot define a model like wage ~ age + sq_age + tenure + Alpha. The first appearance of Alpha must be on the LHS. This isn’t too big a deal however, since the cluster level equation is just between Alpha and wage. I can just reverse the direction of implied causation.

So here’s my second try at a multilevel model.

mod5 <- '
# Within-cluster level
level: 1
    wage ~ age + sq_age + tenure

    # Covariance structure
    age ~~ sq_age + tenure
    age ~~ tenure
# Cluster level
level: 2
    Alpha =~ 1*wage

    # E(Alpha) = 0
    Alpha ~ 0
'

mod5.fit <- sem(model=mod5, data=df1, estimator="ML", meanstructure=TRUE, missing=TRUE, cluster="personid")

This spits out a series of warnings:

Despite that last part, the model was successfully estimated. And actually quite close to the reference. Take a look:

Reference Model -gsem- Model 1 -gsem- Model 2 lavaan Model 5
N obs 1,928 1,928 1,928 2,400
N groups 589 589 600
intercept 7.6294 7.6528 7.6289 7.629
p<0.0001 p<0.0001 p<0.0001
age coef. 0.4860 0.4848 0.4860 0.486
p<0.0001 p<0.0001 p<0.0001
sq. age coef. -0.0032 -0.0032 -0.0032 -0.003
p<0.0001 p<0.0001 p<0.0001
tenure coef. 0.5889 0.5900 0.5888 0.589
p<0.0001 p<0.0001 p<0.0001
Var(ε) 4.2660 4.2660 4.266
Var(ε2013) 4.1397
Var(ε2014) 4.6110
Var(ε2015) 4.3479
Var(ε2016) 3.9745
Var(α) 2.1980 2.1868 2.1980 2.198
R-squared 0.6954

While we’re throwing crazy ideas at the wall, maybe floating point rounding errors are the problem! It seems possible that squared values are too large. So I scale it down by 100x (and drop the cluster with no variation in the outcome variable) and try again.

df3 <- df1[!(df1$personid %in% c(452)), ]
df3$sq_age <- df3$sq_age/100
mod5.fit2 <- sem(model=mod5, data=df3, estimator="ML", cluster="personid", meanstructure=TRUE)
Click here for the full execution log.

TODO: find a prettier way to include RMarkdown cells in my blog posts.

mod5.fit2 <- sem(model=mod5, data=df3, estimator="ML", cluster="personid", meanstructure=TRUE)
## Warning: lavaan->lav_data_full():
##    Level-1 variable "tenure" has no variance within some clusters . The
##    cluster ids with zero within variance are: 104, 134, 167, 198, 205, 245,
##    286, 342, 344, 353, 370, 372, 383, 418, 431, 438, 468, 499, 556, 560, 562.
## Warning in log(det(sigma.w)): NaNs produced
## Warning in log(det(sigma.w)): NaNs produced
summary(mod5.fit2, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)
## Warning: lavaan->lav_fit_cfi_lavobject():
##    computation of robust CFI resulted in NA values.
## Warning: lavaan->lav_fit_rmsea_lavobject():
##    computation of robust RMSEA resulted in NA values.
## lavaan 0.6-20 ended normally after 113 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        14
##
##   Number of observations                          2396
##   Number of clusters [personid]                    599
##   Number of missing patterns -- level 1              2
##
## Model Test User Model:
##
##   Test statistic                               258.439
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Model Test Baseline Model:
##
##   Test statistic                             10819.294
##   Degrees of freedom                                 6
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.976
##   Tucker-Lewis Index (TLI)                       0.857
##
##   Robust Comparative Fit Index (CFI)                NA
##   Robust Tucker-Lewis Index (TLI)                   NA
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)             -25703.936
##   Loglikelihood unrestricted model (H1)     -25574.717
##
##   Akaike (AIC)                               51435.872
##   Bayesian (BIC)                             51516.814
##   Sample-size adjusted Bayesian (SABIC)      51472.333
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.328
##   90 Percent confidence interval - lower         0.295
##   90 Percent confidence interval - upper         0.362
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##
##   Robust RMSEA                                      NA
##   90 Percent confidence interval - lower            NA
##   90 Percent confidence interval - upper            NA
##   P-value H_0: Robust RMSEA <= 0.050                NA
##   P-value H_0: Robust RMSEA >= 0.080                NA
##
## Standardized Root Mean Square Residual (corr metric):
##
##   SRMR (within covariance matrix)                0.154
##   SRMR (between covariance matrix)               0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
##
##
## Level 1 [within]:
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   wage ~
##     age               0.486    0.040   12.205    0.000    0.486    1.627
##     sq_age           -0.322    0.044   -7.373    0.000   -0.322   -1.012
##     tenure            0.590    0.017   34.407    0.000    0.590    0.543
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   age ~~
##     sq_age          176.824    5.142   34.389    0.000  176.824    0.988
##     tenure            1.134    0.173    6.543    0.000    1.134    0.022
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     age              43.241    0.282  153.444    0.000   43.241    3.135
##     sq_age           20.627    0.265   77.843    0.000   20.627    1.590
##     tenure            5.571    0.077   71.905    0.000    5.571    1.469
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .wage              4.267    0.164   26.043    0.000    4.267    0.251
##     age             190.276    5.507   34.553    0.000  190.276    1.000
##     sq_age          168.235    4.861   34.612    0.000  168.235    1.000
##     tenure           14.384    0.416   34.612    0.000   14.384    1.000
##
## R-Square:
##                    Estimate
##     wage              0.749
##
##
## Level 2 [personid]:
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Alpha =~
##     wage              1.000                               1.476    1.000
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     Alpha             0.000                               0.000    0.000
##    .wage              7.612    0.848    8.975    0.000    7.612    5.158
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .wage              0.000                               0.000    0.000
##     Alpha             2.178    0.211   10.313    0.000    1.000    1.000
##
## R-Square:
##                    Estimate
##     wage              1.000
mod5.fit2r <- sem(model=mod5, data=df3, estimator="MLR", meanstructure=TRUE, missing="ML", cluster="personid")
## Warning: lavaan->lav_data_full():
##    Level-1 variable "tenure" has no variance within some clusters . The
##    cluster ids with zero within variance are: 104, 134, 167, 198, 205, 245,
##    286, 342, 344, 353, 370, 372, 383, 418, 431, 438, 468, 499, 556, 560, 562.
## Warning in log(det(sigma.w)): NaNs produced
## Warning in log(det(sigma.w)): NaNs produced
summary(mod5.fit2r, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)
## Warning: lavaan->lav_fit_cfi_lavobject():
##    computation of robust CFI resulted in NA values.
## Warning: lavaan->lav_fit_rmsea_lavobject():
##    computation of robust RMSEA resulted in NA values.
## lavaan 0.6-20 ended normally after 113 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        14
##
##   Number of observations                          2396
##   Number of clusters [personid]                    599
##   Number of missing patterns -- level 1              2
##
## Model Test User Model:
##                                               Standard      Scaled
##   Test Statistic                               258.439          NA
##   Degrees of freedom                                 1           1
##   P-value (Chi-square)                           0.000          NA
##   Scaling correction factor                                     NA
##     Yuan-Bentler correction (Mplus variant)
##
## Model Test Baseline Model:
##
##   Test statistic                             10819.294    3514.718
##   Degrees of freedom                                 6           6
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  3.078
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.976          NA
##   Tucker-Lewis Index (TLI)                       0.857          NA
##
##   Robust Comparative Fit Index (CFI)                            NA
##   Robust Tucker-Lewis Index (TLI)                               NA
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)             -25703.936  -25703.936
##   Scaling correction factor                                 10.808
##       for the MLR correction
##   Loglikelihood unrestricted model (H1)     -25574.717  -25574.717
##   Scaling correction factor                                  2.630
##       for the MLR correction
##
##   Akaike (AIC)                               51435.872   51435.872
##   Bayesian (BIC)                             51516.814   51516.814
##   Sample-size adjusted Bayesian (SABIC)      51472.333   51472.333
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.328       0.000
##   90 Percent confidence interval - lower         0.295       0.000
##   90 Percent confidence interval - upper         0.362       0.000
##   P-value H_0: RMSEA <= 0.050                    0.000          NA
##   P-value H_0: RMSEA >= 0.080                    1.000          NA
##
##   Robust RMSEA                                                  NA
##   90 Percent confidence interval - lower                        NA
##   90 Percent confidence interval - upper                        NA
##   P-value H_0: Robust RMSEA <= 0.050                            NA
##   P-value H_0: Robust RMSEA >= 0.080                            NA
##
## Standardized Root Mean Square Residual (corr metric):
##
##   SRMR (within covariance matrix)                0.154       0.154
##   SRMR (between covariance matrix)               0.000       0.000
##
## Parameter Estimates:
##
##   Standard errors                             Sandwich
##   Information bread                           Observed
##   Observed information based on                Hessian
##
##
## Level 1 [within]:
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   wage ~
##     age               0.486    0.421    1.155    0.248    0.486    1.627
##     sq_age           -0.322    0.445   -0.723    0.470   -0.322   -1.012
##     tenure            0.590    0.019   31.460    0.000    0.590    0.543
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   age ~~
##     sq_age          176.824    8.817   20.055    0.000  176.824    0.988
##     tenure            1.134    0.281    4.034    0.000    1.134    0.022
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     age              43.241    0.598   72.316    0.000   43.241    3.135
##     sq_age           20.627    0.555   37.172    0.000   20.627    1.590
##     tenure            5.571    0.139   40.028    0.000    5.571    1.469
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .wage              4.267    0.175   24.415    0.000    4.267    0.251
##     age             190.276    8.367   22.740    0.000  190.276    1.000
##     sq_age          168.235    9.494   17.719    0.000  168.235    1.000
##     tenure           14.384    0.452   31.850    0.000   14.384    1.000
##
## R-Square:
##                    Estimate
##     wage              0.749
##
##
## Level 2 [personid]:
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Alpha =~
##     wage              1.000                               1.476    1.000
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     Alpha             0.000                               0.000    0.000
##    .wage              7.612    9.144    0.832    0.405    7.612    5.158
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .wage              0.000                               0.000    0.000
##     Alpha             2.178    0.212   10.295    0.000    1.000    1.000
##
## R-Square:
##                    Estimate
##     wage              1.000
Reference Model -gsem- Model 1 -gsem- Model 2 lavaan Model 5
N obs 1,928 1,928 1,928 1,926
N groups 589 589 588
intercept 7.6294 7.6528 7.6289 7.629
p<0.0001 p<0.0001 p<0.0001 p<0.0001
age coef. 0.4860 0.4848 0.4860 0.486
p<0.0001 p<0.0001 p<0.0001 p<0.0001
sq. age coef. -0.0032 -0.0032 -0.0032 -0.322
p<0.0001 p<0.0001 p<0.0001 p<0.0001
tenure coef. 0.5889 0.5900 0.5888 0.590
p<0.0001 p<0.0001 p<0.0001 p<0.0001
Var(ε) 4.2660 4.2660 4.267
Var(ε2013) 4.1397
Var(ε2014) 4.6110
Var(ε2015) 4.3479
Var(ε2016) 3.9745
Var(α) 2.1980 2.1868 2.1980 2.178
R-squared 0.6954

† = scaled by 0.01

Well I’ll be. On one hand, the R-squared is again preposterous. But the model has reproduced. This leaves me to ponder why rewriting a model to be multilevel makes it estimable without taking on inappropriate assumptions…


It’s at this point I remember how multilevel models take on assumptions that sometimes are not appropriate. I have accidentally erased serial correlation.

Let’s quickly double-check that.

df4 <- pivot_wider(df3, id_cols=personid, names_from=year, values_from=c(wage, age, sq_age, tenure))

mod6 <- '
Alpha =~ 1*wage_2013 + 1*wage_2014 + 1*wage_2015 + 1*wage_2016

wage_2013 ~ b1*age_2013 + b2*sq_age_2013 + b3*tenure_2013
wage_2014 ~ b1*age_2014 + b2*sq_age_2014 + b3*tenure_2014
wage_2015 ~ b1*age_2015 + b2*sq_age_2015 + b3*tenure_2015
wage_2016 ~ b1*age_2016 + b2*sq_age_2016 + b3*tenure_2016

Alpha ~ 0

Alpha ~~
  + 0*age_2013 + 0*sq_age_2013 + 0*tenure_2013
  + 0*age_2014 + 0*sq_age_2014 + 0*tenure_2014
  + 0*age_2015 + 0*sq_age_2015 + 0*tenure_2015
  + 0*age_2016 + 0*sq_age_2016 + 0*tenure_2016

age_2013 ~~ sq_age_2013 + tenure_2013
sq_age_2013 ~~ tenure_2013

age_2014 ~~ sq_age_2014 + tenure_2014
sq_age_2014 ~~ tenure_2014

age_2015 ~~ sq_age_2015 + tenure_2015
sq_age_2015 ~~ tenure_2015

age_2016 ~~ sq_age_2016 + tenure_2016
sq_age_2016 ~~ tenure_2016

wage_2013 ~ b0*1
wage_2014 ~ b0*1
wage_2015 ~ b0*1
wage_2016 ~ b0*1

wage_2013 ~~ wage_2013
wage_2014 ~~ wage_2014
wage_2015 ~~ wage_2015
wage_2016 ~~ wage_2016
'

mod6.fit <- sem(model=mod6, data=df4, estimator="ML", meanstructure=TRUE, missing="ML")
Click here for the full execution log.
mod6.fit <- sem(model=mod6, data=df4, estimator="ML", meanstructure=TRUE, missing="ML")
## Warning: lavaan->lav_data_full():
##    some observed variables are perfectly correlated; please check your data;
##    variables involved are: age_2013 age_2014 age_2015 age_2016
## Warning: lavaan->lav_mvnorm_missing_h1_estimate_moments():
##    Maximum number of iterations reached when computing the sample moments
##    using EM; use the em.h1.iter.max= argument to increase the number of
##    iterations
## Warning: lavaan->lav_mvnorm_missing_h1_estimate_moments():
##    The smallest eigenvalue of the EM estimated variance-covariance matrix
##    (Sigma) is smaller than 1e-05; this may cause numerical instabilities;
##    interpret the results with caution.
summary(mod6.fit, standardized=TRUE, fit.measures=TRUE, rsquare=TRUE)
## Warning in pchisq(X2, df = df, ncp = ncp): NaNs produced
## Warning in pchisq(X2, df = df, ncp = ncp): NaNs produced
## lavaan 0.6-20 ended normally after 301 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        57
##   Number of equality constraints                    12
##
##   Number of observations                           599
##   Number of missing patterns                        16
##
## Model Test User Model:
##
##   Test statistic                              64527.000
##   Degrees of freedom                                107
##   P-value (Chi-square)                            0.000
##
## Model Test Baseline Model:
##
##   Test statistic                             76636.624
##   Degrees of freedom                               120
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.158
##   Tucker-Lewis Index (TLI)                       0.056
##
##   Robust Comparative Fit Index (CFI)             0.011
##   Robust Tucker-Lewis Index (TLI)               -0.109
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)             -25339.140
##   Loglikelihood unrestricted model (H1)       6924.360
##
##   Akaike (AIC)                               50768.280
##   Bayesian (BIC)                             50966.066
##   Sample-size adjusted Bayesian (SABIC)      50823.204
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          1.003
##   90 Percent confidence interval - lower         0.996
##   90 Percent confidence interval - upper         1.009
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##
##   Robust RMSEA                                  97.894
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: Robust RMSEA <= 0.050               NaN
##   P-value H_0: Robust RMSEA >= 0.080               NaN
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.558
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Alpha =~
##     wage_2013         1.000                               1.472    0.312
##     wage_2014         1.000                               1.472    0.305
##     wage_2015         1.000                               1.472    0.303
##     wage_2016         1.000                               1.472    0.313
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   wage_2013 ~
##     age_2013  (b1)    0.485    0.040   12.194    0.000    0.485    1.425
##     sq_g_2013 (b2)   -0.321    0.044   -7.364    0.000   -0.321   -0.852
##     tenr_2013 (b3)    0.592    0.017   34.429    0.000    0.592    0.328
##   wage_2014 ~
##     age_2014  (b1)    0.485    0.040   12.194    0.000    0.485    1.398
##     sq_g_2014 (b2)   -0.321    0.044   -7.364    0.000   -0.321   -0.854
##     tenr_2014 (b3)    0.592    0.017   34.429    0.000    0.592    0.457
##   wage_2015 ~
##     age_2015  (b1)    0.485    0.040   12.194    0.000    0.485    1.383
##     sq_g_2015 (b2)   -0.321    0.044   -7.364    0.000   -0.321   -0.863
##     tenr_2015 (b3)    0.592    0.017   34.429    0.000    0.592    0.517
##   wage_2016 ~
##     age_2016  (b1)    0.485    0.040   12.194    0.000    0.485    1.414
##     sq_g_2016 (b2)   -0.321    0.044   -7.364    0.000   -0.321   -0.901
##     tenr_2016 (b3)    0.592    0.017   34.429    0.000    0.592    0.530
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Alpha ~~
##     age_2013          0.000                               0.000    0.000
##     sq_age_2013       0.000                               0.000    0.000
##     tenure_2013       0.000                               0.000    0.000
##     age_2014          0.000                               0.000    0.000
##     sq_age_2014       0.000                               0.000    0.000
##     tenure_2014       0.000                               0.000    0.000
##     age_2015          0.000                               0.000    0.000
##     sq_age_2015       0.000                               0.000    0.000
##     tenure_2015       0.000                               0.000    0.000
##     age_2016          0.000                               0.000    0.000
##     sq_age_2016       0.000                               0.000    0.000
##     tenure_2016       0.000                               0.000    0.000
##   age_2013 ~~
##     sq_age_2013     171.662   10.005   17.158    0.000  171.662    0.988
##     tenure_2013      22.422    1.747   12.837    0.000   22.422    0.618
##   sq_age_2013 ~~
##     tenure_2013      19.129    1.556   12.297    0.000   19.129    0.583
##   age_2014 ~~
##     sq_age_2014     177.180   10.421   17.001    0.000  177.180    0.989
##     tenure_2014      17.226    2.252    7.649    0.000   17.226    0.332
##   sq_age_2014 ~~
##     tenure_2014      15.048    2.070    7.269    0.000   15.048    0.314
##   age_2015 ~~
##     sq_age_2015     179.431   10.457   17.160    0.000  179.431    0.989
##     tenure_2015      17.021    2.510    6.781    0.000   17.021    0.289
##   sq_age_2015 ~~
##     tenure_2015      15.690    2.365    6.633    0.000   15.690    0.282
##   age_2016 ~~
##     sq_age_2016     179.602   10.258   17.508    0.000  179.602    0.989
##     tenure_2016      15.163    2.420    6.266    0.000   15.163    0.262
##   sq_age_2016 ~~
##     tenure_2016      14.030    2.326    6.031    0.000   14.030    0.252
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     Alpha             0.000                               0.000    0.000
##    .wage_2013 (b0)    7.636    0.847    9.015    0.000    7.636    1.618
##    .wage_2014 (b0)    7.636    0.847    9.015    0.000    7.636    1.580
##    .wage_2015 (b0)    7.636    0.847    9.015    0.000    7.636    1.570
##    .wage_2016 (b0)    7.636    0.847    9.015    0.000    7.636    1.622
##     age_2013         41.741    0.566   73.707    0.000   41.741    3.012
##     sq_g_2013        19.340    0.512   37.759    0.000   19.340    1.543
##     tenr_2013         6.229    0.107   58.202    0.000    6.229    2.378
##     age_2014         42.741    0.569   75.119    0.000   42.741    3.069
##     sq_g_2014        20.185    0.526   38.385    0.000   20.185    1.568
##     tenr_2014         5.506    0.152   36.138    0.000    5.506    1.477
##     age_2015         43.741    0.566   77.220    0.000   43.741    3.155
##     sq_g_2015        21.049    0.535   39.365    0.000   21.049    1.608
##     tenr_2015         4.873    0.174   28.063    0.000    4.873    1.147
##     age_2016         44.741    0.561   79.781    0.000   44.741    3.260
##     sq_g_2016        21.934    0.540   40.583    0.000   21.934    1.658
##     tenr_2016         5.678    0.172   32.957    0.000    5.678    1.347
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .wage_2013         4.144    0.333   12.429    0.000    4.144    0.186
##    .wage_2014         4.616    0.359   12.849    0.000    4.616    0.198
##    .wage_2015         4.355    0.347   12.536    0.000    4.355    0.184
##    .wage_2016         3.964    0.324   12.237    0.000    3.964    0.179
##     age_2013        192.107   11.128   17.263    0.000  192.107    1.000
##     sq_age_2013     157.145    9.103   17.263    0.000  157.145    1.000
##     tenure_2013       6.860    0.397   17.291    0.000    6.860    1.000
##     age_2014        193.922   11.340   17.101    0.000  193.922    1.000
##     sq_age_2014     165.634    9.686   17.101    0.000  165.634    1.000
##     tenure_2014      13.904    0.805   17.279    0.000   13.904    1.000
##     age_2015        192.197   11.138   17.256    0.000  192.197    1.000
##     sq_age_2015     171.264    9.925   17.256    0.000  171.264    1.000
##     tenure_2015      18.063    1.044   17.301    0.000   18.063    1.000
##     age_2016        188.384   10.702   17.602    0.000  188.384    1.000
##     sq_age_2016     174.981    9.941   17.602    0.000  174.981    1.000
##     tenure_2016      17.778    1.026   17.334    0.000   17.778    1.000
##     Alpha             2.165    0.211   10.286    0.000    1.000    1.000
##
## R-Square:
##                    Estimate
##     wage_2013         0.814
##     wage_2014         0.802
##     wage_2015         0.816
##     wage_2016         0.821

Leaving aside the warning that a greater number of iterations is required to converge… since this exercise is purely to test the theory that serial convergence is present… here’s the final table.

Reference Model -gsem- Model 1 -gsem- Model 2 lavaan Model 5 lavaan Model 6
N obs 1,928 1,928 1,928 1,926 599
N groups 589 589 588
intercept 7.6294 7.6528 7.6289 7.629 7.636
p<0.0001 p<0.0001 p<0.0001 p<0.0001 p<0.0001
age coef. 0.4860 0.4848 0.4860 0.486 0.485
p<0.0001 p<0.0001 p<0.0001 p<0.0001 p<0.0001
sq. age coef. -0.0032 -0.0032 -0.0032 -0.322 -0.321
p<0.0001 p<0.0001 p<0.0001 p<0.0001 p<0.0001
tenure coef. 0.5889 0.5900 0.5888 0.590 0.592
p<0.0001 p<0.0001 p<0.0001 p<0.0001 p<0.0001
Var(ε) 4.2660 4.2660 4.267
Var(ε2013) 4.1397 4.144
Var(ε2014) 4.6110 4.616
Var(ε2015) 4.3479 4.355
Var(ε2016) 3.9745 3.964
Var(α) 2.1980 2.1868 2.1980 2.178 2.165
R-squared 0.6954
R-squared (2013) 0.814
R-squared (2014) 0.802
R-squared (2015) 0.816
R-squared (2016) 0.821

So indeed I have a disappointment sandwich. But the fault lies in my random effects model that I took as a reference, not in my self-taught structural equation model. Serial correlation is a failure of assumptions that exists completely in the blind spot of random effects estimation. lavaan meanwhile screamed that I was wrong from the very beginning. Frank Harrell wrote about this specific issue. He advises: “Above all, don’t default to only using random intercepts to handle within-subject correlations of serial measurements. This is unlikely to fit the correlation structure in play.”


Now, does -gsem- make inappropriate assumptions? Maybe. Can’t conclude either way with this example. Might come back to that investigation eventually. But let’s consider this series on SEM finished.


Previous article: LinkedIn Project URLs