A Series on SEM - Part 2

Date: Thu 2025-12-04

Permalink: https://www.dominic-ricottone.com/posts/2025/12/a-series-on-sem-part-2/


This is part of a series on structural equation modeling (SEM). Specifically, bumbling around with SEM to try and fit a random effects regression, because people smarter than I have said they can be equivalent.

At the close of part 1, I had collected the following results:

Reference Model -sem- Model -gsem- Model
N obs 1,928 324 1,928
N groups 589 - -
intercept 7.6294 8.4902 7.6289
p<0.0001 p<0.0001 p<0.0001
age coef. 0.4860 0.4515 0.4860
p<0.0001 p<0.0001 p<0.0001
sq. age coef. -0.0032 -0.0027 -0.0032
p<0.0001 p<0.0001 p<0.0001
tenure coef. 0.5889 0.5923 0.5888
p<0.0001 p<0.0001 p<0.0001
Var(ε) 4.2660 4.2388 4.2660
Var(α) 2.1980 2.2709 2.1980
R-squared 0.6954 - -

There’s a notable gap here; no R-squared is listed for the SEM models. Why did I forget to include that? Well…

. estimates restore m_sem
(results m_sem are active now)

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
       m_sem |        324          .  -22012.68       6   44037.36   44060.04
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.

. estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
                     |
          chi2_ms(.) |          .   model vs. saturated
            p > chi2 |          .
          chi2_bs(.) |          .   baseline vs. saturated
            p > chi2 |          .
---------------------+------------------------------------------------------
Population error     |
               RMSEA |          .   Root mean squared error of approximation
 90% CI, lower bound |      0.000
         upper bound |          .
              pclose |          .   Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Information criteria |
                 AIC |  44037.359   Akaike's information criterion
                 BIC |  44060.044   Bayesian information criterion
---------------------+------------------------------------------------------
Baseline comparison  |
                 CFI |          .   Comparative fit index
                 TLI |          .   Tucker–Lewis index
---------------------+------------------------------------------------------
Size of residuals    |
                SRMR |      0.025   Standardized root mean squared residual
                  CD |      0.974   Coefficient of determination
----------------------------------------------------------------------------

There are some red flags here. Besides being different, an R-squared statistic of 97% is preposterous. And the inability to calculate a chi-squared test suggests that the model was not correctly identified. Did I accidentally regress wage on itself?

Referencing the Stata manual, there are recommendations to use a different post-estimation command. estat eqgof reports mc-squared statistics in addition to R-squared, and these should be preferred in non-recursive model. (And I do believe this qualifies as a non-recursive model.)

. estat eqgof

Equation-level goodness of fit

------------------------------------------------------------------------------
   Dependent |             Variance            |
   variables |    Fitted  Predicted   Residual | R-squared        mc       mc2
-------------+---------------------------------+------------------------------
Observed     |                                 |
    wage2013 |  17.72366   13.48486   4.238807 |  .7608391  .8722609  .7608391
    wage2014 |  19.45075   15.21195   4.238807 |  .7820749    .88435  .7820749
    wage2015 |  20.08207   15.84326   4.238807 |  .7889258  .8882149  .7889258
    wage2016 |  19.77338   15.53457   4.238807 |  .7856306  .8863581  .7856306
-------------+---------------------------------+------------------------------
     Overall |                                 |   .974397
------------------------------------------------------------------------------
mc  = Correlation between dependent variable and its prediction.
mc2 = mc^2 is the Bentler–Raykov squared multiple correlation coefficient.

A measure around 78% seems more palateable. So maybe no cause for concern.

What about -gsem-? Well, unfortunately very few post-estimation commands are actually supported right now.

. estimates restore m_gsem
(results m_gsem are active now)

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
      m_gsem |      1,928          .  -4419.283       6   8850.565   8883.951
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.

Once again, Stata can’t estimate the log likelihood of the null model…

As a result of red flags in the -sem- model, general unavailability for fit statistics in the -gsem- model, and the aforementioned thread on Statalist hinting at inappropriate assumptions in the implementation, I am left with few options beyond re-implementing the model in another framework.

Setting Stata aside, the two most popular frameworks for fitting a SEM seem to be Mplus and lavaan. You may recognize that first name; the exceedingly brilliant and incomprehensible Bengt O. Muthén is also the co-creator of this program. Unfortunately Mplus is far from free software, and my company does not have a license. So lavaan it will be!


Previous article: A Series on SEM - Part 1