This is part of a series on structural equation modeling (SEM). Specifically, bumbling around with SEM to try and fit a random effects regression, because people smarter than I have said they can be equivalent.
At the close of part 1, I had collected the following results:
| Reference Model | -sem- Model |
-gsem- Model |
|
|---|---|---|---|
| N obs | 1,928 | 324 | 1,928 |
| N groups | 589 | - | - |
| intercept | 7.6294 | 8.4902 | 7.6289 |
| p<0.0001 | p<0.0001 | p<0.0001 | |
| age coef. | 0.4860 | 0.4515 | 0.4860 |
| p<0.0001 | p<0.0001 | p<0.0001 | |
| sq. age coef. | -0.0032 | -0.0027 | -0.0032 |
| p<0.0001 | p<0.0001 | p<0.0001 | |
| tenure coef. | 0.5889 | 0.5923 | 0.5888 |
| p<0.0001 | p<0.0001 | p<0.0001 | |
| Var(ε) | 4.2660 | 4.2388 | 4.2660 |
| Var(α) | 2.1980 | 2.2709 | 2.1980 |
| R-squared | 0.6954 | - | - |
There’s a notable gap here; no R-squared is listed for the SEM models. Why did I forget to include that? Well…
. estimates restore m_sem
(results m_sem are active now)
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | N ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
m_sem | 324 . -22012.68 6 44037.36 44060.04
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
. estat gof, stats(all)
----------------------------------------------------------------------------
Fit statistic | Value Description
---------------------+------------------------------------------------------
|
chi2_ms(.) | . model vs. saturated
p > chi2 | .
chi2_bs(.) | . baseline vs. saturated
p > chi2 | .
---------------------+------------------------------------------------------
Population error |
RMSEA | . Root mean squared error of approximation
90% CI, lower bound | 0.000
upper bound | .
pclose | . Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Information criteria |
AIC | 44037.359 Akaike's information criterion
BIC | 44060.044 Bayesian information criterion
---------------------+------------------------------------------------------
Baseline comparison |
CFI | . Comparative fit index
TLI | . Tucker–Lewis index
---------------------+------------------------------------------------------
Size of residuals |
SRMR | 0.025 Standardized root mean squared residual
CD | 0.974 Coefficient of determination
----------------------------------------------------------------------------
There are some red flags here. Besides being different, an R-squared statistic of 97% is preposterous. And the inability to calculate a chi-squared test suggests that the model was not correctly identified. Did I accidentally regress wage on itself?
Referencing the
Stata manual,
there are recommendations to use a different post-estimation command.
estat eqgof reports mc-squared statistics in addition to R-squared,
and these should be preferred in non-recursive model.
(And I do believe this qualifies as a non-recursive model.)
. estat eqgof
Equation-level goodness of fit
------------------------------------------------------------------------------
Dependent | Variance |
variables | Fitted Predicted Residual | R-squared mc mc2
-------------+---------------------------------+------------------------------
Observed | |
wage2013 | 17.72366 13.48486 4.238807 | .7608391 .8722609 .7608391
wage2014 | 19.45075 15.21195 4.238807 | .7820749 .88435 .7820749
wage2015 | 20.08207 15.84326 4.238807 | .7889258 .8882149 .7889258
wage2016 | 19.77338 15.53457 4.238807 | .7856306 .8863581 .7856306
-------------+---------------------------------+------------------------------
Overall | | .974397
------------------------------------------------------------------------------
mc = Correlation between dependent variable and its prediction.
mc2 = mc^2 is the Bentler–Raykov squared multiple correlation coefficient.
A measure around 78% seems more palateable. So maybe no cause for concern.
What about -gsem-?
Well,
unfortunately very few post-estimation commands are actually supported right
now.
. estimates restore m_gsem
(results m_gsem are active now)
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | N ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
m_gsem | 1,928 . -4419.283 6 8850.565 8883.951
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
Once again, Stata can’t estimate the log likelihood of the null model…
As a result of red flags in the -sem- model,
general unavailability for fit statistics in the -gsem- model,
and the aforementioned
thread on Statalist
hinting at inappropriate assumptions in the implementation,
I am left with few options beyond re-implementing the model in another
framework.
Setting Stata aside, the two most popular frameworks for fitting a SEM seem to be Mplus and lavaan. You may recognize that first name; the exceedingly brilliant and incomprehensible Bengt O. Muthén is also the co-creator of this program. Unfortunately Mplus is far from free software, and my company does not have a license. So lavaan it will be!