Latent growth modeling is a specific instance of a more general approach to creating models with a mix of observed and latent variables. This approach goes by many names, including latent variable modeling, structural equation modeling, and covariance structure analysis.
Latent variables are not observed directly but are inferred from the patterns of relationships among observed variables. For example, suppose we see that three different vocabulary tests, v1, v2, v3, have the correlations in Table 1.
Table 1: Correlations Among Three Vocabulary Tests
What underlying model would create these correlations? Unfortunately, any set of observed correlations could be generated by an infinite number of models. Of the infinite number of models that could generate these correlations, most have no theoretical basis and are stuffed with unnecessary complications. When deciding among alternatives, a good place to start is to imagine the simplest theoretically plausible explanation. As Einstein is reputed to have said,
“Everything should be made as simple as possible, but not simpler.”
Figure 1: Latent Variable Model for Three Vocabulary Tests
In this case, the simplest theoretically plausible model I can imagine is that people have an overall fund of knowledge about words in a particular language and that each vocabulary test score samples this knowledge imperfectly. In Figure 1, the circle labeled Vocabulary is a latent variable that represents the influence of an overall knowledge of words. It is latent because there is no practical way to measure a person’s overall knowledge of words. At best, we create tests that sample this overall knowledge with a subset of words we believe are representative of the overall word knowledge. Observed variables are represented in Figure 1 as squares. People with a large overall vocabulary are likely to score well on the three measures of vocabulary we have created. People with a small vocabulary will likely score lower on the three tests.
Sidenote: Is the true model that generated the correlation matrix in Table 1 more complex than the simple model in Figure 1? Undoubtedly. We generate simple models as a starting point in our investigation, as a basis of comparison for subsequent models.
The influence of the latent variable on the observed indicator variable is symbolized by a straight arrow. The strength of the influence is called a loading. Observed variables with stronger loadings are better indicators of the latent variable. Because the loadings are standardized (i.e., all observed and latent variables have a variance of 1), the loadings in this context equal the correlation between the latent variable and each observed variable.
Sidenote: Standardized loadings are actually standardized regression coefficients from latent variables to observed variables. When an observed variable has only one loading and no other influences (other than the error term), the standardized regression coefficient equals the correlation between the latent variable and the observed indicator.
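This fact lets us trace paths in Figure 1: the model-implied correlation between any two indicators is the product of their standardized loadings. With the loadings of .7 and .8 used in this example,

r_{V_1V_2}=\lambda_1\lambda_2=.7\times.8=.56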
Although the three tests are influenced by a common cause, in Figure 1 each test has a unique set of influences, e_1, e_2, e_3. The variances of the unique influences are symbolized by curved double-headed arrows. Because they are standardized, they represent the proportion of variance in the observed variables that is not explained by the latent variable.
The model in Figure 1 has 1 latent variable, 3 observed variables, and 3 error/residual terms. Any variable that has an incoming straight arrow from another variable is said to be endogenous, meaning “created from within” the model. That is, endogenous variables are determined entirely by variables within the model. A variable without incoming straight arrows is said to be exogenous, meaning “created from without.” That is, exogenous variables are random variables that are determined by influences not specified by the model. In Figure 1, the three observed variables V_1, V_2, V_3 each have two incoming straight arrows, thus they are endogenous. The latent variable \text{Vocabulary} and the three error terms e_1, e_2, e_3 have no incoming straight arrows from other variables, thus they are exogenous.
Sidenote: In some latent variable models, means of exogenous variables and intercepts of endogenous variables are symbolized by paths originating from a triangle. Thus, exogenous variables can have incoming arrows from triangles and still be considered exogenous because such arrows simply specify the mean of the variable and do not determine its variability.
McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the Reticular Action Model for moment structures. British Journal of Mathematical and Statistical Psychology, 37(2), 234–251. https://doi.org/10.1111/j.2044-8317.1984.tb00802.x
We can specify latent variable models in a variety of ways, but the simplest method, the Reticular Action Model (McArdle & McDonald, 1984), uses two matrices: \boldsymbol{A} (Asymmetric) and \boldsymbol{S} (Symmetric). The \boldsymbol{A} matrix contains the asymmetric paths (i.e., single-headed straight arrows). These paths quantify how much latent and observed variables influence each other. In the current model, the only straight arrows are from the latent variable to the observed variables.
Table 2: \boldsymbol{A} Matrix: Asymmetric Paths from Causes to Effects in Figure 1
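With the variables ordered as Vocabulary, V_1, V_2, V_3 and the loadings of .7, .8, and .9 used throughout this example, the \boldsymbol{A} matrix is:

\boldsymbol{A}=\begin{bmatrix}
0&0&0&0\\
.7&0&0&0\\
.8&0&0&0\\
.9&0&0&0
\end{bmatrix}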
The \boldsymbol{S} matrix contains the symmetric paths (i.e., double-headed curved arrows). These quantify the variances of the exogenous variables and the covariances between them. In this case, all the covariances among the exogenous variables are zero. The variances were standardized such that the latent variable has a variance of 1, and the variances of the error terms were selected to cause the variances of the observed variables to be exactly 1.
Table 3: \boldsymbol{S} Matrix: Symmetric Paths Among Exogenous Variables in Figure 1
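With the exogenous variables ordered as Vocabulary, e_1, e_2, e_3, each error variance is 1-\lambda^2 (e.g., 1-.7^2=.51), so the \boldsymbol{S} matrix is:

\boldsymbol{S}=\begin{bmatrix}
1&0&0&0\\
0&.51&0&0\\
0&0&.36&0\\
0&0&0&.19
\end{bmatrix}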
The two matrices, \boldsymbol{A} and \boldsymbol{S}, jointly determine the model-implied covariance matrix \boldsymbol{\Sigma}. If the latent and observed variables are collected in a vector v=\{Vocabulary, V_1, V_2, V_3\}, and the exogenous variables are collected in a vector u=\{Vocabulary, e_1, e_2, e_3\}, then we can specify how the variables in v influence each other like so:

v=\boldsymbol{A}v+u
It might feel strange to have the v variable on both sides of the equation. Remember that it is not a single variable, but a vector of variables. The \boldsymbol{A} matrix specifies how the causes in v influence the effects in v. Even so, it helps to isolate v on one side of the equation. We will make use of an identity matrix \boldsymbol{I} in which the diagonal values are ones and the off-diagonal values are zeroes.
An identity matrix is so-named because any matrix multiplied by a compatible identity matrix equals itself.
\boldsymbol{A}\boldsymbol{I}=\boldsymbol{A}
To be compatible when post-multiplied, \boldsymbol{I} must have as many rows as \boldsymbol{A} has columns.
\boldsymbol{I}\boldsymbol{A}=\boldsymbol{A}
To be compatible when pre-multiplied, \boldsymbol{I} must have as many columns as \boldsymbol{A} has rows.
We will also use an inverse matrix. A matrix multiplied by its inverse matrix will equal a same-sized identity matrix.
\boldsymbol{A}\boldsymbol{A}^{-1}=\boldsymbol{I}
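A quick sketch in R, using an arbitrary 2 × 2 matrix to verify both properties:

```r
M <- matrix(c(2, 1, 1, 3), nrow = 2)
M %*% solve(M)  # M times its inverse: the 2-by-2 identity matrix
diag(2) %*% M   # pre-multiplying by the identity returns M unchanged
```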
So, starting with the original equation:
\begin{aligned}
v&=\boldsymbol{A}v +u\\
v-\boldsymbol{A}v&=u\\
\boldsymbol{I}v-\boldsymbol{A}v&=u\\
(\boldsymbol{I}-\boldsymbol{A})v&=u\\
(\boldsymbol{I}-\boldsymbol{A})^{-1}(\boldsymbol{I}-\boldsymbol{A})v&=(\boldsymbol{I}-\boldsymbol{A})^{-1}u\\
\boldsymbol{I}v&=(\boldsymbol{I}-\boldsymbol{A})^{-1}u\\
v&=(\boldsymbol{I}-\boldsymbol{A})^{-1}u\\[3ex]
\begin{bmatrix}
Vocabulary\\ V_1\\ V_2\\ V_3
\end{bmatrix}&=
\begin{bmatrix}
1&0&0&0\\
.7&1&0&0\\
.8&0&1&0\\
.9&0&0&1
\end{bmatrix}\begin{bmatrix}
Vocabulary\\ e_1\\ e_2\\ e_3
\end{bmatrix}
\end{aligned}
So, the variables in v are determined by the exogenous variables in u, pre-multiplied by the inverse of (\boldsymbol{I}-\boldsymbol{A}). This matrix equation is equivalent to these univariate equations:

\begin{aligned}
Vocabulary &= Vocabulary\\
V_1 &= .7\cdot Vocabulary + e_1\\
V_2 &= .8\cdot Vocabulary + e_2\\
V_3 &= .9\cdot Vocabulary + e_3
\end{aligned}
If we have the values of matrices \boldsymbol{A} and \boldsymbol{S}, then the model-implied covariance matrix \boldsymbol{\Sigma} can be calculated directly:
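\boldsymbol{\Sigma}=(\boldsymbol{I}-\boldsymbol{A})^{-1}\boldsymbol{S}\left[(\boldsymbol{I}-\boldsymbol{A})^{-1}\right]'

This is the covariance equation of the Reticular Action Model (McArdle & McDonald, 1984), which the following code implements step by step: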
```r
# factor loadings
loadings <- c(.7, .8, .9)

# Asymmetric
A <- matrix(0, nrow = 4, ncol = 4)
A[2:4, 1] <- loadings

# Symmetric, diag makes a diagonal matrix
S <- diag(c(1, 1 - loadings^2))

# Identity, diag(4) makes a 4 by 4 identity matrix
I <- diag(4)

# Inverse matrix, solve inverts matrices
iA <- solve(I - A)

# Model-implied correlations, %*% is matrix multiplication
Sigma <- iA %*% S %*% t(iA)
```
Most of the time, we do not know the values of \boldsymbol{A} and \boldsymbol{S}. We have instead the means, standard deviations, and correlations of the observed variables. Using specialized syntax to specify a model, the lavaan package can estimate the values of \boldsymbol{A} and \boldsymbol{S} that make the model-implied covariance matrix as close to the observed covariance matrix as possible.
If the estimated values of \boldsymbol{A} and \boldsymbol{S} can closely duplicate the observed covariances, then the model is not necessarily correct, but is more plausible than a model that cannot closely duplicate the observed covariances. Researchers try to find the simplest theory-based model that “fits” the data. In this context, the model “fits” the data when the model-implied covariance matrix closely resembles the observed covariances.
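For example, here is a minimal sketch of fitting the Figure 1 model with lavaan. The data frame d_vocab and its columns v1, v2, v3 are hypothetical names for illustration; the syntax itself is explained in the next section.

```r
library(lavaan)

# One latent variable indicated by the three vocabulary tests
m_vocab <- "Vocabulary =~ v1 + v2 + v3"

# cfa() estimates the loadings and variances from the data
fit_vocab <- cfa(m_vocab, data = d_vocab)
summary(fit_vocab, fit.measures = TRUE, standardized = TRUE)
```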
lavaan Syntax Overview
We will be using the lavaan (LAtent VAriable ANalysis) package to conduct latent growth modeling analyses. To have lavaan evaluate a model, you need to tell it which model you want to test. The lavaan package has a special syntax for specifying models (basic syntax, advanced syntax). The aspects of lavaan syntax relevant to this tutorial are in Table 5.
| Symbol | Example | Meaning |
|:------:|:--------|:--------|
| `=~` | `A =~ A1 + A2` | Latent variable A has observed indicators A1 and A2. |
| `~` | `Y ~ X1 + X2` | Y is predicted by X1 and X2. |
| | `Y ~ a * X1 + a * X2` | Make the X1 and X2 coefficients equal. |
| | `Y ~ 1 * X1 + X2` | Set the regression coefficient of X1 to 1. |
| `~~` | `X1 ~~ X2` | Estimate the covariance between X1 and X2. |
| | `X1 ~~ 0.5 * X2` | Set the covariance between X1 and X2 to 0.5. |
| | `X1 ~~ 1 * X1` | Set X1’s variance to 1. |

Table 5: Syntax for lavaan models
In lavaan, latent variable indicators are specified with the =~ operator, which means is indicated by. Consider this lavaan model:
```r
m1 <- '
A =~ A1 + A2 + A3
B =~ B1 + B2 + B3
'
```
This model means there are 3 observed variables A1, A2, and A3 that are indicators of latent variable A and 3 observed variables B1, B2, and B3 that are indicators of latent variable B. The model hypothesizes that latent variable A explains the correlations among A1, A2, and A3 and that latent variable B explains the correlations among B1, B2, and B3. The corresponding path model is in Figure 2. Note that the latent variables A and B are uncorrelated because there is nothing to connect them, which means that A1, A2, and A3 are uncorrelated with B1, B2, and B3.
```r
cA <- class_color("orchid4")@lighten(.7)
cB <- class_color("forestgreen")@lighten(.6)
cC <- class_color("dodgerblue4")@lighten(.6)
cD <- "#b5bdc7"
left <- 7
my_colors <- c(cA, cB, cC)
my_observed <- redefault(ob_ellipse, m1 = 15, color = NA, fill = cA)
my_latent <- redefault(ob_circle, radius = sqrt(pi), color = NA, fill = cA)
my_error <- redefault(ob_circle, radius = 1 / sqrt(pi), color = NA, fill = cA)
my_connect <- redefault(connect, color = cA, resect = 2)
my_observed_label <- redefault(ob_label, color = "white", fill = NA,
                               size = 20, vjust = .60)
my_latent_label <- redefault(ob_label, color = "white", fill = NA,
                             size = 36, vjust = .55)
my_error_label <- redefault(ob_label, fill = NA, color = "white", size = 16)

mAB <- ggdiagram(font_family = my_font, font_size = 20) +
  {A3 <- my_observed() %>%
     ob_array(k = 3,
              label = my_observed_label(paste0("*A*~", 1:3, "~")),
              sep = .25)} +
  {B3 <- my_observed(fill = cB) %>%
     place(A3[3], sep = 0) %>%
     ob_array(3,
              anchor = "west",
              label = my_observed_label(paste0("*B*~", 1:3, "~")),
              sep = .25)} +
  {A <- my_latent(label = my_latent_label("*A*")) %>%
     place(A3[2], "above", 3)} +
  {B <- my_latent(fill = cB, label = my_latent_label("*B*")) %>%
     place(B3[2], "above", 3)} +
  {load_A <- my_connect(A, A3)} +
  {load_B <- my_connect(B, B3, color = cB)} +
  {vA <- ob_variance(A, color = cA)} +
  {vB <- ob_variance(B, color = cB)} +
  {eA3 <- my_error() %>%
     ob_array(3) %>%
     place(A3, "below", sep = .75)} +
  {eB3 <- my_error(fill = cB) %>%
     place(B3, "below", sep = .75)} +
  ob_variance(eA3, color = cA, where = "below", theta = 60, looseness = 2) +
  ob_variance(eB3, color = cB, where = "below", theta = 60, looseness = 2) +
  my_connect(eA3, A3) +
  my_connect(eB3, B3, color = cB)
mAB
```
Figure 2: Path diagram of latent variables A and B, each of which has three observed indicator variables
To give an indicator a specific (fixed) loading, place the numeric loading in front of the variable with the * operator in between. Here we set the loadings for A1 and B1 to 1:
```r
Model <- '
A =~ 1 * A1 + A2 + A3
B =~ 1 * B1 + B2 + B3
'
```
```r
mAB +
  ob_label("1", center = midpoint(A, A3[1]), color = cA) +
  ob_label("1", center = midpoint(B, B3[1]), color = cB)
```
Figure 3: Fixed Loadings on the first indicators of A and B
To specify a direct path from A to B as in Figure 4, use the ~ operator, which means “is predicted by.”
```r
Model <- '
A =~ A1 + A2 + A3
B =~ B1 + B2 + B3
B ~ A
'
```
```r
mAB + connect(A, B, resect = 2, color = cA)
```
Figure 4: A latent variable model with a direct path between latent variable A and latent variable B
To specify that A and B are related as in Figure 5, use the covariance operator: ~~
```r
Model <- '
A =~ A1 + A2 + A3
B =~ B1 + B2 + B3
B ~~ A
'
```
```r
mAB + ob_covariance(A, B, resect = 2, color = "gray60", looseness = .9)
```
Figure 5: A latent variable model with a covariance between latent variable A and latent variable B
To force the covariance between A and B to be exactly 0.5 as in Figure 6:
```r
Model <- '
A =~ A1 + A2 + A3
B =~ B1 + B2 + B3
B ~~ 0.5 * A
'
```
```r
mAB +
  ob_covariance(A, B,
                resect = 2,
                color = "gray60",
                looseness = .9,
                label = ob_label(".5", family = my_font, size = 16),
                linewidth = .5)
```
Figure 6: A latent variable model with the covariance between latent variable A and latent variable B forced to equal 0.5
The ~~ operator is also used to specify the variance of a variable. For example, to force the variance of B to be 1 as in Figure 7:
```r
Model <- '
A =~ A1 + A2 + A3
B =~ B1 + B2 + B3
B ~~ 1 * B
'
```
```r
mAB + ob_label("1", vB@midpoint(), color = cB)
```
Figure 7: Latent variable B’s variance is fixed to equal 1.
To specify that A1 and B1 have correlated residuals as in Figure 8, use the ~~ operator between the two observed variables:
```r
Model <- '
A =~ A1 + A2 + A3
B =~ B1 + B2 + B3
A1 ~~ B1
'
```
```r
mAB +
  ob_covariance(eB3[1], eA3[1],
                resect = 2,
                color = "gray60",
                looseness = .9,
                linewidth = .5)
```
Figure 8: The residuals of A1 and B1 are allowed to covary.
You can give any loading or covariance a variable name instead of a specific value. To force indicators to have the same loading as in Figure 9, give them the same variable name:
```r
Model <- '
A =~ A1 + A2 + A3
B =~ b * B1 + b * B2 + b * B3
'
```
```r
mAB +
  ob_label("*b*", center = load_B[2]@midpoint()@y %>% load_B@line@point_at_y())
```
Figure 9: A latent variable model with all indicators of latent variable B forced to have equal loadings with value of b
Vocabulary Study
We are going to apply hierarchical linear modeling and latent growth analysis to the same longitudinal study of vocabulary growth in toddlers.
Participants are toddlers identified as having vocabulary levels lower than expected for their age. Vocabulary levels were measured 6 times: time 0 to time 5. Half the participants were randomly assigned to a vocabulary-enhancing intervention and half were assigned to a condition in which the same amount of time was spent watching Sesame Street. In the intervention variable, control = 0, treatment = 1.
The authors hypothesized that vocabulary would grow quickly at first and slower later on. The intervention was hypothesized to put the children on a faster learning trajectory.
At time 0, before the intervention, the treatment groups were assumed to have the same intercept because of random assignment.
```r
tibble::tribble(
  ~Symbol, ~Meaning,
  "$b_{00}$", "Intercept (Predicted Vocabulary when Intervention and Time are 0)",
  "$b_{01}$", "Slope for Intervention at Time 0",
  "$b_{10}$", "Linear Slope for Time when Intervention and Time are 0",
  "$b_{11}$", "Interaction of Intervention and Time at Time 0",
  "$b_{20}$", "Quadratic Effect of Time when Intervention is 0",
  "$b_{21}$", "Interaction of Intervention and Quadratic Effect of Time",
  "$b_{0i}$", "Predicted Vocabulary at Time 0 for Person *i*",
  "$b_{1i}$", "Linear Effect of Time at Time 0 for Person *i*",
  "$b_{2i}$", "Quadratic Effect of Time for Person *i*",
  "$u_{0i}$", "Residual for $b_{0i}$",
  "$u_{1i}$", "Residual for $b_{1i}$",
  "$e_{ti}$", "Residual for Person *i* at Time *t*"
) %>%
  knitr::kable(align = "cl")
```
| Symbol | Meaning |
|:------:|:--------|
| b_{00} | Intercept (Predicted Vocabulary when Intervention and Time are 0) |
| b_{01} | Slope for Intervention at Time 0 |
| b_{10} | Linear Slope for Time when Intervention and Time are 0 |
| b_{11} | Interaction of Intervention and Time at Time 0 |
| b_{20} | Quadratic Effect of Time when Intervention is 0 |
| b_{21} | Interaction of Intervention and Quadratic Effect of Time |
| b_{0i} | Predicted Vocabulary at Time 0 for Person *i* |
| b_{1i} | Linear Effect of Time at Time 0 for Person *i* |
| b_{2i} | Quadratic Effect of Time for Person *i* |
| u_{0i} | Residual for b_{0i} |
| u_{1i} | Residual for b_{1i} |
| e_{ti} | Residual for Person *i* at Time *t* |

Table 6: Symbols and Meanings of HLM Model
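Assembled from the symbols in Table 6 (and matching the lmer model below, in which only the intercept and linear slope are random), the model can be written as:

\begin{aligned}
\text{vocabulary}_{ti} &= b_{0i} + b_{1i}\,\text{time}_{ti} + b_{2i}\,\text{time}_{ti}^2 + e_{ti}\\
b_{0i} &= b_{00} + b_{01}\,\text{intervention}_i + u_{0i}\\
b_{1i} &= b_{10} + b_{11}\,\text{intervention}_i + u_{1i}\\
b_{2i} &= b_{20} + b_{21}\,\text{intervention}_i
\end{aligned}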
Analysis
I am going straight to my hypothesized model. In a real analysis, I might have gone through the series of steps we have seen elsewhere in the course. However, I want to save time so that I can show how this model is the same as the latent growth model. When I walk through all the latent growth models in sequence, I will show the comparable hierarchical linear models one at a time.
```r
m <- lmer(vocabulary ~ 1 + time * intervention + I(time^2) * intervention +
            (1 + time | person_id),
          data = d, REML = FALSE)
tab_model(m)
```
**vocabulary**

| Predictors | Estimates | CI | p |
|:-----------|----------:|:---------------|:-------|
| (Intercept) | 52.00 | 46.36 – 57.64 | <0.001 |
| time | 10.50 | 9.91 – 11.09 | <0.001 |
| intervention | -0.09 | -7.91 – 7.74 | 0.983 |
| time^2 | -1.06 | -1.15 – -0.96 | <0.001 |
| time × intervention | 9.14 | 8.33 – 9.95 | <0.001 |
| intervention × time^2 | 0.09 | -0.05 – 0.22 | 0.217 |

**Random Effects**

| | |
|:--|--:|
| σ² | 8.99 |
| τ₀₀ person_id | 786.64 |
| τ₁₁ person_id.time | 2.02 |
| ρ₀₁ person_id | 0.12 |
| ICC | 0.99 |
| N person_id | 200 |
| Observations | 1200 |
| Marginal R² / Conditional R² | 0.381 / 0.993 |
report(m)
We fitted a linear mixed model (estimated using ML and nloptwrap optimizer) to
predict vocabulary with time and intervention (formula: vocabulary ~ 1 + time *
intervention + I(time^2) * intervention). The model included time as random
effects (formula: ~1 + time | person_id). The model's total explanatory power
is substantial (conditional R2 = 0.99) and the part related to the fixed
effects alone (marginal R2) is of 0.38. The model's intercept, corresponding to
time = 0 and intervention = 0, is at 52.00 (95% CI [46.36, 57.64], t(1190) =
18.08, p < .001). Within this model:
- The effect of time is statistically significant and positive (beta = 10.50,
95% CI [9.91, 11.09], t(1190) = 35.18, p < .001; Std. beta = 0.47, 95% CI
[0.46, 0.48])
- The effect of intervention is statistically non-significant and negative
(beta = -0.09, 95% CI [-7.91, 7.74], t(1190) = -0.02, p = 0.983; Std. beta =
0.32, 95% CI [0.21, 0.43])
- The effect of time^2 is statistically significant and negative (beta = -1.06,
95% CI [-1.15, -0.96], t(1190) = -21.10, p < .001; Std. beta = -0.08, 95% CI
[-0.09, -0.07])
- The effect of time × intervention is statistically significant and positive
(beta = 9.14, 95% CI [8.33, 9.95], t(1190) = 22.08, p < .001; Std. beta = 0.22,
95% CI [0.21, 0.23])
- The effect of intervention × time^2 is statistically non-significant and
positive (beta = 0.09, 95% CI [-0.05, 0.22], t(1190) = 1.24, p = 0.217; Std.
beta = 3.40e-03, 95% CI [-2.00e-03, 8.81e-03])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
Don’t be fooled by the non-significant effect of the intervention! It is a conditional effect at Time 0—before the intervention began.
It is the interaction of time and the intervention that matters.
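If you want to probe that interaction directly, here is a hedged sketch using the emmeans package (an assumption; it is not used elsewhere in this tutorial) to compare the groups at specific time points:

```r
library(emmeans)

# Estimated group means and their difference at baseline and at the last wave
emmeans(m, pairwise ~ intervention | time, at = list(time = c(0, 5)))
```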
Let’s plot the model using sjPlot::plot_model:
```r
plot_model(m, type = "pred", terms = c("time [all]", "intervention"))
```
Interpretation: As hypothesized, the effect of time has diminishing returns, and the time by intervention interaction effect is evident. The two groups were comparable before the intervention, but the treatment group’s vocabulary grew faster over time. If I needed to publish a plot showing these results, it would look like Figure 10.
Sidenote: I love the sjPlot::plot_model function. With almost no effort, I can get a plot that helps me see what is going on with my fixed effects. It used to take me so much more code to arrive at where sjPlot::plot_model lands. However, if you want a publication-worthy plot, sjPlot::plot_model imposes too many of its own choices. Fortunately, the same person who made sjPlot is involved with another easystats package called modelbased that allows for greater flexibility but still reduces what would otherwise be onerous coding.
See if you can figure out how to restructure the data using pivot_wider. If it takes more than a minute or two, check out the hints.
Call the new data frame d_wide.
Hint
Set the values_from parameter to vocabulary.

values_from = vocabulary
Hint 2
The names come from time.

names_from = time
Hint 3
Specify voc_ as the names prefix.
names_prefix = "voc_"
Hint 4
Here is how I would do it:
```r
d_wide <- d %>%
  pivot_wider(names_from = time,
              values_from = vocabulary,
              names_prefix = "voc_")
```
Model Specification
In general, setting up a latent growth curve model requires much more typing than the corresponding hierarchical linear model. In this model, for all this extra work we get nothing that we did not already have. However, the latent growth modeling approach is extremely flexible, particularly with respect to modeling the simultaneous effects of many other variables. In more complex models, we can evaluate hypotheses that are difficult, if not impossible, in hierarchical linear models.
Note that intercepts are symbolized with the symbol 1, either alone or preceded by a coefficient label (e.g., i ~ b_00 * 1).
In the hierarchical model, we did not make b_{2i} (the quadratic effect of time) random. Thus, we will set the \tau_{22} variance to 0, and also the covariances involving u_{2i} residual (i.e., \tau_{02} and \tau_{12}).
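In lavaan syntax, those constraints can be written like so:

q ~~ 0 * q + 0 * i + 0 * s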
You can see from the summary output that the parameters are the same as before.
As usual, we start with a simple model with random intercepts. Time is not yet included in the model. The latent variable i has a mean of b_{00} and a variance of \tau_{00}. The variances of the error terms for each observed variable are constrained to be equal (labeled e in the syntax below). The intercepts of the observed variables are not modeled here, so they are assumed to be 0.
```r
m_0 <- "
# Intercept
i =~ 1 * voc_0 + 1 * voc_1 + 1 * voc_2 + 1 * voc_3 + 1 * voc_4 + 1 * voc_5

# Intercept mean
i ~ b_00 * 1

# Variances and covariances
i ~~ tau_00 * i
voc_0 ~~ e * voc_0
voc_1 ~~ e * voc_1
voc_2 ~~ e * voc_2
voc_3 ~~ e * voc_3
voc_4 ~~ e * voc_4
voc_5 ~~ e * voc_5
"

fit_0 <- growth(m_0, data = d_wide)

# The standard way to get output from lavaan
summary(fit_0, fit.measures = TRUE, standardized = TRUE)
```
lavaan 0.6-19 ended normally after 18 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 8
Number of equality constraints 5
Number of observations 200
Model Test User Model:
Test statistic 3228.267
Degrees of freedom 24
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 3659.200
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.121
Tucker-Lewis Index (TLI) 0.450
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -5642.273
Loglikelihood unrestricted model (H1) -4028.140
Akaike (AIC) 11290.547
Bayesian (BIC) 11300.442
Sample-size adjusted Bayesian (SABIC) 11290.938
Root Mean Square Error of Approximation:
RMSEA 0.817
90 Percent confidence interval - lower 0.793
90 Percent confidence interval - upper 0.841
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.357
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
i =~
voc_0 1.000 29.736 0.809
voc_1 1.000 29.736 0.809
voc_2 1.000 29.736 0.809
voc_3 1.000 29.736 0.809
voc_4 1.000 29.736 0.809
voc_5 1.000 29.736 0.809
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
i (b_00) 80.811 2.193 36.844 0.000 2.718 2.718
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
i (t_00) 884.245 96.275 9.185 0.000 1.000 1.000
.voc_0 (e) 467.273 20.897 22.361 0.000 467.273 0.346
.voc_1 (e) 467.273 20.897 22.361 0.000 467.273 0.346
.voc_2 (e) 467.273 20.897 22.361 0.000 467.273 0.346
.voc_3 (e) 467.273 20.897 22.361 0.000 467.273 0.346
.voc_4 (e) 467.273 20.897 22.361 0.000 467.273 0.346
.voc_5 (e) 467.273 20.897 22.361 0.000 467.273 0.346
```r
# The easystats way
parameters(fit_0, component = "all")
```
```r
# There are many fit indices. For a longer list, see
# https://easystats.github.io/performance/reference/model_performance.lavaan.html
# Here are some common indices:
my_metrics <- c("CFI", "NNFI", "AGFI", "RMSEA", "AIC", "BIC")
performance(fit_0, metrics = my_metrics)
```
# Indices of model performance
CFI | NNFI | AGFI | RMSEA | AIC | BIC
-----------------------------------------------------
0.121 | 0.450 | 0.426 | 0.817 | 11290.547 | 11300.442
Not surprisingly, the fit statistics of our null model indicate poor fit. The CFI (Comparative Fit Index), NNFI (Non-Normed Fit Index), and AGFI (Adjusted Goodness of Fit Index) approach 1 when the fit is good. There are no absolute standards for these fit indices, and good values can vary from model to model, but in general look for values better than .90. The RMSEA (Root Mean Square Error of Approximation) approaches 0 when fit is good. Again, there is no absolute standard for the RMSEA, but in general look for values below .10 and be pleased when values are below .05.
The reason for the poor fit can be seen clearly in Figure 13. Although vocabulary clearly increases over time, Model 0 constrains the prediction to not grow—to be the same for each person across time.
As seen in Table 7, the various parameters of the analogous hierarchical linear model are nearly identical—if maximum likelihood is used instead of restricted maximum likelihood (i.e., REML = FALSE).
```r
lmer(vocabulary ~ 1 + (1 | person_id), data = d, REML = FALSE) %>%
  tab_model()
```
Figure 17: Model 2: Predicted Vocabulary by Intervention Group Over Time (Model 1 + Quadratic Effect of Time)
Again, Table 9 shows that the same results would have been found with the analogous HLM model—and with much less typing! Why are we torturing ourselves like this? There will be a payoff to latent growth modeling, just not yet.
I am going to add all three treatment effects at the same time. We could add them one at a time, but no important theoretical question would be answered by doing so. I am mostly interested in the interactions of treatment and the time variables.
```r
m_7 <- "
# Intercept
i =~ 1 * voc_0 + 1 * voc_1 + 1 * voc_2 + 1 * voc_3 + 1 * voc_4 + 1 * voc_5

# Linear slope of time
s =~ 0 * voc_0 + 1 * voc_1 + 2 * voc_2 + 3 * voc_3 + 4 * voc_4 + 5 * voc_5

# Acceleration effect of time (quadratic effect)
q =~ 0 * voc_0 + 1 * voc_1 + 4 * voc_2 + 9 * voc_3 + 16 * voc_4 + 25 * voc_5

# Intercept, slope, and quadratic effects
i ~ b_00 * 1 + b_01 * intervention
s ~ b_10 * 1 + b_11 * intervention
q ~ b_20 * 1 + b_21 * intervention

# Variances and covariances
i ~~ tau_00 * i
s ~~ tau_11 * s + tau_01 * i
q ~~ tau_22 * q + tau_12 * s + tau_02 * i
voc_0 ~~ e * voc_0
voc_1 ~~ e * voc_1
voc_2 ~~ e * voc_2
voc_3 ~~ e * voc_3
voc_4 ~~ e * voc_4
voc_5 ~~ e * voc_5
"

fit_7 <- growth(m_7, data = d_wide)
parameters(fit_7, component = "all")
```
# Comparison of Model Performance Indices
Name | Model | CFI | NNFI | AGFI | RMSEA
----------------------------------------------
fit_6 | lavaan | 1.000 | 1.001 | 0.993 | 0.000
fit_7 | lavaan | 1.000 | 1.001 | 0.999 | 0.000
So the intervention effects improve the model fit. Specifically, the intervention predicts the slope such that the slope is steeper for the treatment group.
Figure 27: Model 7: Predicted Vocabulary by Intervention Group Over Time (Model 6 + Effects of Intervention on Intercept, Linear Effect of Time, and Quadratic Effect of Time)
```r
lmer(vocabulary ~ 1 + time * intervention + I(time^2) * intervention +
       (1 + time + I(time^2) | person_id),
     data = d, REML = FALSE) %>%
  tab_model()
```
# Likelihood-Ratio-Test (LRT) for Model Comparison
Model | Type | df | df_diff | Chi2 | p
---------------------------------------------
fit_7 | lavaan | 20 | 1 | 14.44 | 0.920
fit_8 | lavaan | 19 | | 14.43 |
The autoregression parameter was not statistically significant. Comparing the two models revealed no difference in model fit. Thus, there appears to be no need for the autoregression parameter, once we have accounted for the latent growth.
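For reference, a likelihood ratio comparison like the one above can be produced with lavaan’s anova() method. A minimal sketch, assuming fit_8 is the fitted model with the added autoregression parameter:

```r
# Chi-square difference test of the nested models
anova(fit_7, fit_8)
```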
Exercise
After a series of painful incidents in which several middle-schoolers were severely harassed and bullied by peers, the school principal decided to implement a school-wide social-emotional learning intervention that aims to give students verbal self-defense skills, encourage bystanders to become “upstanders” who defend bullied peers, and create an overall sense of solidarity and belonging at the school. Each student’s sense of school belonging was measured before the intervention and for each of the 8 weeks of the intervention. Your task is to test how school belonging increases in general and specifically for students who self-identify as having been bullied.
Figure 29: Trajectory of change in self-rated sense of school belonging for bullied and non-bullied students during 8 weeks of an anti-bullying intervention.
It looks like both groups felt increasing levels of belonging, but that their trajectories of change differed. Let’s test this idea formally.
Using latent growth models, conduct the following tests:
Create a plain random intercept model. Call the intercept latent variable i, its intercept parameter b_00, and its variance tau_00. Set the observed variables’ variances equal using the parameter name tau_1.
Evaluate whether adding a fixed linear effect of time is significant. Call the slope latent variable s and its intercept parameter b_10. Set s’s variance to 0, as well as its covariance with i.
Evaluate whether adding a fixed quadratic effect of time q is significant. Set q’s intercept parameter name to b_20. Set q’s variance to 0, as well as its covariances with i and s.
Evaluate whether adding a random linear effect of time is significant. Set s’s variance parameter name to tau_11.
Hint
Replace the 0 with tau_11 in your model like so:
s ~~ tau_11 * s + 0 * i
Evaluate whether adding a random quadratic effect of time is significant. Set q’s variance parameter name to tau_22.
Hint
Replace the 0 with tau_22 in your model like so:
q ~~ tau_22 * q + 0 * i + 0 * s
Evaluate whether the random effects should correlate. Set the covariance between i and s to tau_01, the covariance between i and q to tau_02, and the covariance between s and q to tau_12.
Hint
The variances and covariances should look like this:
# Variances and covariances
i ~~ tau_00 * i
s ~~ tau_11 * s + tau_01 * i
q ~~ tau_22 * q + tau_12 * s + tau_02 * i
Evaluate whether bullied has effects on i, s, and q. Call the regression parameters b_01 for i, b_11 for s, and b_21 for q. Is the trajectory of change different for students who identify as having been bullied?