At the heart of the lavaan package is the ‘model syntax’. The model syntax is a description of the model to be estimated. In this section, we briefly explain the elements of the lavaan model syntax. More details are given in the examples that follow.
In the R environment, a regression formula has the following form:
y ~ x1 + x2 + x3 + x4
In this formula, the tilde (“
~”) is the regression operator. On
the left-hand side of the operator, we have the dependent variable (
and on the right-hand side, we have the independent variables, separated
by the “
+” operator. In lavaan, a typical model is simply a set (or system)
of regression formulas, where some variables (starting with an ‘
below) may be latent. For example:
y ~ f1 + f2 + x1 + x2 f1 ~ f2 + f3 f2 ~ f3 + x1 + x2
If we have latent variables in any of the regression formulas, we must ‘define’
them by listing their (manifest or latent) indicators. We do this by using the
special operator “
=~”, which can be read as is measured by. For example,
to define the three latent variabels
f3, we can use something
f1 =~ y1 + y2 + y3 f2 =~ y4 + y5 + y6 f3 =~ y7 + y8 + y9 + y10
Furthermore, variances and covariances are specified using a ‘double tilde’ operator, for example:
y1 ~~ y1 # variance y1 ~~ y2 # covariance f1 ~~ f2 # covariance
And finally, intercepts for observed and latent variables are simple
regression formulas with only an intercept (explicitly denoted by the
1’) as the only predictor:
y1 ~ 1 f1 ~ 1
Using these four formula types, a large variety of latent variable models can be described. The current set of formula types is summarized in the table below.
|latent variable definition||
||is measured by|
||is regressed on|
||is correlated with|
A complete lavaan model syntax is simply a combination of these formula types, enclosed between single quotes. For example:
myModel <- ' # regressions y1 + y2 ~ f1 + f2 + x1 + x2 f1 ~ f2 + f3 f2 ~ f3 + x1 + x2 # latent variable definitions f1 =~ y1 + y2 + y3 f2 =~ y4 + y5 + y6 f3 =~ y7 + y8 + y9 + y10 # variances and covariances y1 ~~ y1 y1 ~~ y2 f1 ~~ f2 # intercepts y1 ~ 1 f1 ~ 1 '
There reason why you should use single quotes is that this is the only
way (in R) to allow for double quotes inside a string. See
in R for more information.
You can type this syntax interactively at the R prompt, but it is much more convenient to type the whole model syntax first in an external text editor. And when you are done, you can copy/paste it to the R console. If you are using RStudio, open a new ‘R script’, and type your model syntax (and all other R commands needed for this session) in the source editor of RStudio. And save your script, so you can reuse it later on.
The code piece above will produce a model syntax object, called
can be used later when calling a function that actually estimates this model
given a dataset. Note that formulas can be split over multiple lines, and you
can use comments (starting with the
# character) and blank lines within the
single quotes to improve the readability of the model syntax.
You may split your model syntax is multiple parts. For example:
part1 <- ' # latent variable definitions f1 =~ y1 + y2 + y3 f2 =~ y4 + y5 + y6 f3 =~ y7 + y8 + y9 + y10 ' part2 <- ' # fix covariance between f1 and f2 to zero f1 ~~ 0*f2 '
When fitting the model, you may then simply concatenate the multiple parts together as follows:
fit <- cfa(model = c(part1, part2), data = myData)