At the heart of the lavaan package is the ‘model syntax’. The model syntax is a description of the model to be estimated. In this section, we briefly explain the elements of the lavaan model syntax. More details are given in the examples that follow.

In the R environment, a regression formula has the following form:

```
y ~ x1 + x2 + x3 + x4
```

In this formula, the tilde (“`~`

”) is the regression operator. On
the left-hand side of the operator, we have the dependent variable (`y`

),
and on the right-hand side, we have the independent variables, separated
by the “`+`

” operator. In lavaan, a typical model is simply a set (or system)
of regression formulas, where some variables (starting with an ‘`f`

’
below) may be latent. For example:

```
y ~ f1 + f2 + x1 + x2
f1 ~ f2 + f3
f2 ~ f3 + x1 + x2
```

If we have latent variables in any of the regression formulas, we must ‘define’
them by listing their (manifest or latent) indicators. We do this by using the
special operator “`=~`

”, which can be read as *is measured by*. For example,
to define the three latent variabels `f1`

, `f2`

and `f3`

, we can use something
like:

```
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10
```

Furthermore, variances and covariances are specified using a ‘double tilde’ operator, for example:

```
y1 ~~ y1 # variance
y1 ~~ y2 # covariance
f1 ~~ f2 # covariance
```

And finally, intercepts for observed and latent variables are simple
regression formulas with only an intercept (explicitly denoted by the
number ‘`1`

’) as the only predictor:

```
y1 ~ 1
f1 ~ 1
```

Using these four *formula types*, a large variety of latent variable models can
be described. The current set of formula types is summarized in the table
below.

formula type | operator | mnemonic |
---|---|---|

latent variable definition | `=~` |
is measured by |

regression | `~` |
is regressed on |

(residual) (co)variance | `~~` |
is correlated with |

intercept | `~ 1` |
intercept |

A complete lavaan model syntax is simply a combination of these formula types,
enclosed between *single* quotes. For example:

```
myModel <- ' # regressions
y1 + y2 ~ f1 + f2 + x1 + x2
f1 ~ f2 + f3
f2 ~ f3 + x1 + x2
# latent variable definitions
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10
# variances and covariances
y1 ~~ y1
y1 ~~ y2
f1 ~~ f2
# intercepts
y1 ~ 1
f1 ~ 1
'
```

There reason why you should use single quotes is that this is the only
way (in R) to allow for double quotes inside a string. See `?Quotes`

in R for more information.

You can type this syntax interactively at the R prompt, but it is much more convenient to type the whole model syntax first in an external text editor. And when you are done, you can copy/paste it to the R console. If you are using RStudio, open a new ‘R script’, and type your model syntax (and all other R commands needed for this session) in the source editor of RStudio. And save your script, so you can reuse it later on.

The code piece above will produce a model syntax object, called `myModel`

that
can be used later when calling a function that actually estimates this model
given a dataset. Note that formulas can be split over multiple lines, and you
can use comments (starting with the `#`

character) and blank lines within the
single quotes to improve the readability of the model syntax.

You may split your model syntax is multiple parts. For example:

```
part1 <- ' # latent variable definitions
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 + y9 + y10
'
part2 <- ' # fix covariance between f1 and f2 to zero
f1 ~~ 0*f2
'
```

When fitting the model, you may then simply concatenate the multiple parts together as follows:

```
fit <- cfa(model = c(part1, part2), data = myData)
```