This Quarto book provides an introduction to data documentation with R and Quarto and serves as the accompanying script for the workshop. For an overview of the workshop agenda, see the Introduction section.
The material is a work in progress, and this is the first time the workshop is held in this format. If you have feedback or encounter any bugs, please send us an email. The book was last updated on April 12, 2023.
To prepare for the workshop, please follow the steps below. The book uses the following R packages:
```{r}
#| label: pkgs
#| code-fold: show
#| echo: true
pkgList <- c(
  "rmarkdown",
  "knitr",       # tables
  "kableExtra",  # tables
  "tibble",      # data frame
  "data.table",  # rbindlist function
  "haven",       # read data
  "lavaan",      # generate data
  "tidyr",       # reshape tidyverse
  "dplyr",       # prepare data
  "moments",     # skewness/kurtosis
  "car",         # recoding
  "stringr",     # strings
  "psych",       # descriptive statistics
  "ggplot2",     # plots
  "scales")      # percent 
```

You can install any packages that are still missing with the following code (and check the versions of those already installed!):

```{r}
#| label: pkgs-2
#| code-fold: show
#| eval: false
# Install only the packages that are not installed yet
missingPkgs <- setdiff(pkgList, rownames(installed.packages()))
if (length(missingPkgs) > 0) install.packages(missingPkgs)
```
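
To act on the "check the versions" hint, a minimal sketch such as the following lists the installed version of each package. The object names `pkgsToCheck`, `installedPkgs`, and `pkgVersions` are illustrative and not part of the workshop material; packages that are not installed are reported as `NA`.

```r
# Illustrative subset of the packages listed in `pkgList`
pkgsToCheck <- c("knitr", "dplyr", "ggplot2")

# Names of all packages installed in the current library paths
installedPkgs <- rownames(installed.packages())

# Look up the version of each package; NA if it is not installed
pkgVersions <- vapply(
  pkgsToCheck,
  function(p) {
    if (p %in% installedPkgs) as.character(utils::packageVersion(p)) else NA_character_
  },
  character(1)
)

data.frame(package = pkgsToCheck, version = unname(pkgVersions))
```

Passing `pkgList` instead of the subset checks all packages at once.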

## Data set {#gen-dat}

Finally, we will use a simulated example data set. To generate it, run the following code:

```{r}
#| label: sim-data-1
#| code-fold: show
PopMod <- "
eta1 =~ .8*msc1 + .8*msc2 + -.8*msc3 + -.8*msc4
eta1 ~~ 1*eta1
eta1 ~ 0*1
msc3 ~~ .2*msc4
msc1 | -1.5*t1 + 0*t2 + 1.5*t3
msc2 | -1.5*t1 + 0*t2 + 1.5*t3
msc3 | 1.5*t1 + 0*t2 + -1.5*t3
msc4 | 1.5*t1 + 0*t2 + -1.5*t3
age ~ 10*1
age ~~ 2.5*age
sex | 0*t1
sex ~*~ .5*sex
eta1 ~~ age + sex
"

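# `sample.nobs` is a vector, so five groups with 50, 100, ..., 250
# observations (750 in total) are simulated and merged into one data set
exDat <- lavaan::simulateData(model = PopMod,
                              sample.nobs = seq(50, 250, by = 50),
                              seed = 999)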
```
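
Lines of the form `msc1 | -1.5*t1 + 0*t2 + 1.5*t3` define thresholds that turn a continuous latent response into four ordered categories. The following base R sketch illustrates the idea under the simplifying assumption of a standard normal latent response; the variance implied by `PopMod` may differ, so the category proportions in `exDat` need not match these values. The names `thresholds`, `catProbs`, and `yStar` are illustrative.

```r
# Thresholds as specified for msc1 and msc2
thresholds <- c(-1.5, 0, 1.5)

# Probability of each of the four categories if the latent response
# were standard normal (an assumption made for illustration only)
catProbs <- diff(pnorm(c(-Inf, thresholds, Inf)))
round(catProbs, 3)  # 0.067 0.433 0.433 0.067

# Cutting a few example latent values at the thresholds yields categories 1 to 4
yStar <- c(-2, -0.5, 0.3, 1.8)
yCat <- cut(yStar, breaks = c(-Inf, thresholds, Inf), labels = FALSE)
yCat  # 1 2 3 4
```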

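Next, we apply some cosmetic recoding (the `group` variable created by `simulateData()` becomes `edu`) and set some values to missing at random.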

```{r}
#| label: sim-data-2
#| code-fold: show
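# Shift `sex` and the group indicator by one; the latter becomes `edu`
exDat$sex <- exDat$sex - 1
exDat$edu <- exDat$group - 1
exDat$group <- NULL

# Probabilities with which a value is set to missing
propMiss1 <- .05
propMiss2 <- .1

# Draw a 0/1 indicator per row and replace the value with NA where it is 1
exDat$sex <- ifelse(rbinom(nrow(exDat), size = 1, prob = propMiss1) == 1,
                    NA, exDat$sex)
exDat$age <- ifelse(rbinom(nrow(exDat), size = 1, prob = propMiss2) == 1,
                    NA, exDat$age)
exDat$msc2 <- ifelse(rbinom(nrow(exDat), size = 1, prob = propMiss2) == 1,
                     NA, exDat$msc2)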
```
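
Because the missing values are drawn at random with `rbinom()`, the realized share of missing values only approximates `propMiss1` and `propMiss2`. A quick check could look like the following sketch. It uses a small toy data frame (`toyDat`, a hypothetical stand-in with made-up values) so that it runs without lavaan; on the workshop data, `colMeans(is.na(exDat))` would be the analogous call.

```r
set.seed(1)  # only for this toy example

# Toy data frame standing in for `exDat` (hypothetical values)
toyDat <- data.frame(sex = rbinom(200, size = 1, prob = 0.5),
                     age = rnorm(200, mean = 10, sd = 1.5))

# Set each `age` value to NA with probability 0.1
isMiss <- rbinom(nrow(toyDat), size = 1, prob = 0.1) == 1
toyDat$age[isMiss] <- NA

# Proportion of missing values per variable
propNA <- colMeans(is.na(toyDat))
propNA
```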

Add a character variable, `fLang`, that contains inconsistent spellings, missing values, and blank strings.

```{r}
#| label: sim-data-3
exDat$fLang <- rep(c("german", "ger", "germn",
                     "italian",
                     "french",
                     NA,
                     " ",
                     ""),
                   c(650, 49, 1, 10, 10, 20, 5, 5))
```
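
The repetition counts sum to 750, which matches the total number of simulated rows (50 + 100 + 150 + 200 + 250). The following sketch rebuilds the vector on its own and tabulates it; note that the blank strings `" "` and `""` are not counted as `NA`.

```r
# Same construction as above: 8 distinct entries, each repeated a given number of times
fLang <- rep(c("german", "ger", "germn", "italian", "french", NA, " ", ""),
             times = c(650, 49, 1, 10, 10, 20, 5, 5))

length(fLang)                  # 750, one value per simulated row
table(fLang, useNA = "ifany")  # frequency of each entry, including NA

# Blank strings are not NA; count entries that are empty after trimming whitespace
nBlank <- sum(trimws(fLang) == "", na.rm = TRUE)
nBlank  # 10
```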

Add an outlier to the variable `age`.

```{r}
#| label: sim-data-5
exDat[600, "age"] <- 30
```
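
With a population mean of 10 and a variance of 2.5 for `age` in `PopMod`, a value of 30 lies far outside the plausible range. As a minimal sketch of how such a value could be flagged, the following toy example uses a z-score rule with a cutoff of 3. The cutoff and the names `toyAge`, `zAge`, and `flagged` are illustrative choices, not the workshop's method.

```r
set.seed(2)  # only for this toy example

# Toy age variable with the mean and variance specified in the population model
toyAge <- rnorm(750, mean = 10, sd = sqrt(2.5))
toyAge[600] <- 30  # implausible value, as in the chunk above

# Flag values more than 3 standard deviations away from the mean
zAge <- (toyAge - mean(toyAge)) / sd(toyAge)
flagged <- which(abs(zAge) > 3)
flagged  # includes row 600
```

On the workshop data, `age` contains missing values, so `mean()` and `sd()` would need `na.rm = TRUE`.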

Add an `id` variable.

```{r}
#| label: sim-data-6
exDat$id <- seq_len(nrow(exDat))
```