This Quarto book is designed to provide an introduction to data documentation with R and Quarto and serves as the accompanying script for the workshop. For an overview about the workshop agenda see the Introduction section.
Note
The material is work in progress. It is the first time that the workshop will be held in this format. If you have feedback or encountered any bugs, please send us an email. The book was last updated on April 12, 2023.
Please prepare yourself by following the steps below:
Allaire, J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., & Iannone, R. (2023). Rmarkdown: Dynamic documents for r. https://CRAN.R-project.org/package=rmarkdown
Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., & Dunnington, D. (2023). ggplot2: Create elegant data visualisations using the grammar of graphics. https://CRAN.R-project.org/package=ggplot2
---published-title: "Workshop held on:"---# PreperationThis [Quarto book](https://quarto.org/docs/books/) is designed to provide an introduction to data documentation with `R` and `Quarto` and serves as the accompanying script for the workshop. For an overview about the workshop agenda see the [Introduction](#intro.qmd) section. ```{r}#| label: date-last-render#| echo: false#| results: hideSys.setlocale("LC_TIME", "English")dateLastRender <-format( Sys.time(), "%B %d, %Y")```::: {.callout-note}The material is work in progress. It is the first time that the workshop will be held in this format. If you have feedback or encountered any bugs, please send us an [email](mailto:sven.rieger@uni-tuebingen.de). The book was last updated on `r dateLastRender`.:::Please prepare yourself by following the steps below:- [Software installation](#software-install)- [Package installation](#pkg-install)- [Data set](#gen-dat)If you encounter any problems, please send us an [email](mailto:sven.rieger@uni-tuebingen.de).## Software installation {#software-install}Please install the following software and make sure that you are download the latest (released!) version of each program:```{r}#| label: versions#| echo: false#RStudio.Version()$versionrversion <-paste0("v",R.version$major, ".", R.version$minor) rstudversion <-paste0("v","2023.3.0.386")```- `R`[`r rversion`, @R-base]: [https://cran.r-project.org/bin/windows/base/](https://cran.r-project.org/bin/windows/base/){target="_blank"}- `RStudio`[`r rstudversion`, @RStudio2023]: [https://posit.co/downloads/](https://posit.co/downloads/){target="_blank"}- `Quarto`[v1.3, @R-quarto]: [https://quarto.org/docs/get-started/](https://quarto.org/docs/get-started/){target="_blank"} ## Package installation {#pkg-install}> R is an integrated suite of software facilities for data manipulation, calculation and graphical display (see [https://www.r-project.org](https://www.r-project.org/about.html){target="_blank"}).`R` is--among other things--great, because there is a large collection of packages. During the workshop, we will use the following `R` packages:```{r}#| label: pkgs#| code-fold: show#| echo: truepkgList <-c("rmarkdown","knitr", # tables"kableExtra", # tables"tibble", # data frame"data.table", # rbindlist function"haven", # read data"lavaan", # generate data"tidyr", # reshape tidyverse"dplyr", # prepare data"moments", # skewness/kurtosis"car", # recoding"stringr", # strings"psych", # descriptive statistics"ggplot2",# plots"scales") # percent ``````{r}#| label: write-r-refs#| eval: true#| echo: falseotherPkgs <-c("base", "learnr", "quarto","mice","foreign","magrittr","tidyverse", "stringi","lattice") # plotsknitr::write_bib(x =c(otherPkgs, pkgList),file ="r-refs.bib")``````{r}#| label: write-pkgs#| echo: false#| results: asisfor (i in1:length(pkgList)) {cat(paste0(i, ". ", pkgList[i]," [", "v", utils::packageVersion(pkgList[i]),", @R-", pkgList[i],"]\n"))}```You can install them (check the versions!) with the following code: ```{r}#| label: pkgs-2#| code-fold: show#| eval: falselapply(pkgList,function(x) if(!x %in%rownames(installed.packages())) install.packages(x))```::: {.callout-note collapse="true" apperance="simple" title="Information About the R Session"}```{r}#| label: sess-infosessionInfo()```Note that we often did not load the packages, but use the function via `::` (e.g., `psych::describe()`).:::## Data set {#gen-dat}Finally, we will use an (simulated) example data set. To get it, execute the following code:```{r}#| label: sim-data-1#| code-fold: showPopMod <-"eta1 =~ .8*msc1 + .8*msc2 + -.8*msc3 + -.8*msc4eta1 ~~ 1*eta1eta1 ~ 0*1msc3 ~~ .2*msc4msc1 | -1.5*t1 + 0*t2 + 1.5*t3msc2 | -1.5*t1 + 0*t2 + 1.5*t3msc3 | 1.5*t1 + 0*t2 + -1.5*t3msc4 | 1.5*t1 + 0*t2 + -1.5*t3age ~ 10*1age ~~ 2.5*agesex | 0*t1sex ~*~ .5*sexeta1 ~~ age + sex"exDat <- lavaan::simulateData(model = PopMod,sample.nobs =seq(50,250, by =50),seed =999)```Some cosmetics, and "adding" missing data.```{r}#| label: sim-data-2#| code-fold: showexDat$sex <- exDat$sex-1exDat$edu <- exDat$group-1exDat$group <-NULLpropMiss1 <- .05propMiss2 <- .1exDat$sex <-ifelse (rbinom(nrow(exDat),size =1, propMiss1) ==1,NA, exDat$sex )exDat$age <-ifelse (rbinom(nrow(exDat),size =1, propMiss2) ==1,NA, exDat$age )exDat$msc2 <-ifelse (rbinom(nrow(exDat),size =1, propMiss2) ==1,NA, exDat$msc2 )```Add a character.```{r}#| label: sim-data-3exDat$fLang <-rep(c("german", "ger", "germn","italian","french",NA," ",""),c(650, 49, 1, 10, 10, 20, 5, 5))```Add outlier for the variable `age.````{r}#| label: sim-data-5exDat[600, "age"] <-30```Add `id` variable.```{r}#| label: sim-data-6exDat$id <-1:nrow(exDat)```::: {.callout-note collapse="true"}## Some descriptive statistics```{r}#| label: descr-gendat-tab#| eval: true#| echo: false#| tbl-cap: Descriptive statistics of the generated dataknitr::kable(x = psych::describe(exDat[,-ncol(exDat)]),digits =2) |> kableExtra::kable_classic("hover")``````{r}#| label: cor-gendat-tab#| eval: true#| echo: false#| tbl-cap: Correlation table of the generated datacorTab <-format(round(cor(exDat[,!names(exDat) %in%c("fLang", "id")],use ="pairwise.complete.obs"),2),nsmall =2)corTab[upper.tri(corTab)] <-""knitr::kable(x = corTab,align ="c") |> kableExtra::kable_classic("hover")```:::```{r}#| label: save-simDat#| echo: false#| eval: false#|saveRDS(exDat, "exampleDat.RDS")```