4  Introduction to R & RStudio

This part gives a short introduction to R and RStudio. If you are familiar with the programs and are not interested in the R vs tidyverse “distinction”, you can skip this section.

4.1 What is R & RStudio?

  • R: R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS (see https://www.r-project.org/).

  • RStudio: an integrated development environment (IDE) for R, built by Posit.

Some advertisement from the Posit website:

Used by millions of people weekly, the RStudio integrated development environment (IDE) is a set of tools built to help you be more productive with R and Python. It includes a console, syntax-highlighting editor that supports direct code execution. It also features tools for plotting, viewing history, debugging and managing your workspace.

Of course there are other IDEs (e.g., Visual Studio Code), but if you use R, RStudio is most likely the way to go.

4.2 How to work with R and RStudio?

After starting RStudio, the window should look similar to this, possibly with a different appearance (the screenshot uses the Dracula theme). You can change the appearance via Tools > Global Options > Appearance.

In RStudio there are different panes¹:

4.2.1 Panes

  • Console
    • Here you can access R
    • E.g., ask R what is: 2 + 2
  • Source/Script
    • Editor to save scripts
Warning

You should never work directly in the Console, but always use an R script (e.g., script.R) or, even better, a Quarto document (e.g., script.qmd). It is important to be able to understand and reproduce everything you did.

  • Environment/History/…/Tutorial
    • Environment: contains all objects that were created or loaded during an R session
    • History: lists the commands that were previously run in the Console
    • … e.g., the free and open source distributed version control system git
    • Tutorial: A tutorial to learn R with the learnr package (Aden-Buie et al., 2023)
  • Files/Plots/Packages/Help/Viewer
    • Files: a simple file manager
    • Plots: shows the generated plots
    • Packages: overview of the (loaded &) installed packages
    • Help: shows the documentation when you ask for help (e.g., on a specific function in R: ?mean)
    • Viewer: E.g., previewing rendered Quarto documents
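
As a small illustration of how the Console and the Environment pane relate, the following sketch runs the 2 + 2 example from above and stores the result. The object names result and objs are made up for this illustration.

```r
# Ask R what 2 + 2 is and keep the answer in an object
result <- 2 + 2
result                   # prints 4

# ls() returns the names of the objects in the current environment,
# i.e. what the Environment pane displays
objs <- ls()
"result" %in% objs       # TRUE

# Typing ?mean in the Console opens the documentation in the Help pane
```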

4.2.2 Projects

It is also advisable to use RStudio projects. Whenever you start a new project (e.g., a scale manual), create a project via File > New Project:

  1. Choose between:
    • New Directory (for today)
    • Existing Directory
    • Version Control (this is recommended, but is beyond the scope of this workshop)
  2. Choose a project type (today an R project or Quarto project)
  3. Provide a short name, set the check mark Open in new session and click Create Project

4.3 Short introduction to the R programming language

This section gives a (very?) brief introduction to the R programming language.

For overviews of the language see (e.g.):

To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

– John Chambers (creator of the S programming language)

4.3.1 Basics

Before working with R, there are a few basics you need to know:

  • R is a case-sensitive programming language. This means that R distinguishes whether a word is written in upper or lower case.
"name" == "Name"
[1] FALSE
  • Values are assigned to objects using <-
a <- "Hello world!"
  • Arguments within functions are assigned using =
df <- data.frame(
  x = 1:4,
  y = 3:6
)

To create a new R script, use File > New File > R Script or the shortcut Ctrl + Shift + N.

Then save the file via File > Save or File > Save As (shortcut: Ctrl + S).
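
The basics above can be tried out in a few lines. This is a minimal sketch; the object names score, Score, greeting and rounded are chosen only for illustration.

```r
# Object names are case-sensitive: `score` and `Score` are two different objects
score <- 10
Score <- 20
score == Score           # FALSE

# `<-` assigns a value to an object
greeting <- "Hello world!"

# `=` inside a function call matches a value to an argument;
# it does not create an object called `digits`
rounded <- round(3.14159, digits = 2)
rounded                  # 3.14
exists("digits", inherits = FALSE)   # FALSE unless you created such an object yourself
```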

4.3.2 Data Types

The basic data types² in R are shown in Table 4.1.

Table 4.1: Basic data types in R

| Type      | Description                                                     | Value (example)      |
|-----------|-----------------------------------------------------------------|----------------------|
| Numeric   | Numbers with a decimal value or fraction                        | 3.7                  |
| Integer   | Counting numbers and their additive inverses                    | 2, -115              |
| Character | Also known as string. Letters enclosed by quotes in the output. | "Hello World!", "4"  |
| Logical   | Boolean                                                         | TRUE, FALSE          |
| Factor    | Categorical data: level (characteristic value as seen by R)     | 0, 1                 |
| Factor    | Categorical data: label (designation of the characteristic attributes) | male, female  |
| Special   | Missing values: unknown cell value                              | NA                   |
| Special   | Impossible values: not a number                                 | NaN                  |
| Special   | Empty values: known empty cell value                            | NULL                 |

x <- 10
class(x)
[1] "numeric"
y <- "Hello World"
class(y)
[1] "character"
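
The remaining types can be checked in the same way. The sketch below is only illustrative: the 0/1 coding of the factor is a hypothetical example chosen to mirror the table, and sex is a made-up object name.

```r
# A plain number is stored as numeric; the L suffix creates an integer
class(2)                 # "numeric"
class(2L)                # "integer"
class(TRUE)              # "logical"

# Factor: the values 0/1 are mapped to the labels "male"/"female"
# (hypothetical coding, for illustration only)
sex <- factor(c(0, 1, 1, 0), levels = c(0, 1), labels = c("male", "female"))
class(sex)               # "factor"
levels(sex)              # "male" "female"

# Each special value has its own test function
is.na(NA)                # TRUE
is.nan(0 / 0)            # TRUE, since 0 / 0 is not a number
is.null(NULL)            # TRUE
```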

4.3.3 Data Structures

R has several data structures³, which are briefly described in the following subsections.

4.3.3.1 Vector

  • one-dimensional array
  • same data type
  • e.g., c(45, 6, -83, 23, 61)

Create a vector with the c function

v <- c(45, 6, -83, 23, 61)
v
[1]  45   6 -83  23  61

Or a named vector…

vNam <- c(a = 45, b = 6, c = -83, d = 23, e = 61)
vNam
  a   b   c   d   e 
 45   6 -83  23  61 

Count the number of items contained in the vector

length(v)
[1] 5

Vector indexing (by position)

v[1]
[1] 45
v[-3]
[1] 45  6 23 61

Slicing vectors

v[3:5]
[1] -83  23  61
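
Besides positions, vectors can also be indexed by name or by a logical condition. A short sketch, reusing the two vectors defined above:

```r
v <- c(45, 6, -83, 23, 61)
vNam <- c(a = 45, b = 6, c = -83, d = 23, e = 61)

# Index by name instead of position
vNam["c"]                # the element named "c", i.e. -83
vNam[c("a", "e")]        # the elements named "a" and "e"

# Index with a logical condition: keep only the positive values
v[v > 0]                 # 45 6 23 61

# Select several positions at once
v[c(1, 5)]               # 45 61
```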

Generate regular sequences using seq function

seq(from = 0,
    to = 20,
    by = 2)
 [1]  0  2  4  6  8 10 12 14 16 18 20
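
Instead of a step size, seq can also be given the desired number of values. The following sketch shows this and a few related shortcuts; the object names are made up for illustration.

```r
# Five evenly spaced values between 0 and 20
s <- seq(from = 0, to = 20, length.out = 5)
s                        # 0 5 10 15 20

# The colon operator creates sequences with step size 1
1:5                      # 1 2 3 4 5

# seq_len() and seq_along() are handy for counting and looping
seq_len(3)                        # 1 2 3
seq_along(c("x", "y", "z"))       # 1 2 3, one index per element
```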

4.3.3.2 Matrix

  • two-dimensional
  • same data type
  • see the example below

The matrix function creates a matrix from the given set of values

m <- matrix(data = c(1, 2, 3, 45, 36, 52),
            nrow = 2,
            ncol = 3,
            byrow = TRUE)
m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]   45   36   52

Slicing also works on matrices: m[row, column]

m[, 1:2]
     [,1] [,2]
[1,]    1    2
[2,]   45   36
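
A few more ways to look at the same matrix, as a minimal sketch. The second matrix mCol is a made-up name and only illustrates what happens without byrow = TRUE.

```r
m <- matrix(data = c(1, 2, 3, 45, 36, 52), nrow = 2, ncol = 3, byrow = TRUE)

m[2, 3]                  # single element in row 2, column 3: 52
m[1, ]                   # the whole first row as a vector: 1 2 3
dim(m)                   # 2 3 (rows, columns)

# Without byrow = TRUE the values are filled column by column
mCol <- matrix(data = c(1, 2, 3, 45, 36, 52), nrow = 2, ncol = 3)
mCol[1, ]                # 1 3 36
```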

4.3.3.3 List

  • can contain elements of various data types
  • often ordered collection of values
  • one-indexed (indexing starts with 1)
  • e.g., list("hi", 2, NULL)

Create lists (with different elements, i.e., numbers and letters) with the list function

l1 <- list(1:5)
l2 <- list(letters[1:5])
l3 <- list(LETTERS[1:5])

Create a nested list…

l4 <- list(l1, l2, l3)

…or a named (nested) list

l4Nam <- list("Numbers" = l1,
              "SmallLetters" = l2,
              "CapitalLetters" = l3)

Access list or nested list elements

l4[2]
[[1]]
[[1]][[1]]
[1] "a" "b" "c" "d" "e"
l4[[2]][3]
[[1]]
NULL
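
The NULL in the last result appears because l4[[2]] is itself a list with a single element, so position 3 does not exist. The sketch below contrasts single and double brackets and shows one way to reach the letter "c". The list l4Nam is shortened here to keep the example compact.

```r
l1 <- list(1:5)
l2 <- list(letters[1:5])
l3 <- list(LETTERS[1:5])
l4 <- list(l1, l2, l3)
l4Nam <- list("Numbers" = l1, "SmallLetters" = l2)

class(l4[2])             # "list": single brackets keep the list wrapper
class(l4[[2]])           # "list": this is l2, which still wraps the letters
l4[[2]][[1]]             # the character vector "a" "b" "c" "d" "e"
l4[[2]][[1]][3]          # "c"

# Elements of a named list can be accessed with $ or by name
l4Nam$SmallLetters[[1]][3]       # "c"
l4Nam[["Numbers"]][[1]]          # 1 2 3 4 5
```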

Unlist the list to get a vector that contains all the atomic components

unlist(l1)
[1] 1 2 3 4 5
unlist(l4)
 [1] "1" "2" "3" "4" "5" "a" "b" "c" "d" "e" "A" "B" "C" "D" "E"

Count the number of items contained in the list

length(l4)
[1] 3
length(unlist(l4))
[1] 15

4.3.3.4 Data frame

  • various columns
  • different data types
  • variables = columns
  • observations = rows
  • see the example below
df <- data.frame(
  id = 1:4,
  age = c(12, 13, 12, 14),
  sex = c(1, 1, 2, 2)
)
df
  id age sex
1  1  12   1
2  2  13   1
3  3  12   2
4  4  14   2

Number of observations

nrow(df)
[1] 4

Show the dimensions (rows, columns) of the data frame

dim(df)
[1] 4 3

Column names

colnames(df)
[1] "id"  "age" "sex"

Show the first two rows of the data frame

head(df, 2)
  id age sex
1  1  12   1
2  2  13   1

Structure of the data frame object

str(df)
'data.frame':   4 obs. of  3 variables:
 $ id : int  1 2 3 4
 $ age: num  12 13 12 14
 $ sex: num  1 1 2 2

Some descriptive statistics using the summary function (for more see Section Descriptive statistics and item analysis)

summary(df)
       id            age             sex     
 Min.   :1.00   Min.   :12.00   Min.   :1.0  
 1st Qu.:1.75   1st Qu.:12.00   1st Qu.:1.0  
 Median :2.50   Median :12.50   Median :1.5  
 Mean   :2.50   Mean   :12.75   Mean   :1.5  
 3rd Qu.:3.25   3rd Qu.:13.25   3rd Qu.:2.0  
 Max.   :4.00   Max.   :14.00   Max.   :2.0  
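
Individual columns and rows of a data frame can be accessed much like vectors and matrices. The following sketch reuses the data frame from above; the factor labels "group A" and "group B" are hypothetical and only show how a numeric code can be turned into a factor.

```r
df <- data.frame(
  id = 1:4,
  age = c(12, 13, 12, 14),
  sex = c(1, 1, 2, 2)
)

df$age                   # one column as a vector: 12 13 12 14
df[2, "age"]             # row 2, column "age": 13
df[df$age > 12, ]        # rows where age is above 12 (ids 2 and 4)

# Store sex as a factor (labels are hypothetical, for illustration only)
df$sex <- factor(df$sex, levels = c(1, 2), labels = c("group A", "group B"))
table(df$sex)            # 2 observations per group
mean(df$age)             # 12.75
```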

4.4 Base R & tidyverse

Besides the functionality of base R (R Core Team, 2023), there is the so-called tidyverse (Wickham, 2023) within R. The tidyverse is a collection of R packages (see Figure 4.1) that “share an underlying design philosophy, grammar, and data structures” and are (specifically) designed for data science (see https://www.tidyverse.org/).

Figure 4.1: tidyverse package collection

Within the tidyverse package collection, the dplyr package (Wickham et al., 2023) provides a set of convenient functions for manipulating data. Together with the pipe operator %>% from the magrittr package (Bache & Wickham, 2022), it offers a powerful way to manipulate data clearly and comprehensibly. The native⁴ R pipe |> was introduced with R v4.1.0.

The tidyverse style guide suggests using the pipe operator “to emphasize a sequence of actions”. The pipe operator can be understood as “take the object and then” pass it to the next function. In the following, the use of the base R pipe operator is shown:

  1. Take the data frame exDat and then
  2. Select the variables: msc1 and msc2 and then
  3. Calculate descriptive statistics using the describe function from the psych package (Revelle, 2023) and then
  4. Create a table with the kable function from the knitr package (Xie, 2023)
exDat |> 
  dplyr::select(c(msc1, msc2)) |>
  psych::describe(fast=TRUE) |> 
  knitr::kable(digits = 2) 
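
|      | vars | n   | mean | sd   | min | max | range | se   |
|------|------|-----|------|------|-----|-----|-------|------|
| msc1 | 1    | 750 | 2.52 | 0.74 | 1   | 4   | 3     | 0.03 |
| msc2 | 2    | 680 | 2.54 | 0.72 | 1   | 4   | 3     | 0.03 |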

In contrast, when we use a nested approach the code would look like this:

knitr::kable(psych::describe(dplyr::select(exDat, c(msc1, msc2)), fast = TRUE), digits = 2)

…or maybe a little bit clearer:

knitr::kable(
  psych::describe(
    dplyr::select(exDat, c(msc1, msc2)),
    fast = TRUE),
  digits = 2)

Nevertheless, when many functions are combined, nested code gets messy and difficult to comprehend. For more information on how to use pipes, see Chapter 4 of the tidyverse style guide.
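
Since exDat is not available outside this workshop, the comparison between piped and nested code can also be tried with base R only. This is a minimal sketch that assumes R 4.1.0 or later and uses the built-in mtcars data set as a stand-in; the object names piped and nested are made up for illustration.

```r
# Piped version: read from top to bottom (requires R >= 4.1.0)
piped <- mtcars |>
  subset(select = c(mpg, hp)) |>   # keep two variables
  colMeans() |>                    # mean of each column
  round(digits = 2)                # round the result

# Nested version: the same computation, read from the inside out
nested <- round(colMeans(subset(mtcars, select = c(mpg, hp))), digits = 2)

identical(piped, nested)           # TRUE
```

Both versions return the same named vector; the piped one simply lists the steps in the order in which they are carried out.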


  1. You can customize them: Tools > Global Options > Pane Layout

  2. We omitted the complex type.

  3. We omitted arrays.

  4. For the difference between |> and %>%, see https://ivelasq.rbind.io/blog/understanding-the-r-pipe/