8  Graphical Visualization (with ggplot2)

Last updated on

March 4, 2026

Abstract
This chapter introduces graphical visualization with the ggplot2 package. It gives an overview of different plot types and shows how to create them. It is also available as a Revealjs presentation.

8.1 Purpose of Data Visualization

Data visualization serves several purposes, as it…

  • facilitates understanding of data (i.e., makes data “accessible”)

  • reveals “hidden” structures, trends and patterns

  • identifies outliers

  • helps to communicate results clearly (storytelling)

8.2 (Some) Example Diagrams

There are many more diagram types, such as dot plots, Venn diagrams, maps, and networks.

8.3 The ggplot2 package

First of all, there are other graphical R packages such as:

The ggplot2 package (Wickham et al., 2026)

  • is based on the Grammar of Graphics (Wilkinson & Wills, 2005) → graphs are composed of independent components

  • has great flexibility (and many add-on packages) → almost any plot can be built

  • has broad community support (e.g., https://stackoverflow.com/), and AI tools help to generate ggplot2 code

  • belongs to the tidyverse (Wickham, 2023) package collection

see Wickham (2011)

8.3.1 Key Components

Every ggplot needs 3 components to produce a plot:

ggplot(data = diamonds,               # (1)
       aes(x = price, y = carat)) +   # (2)
    geom_point()                      # (3)

(1) the data,
(2) the so-called aesthetic mappings (i.e., which variables from the dataset should be used and how they should be mapped in the plot), and
(3) at least one layer that defines what kind of visualization is desired (e.g., geom_point for a scatter plot)

Note

Whereas the data and the aesthetic mappings (aes()) are stated within the ggplot() function, the geom_point layer is added with a +.
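
The three components can also be assembled step by step, which makes the role of the + operator explicit. The following is a minimal sketch using the diamonds data that ships with ggplot2; the object names p_base and p_full are only illustrative.

```r
library(ggplot2)

# Components 1 and 2: data and aesthetic mappings (no layer yet)
p_base <- ggplot(data = diamonds,
                 aes(x = price, y = carat))

# Component 3: add a layer with `+`
p_full <- p_base + geom_point()

# The base object has no layers; the full plot has exactly one
length(p_base$layers)
length(p_full$layers)
```

Printing p_base draws only an empty coordinate system, because no layer tells ggplot2 how to display the data.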

8.3.2 Getting Started with ggplot2

First, install and load the package.

#install.packages("ggplot2")
library(ggplot2)

Optionally, set a theme with the theme_set() function. There are some built-in themes such as theme_minimal(), theme_bw(), or theme_classic().

theme_set(theme_classic())

Further, you can customize the theme with the theme_update() function. For example, you can change the text size, font family, and color.

# install.packages("extrafont") # this may take a while
extrafont::loadfonts() 
theme_update(text = element_text(
    size=25,
    family="Helvetica",
    color = "blue"))
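# extrafont::loadfonts()
# Reset the global theme, then build a custom theme object
# that can be added to individual plots with `+ myTheme`
theme_set(theme_classic())
myTheme <- theme_classic() +
  theme(
    text = element_text(
        size = 25,
        family = "Comic Sans MS",
        color = "pink"),
    panel.grid.major.y = element_line(
      color = "red",
      linewidth = 0.5,
      linetype = 2
    )
  )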
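
Because theme_set() changes a global setting, it can be useful to keep the previous theme and restore it later. The sketch below relies on the fact that theme_set() invisibly returns the theme that was active before the call; the object names old_theme and p are illustrative, and the mpg data comes with ggplot2.

```r
library(ggplot2)

# theme_set() returns the theme that was active before the call
old_theme <- theme_set(theme_minimal())

# ... plots created here use theme_minimal() ...

# Restore the previous global theme
theme_set(old_theme)

# Alternatively, change the theme of a single plot only
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme_bw()
```

Adding a theme to a single plot leaves the global default untouched, which is often the safer choice in longer scripts.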

8.4 Example Data

For this exercise, we (again) use a subset of the HSB dataset, which is provided in the merTools package (Knowles & Frederick, 2025).

For more details on the dataset, see Chapter High School and Beyond.

dat <- merTools::hsb
head(dat[,1:7])
  schid minority female    ses mathach size schtype
1  1224        0      1 -1.528   5.876  842       0
2  1224        0      1 -0.588  19.708  842       0
3  1224        0      0 -0.528  20.349  842       0
4  1224        0      0 -0.668   8.781  842       0
5  1224        0      0 -0.158  17.898  842       0
6  1224        0      0  0.022   4.583  842       0

8.5 Mapping Plot Types to geom_* Layers

To create different types of plots, you need to use different geom_* layers. Here are some examples:

  • bar chart → geom_bar() (see Section 8.6)
  • histogram → geom_histogram() (see Section 8.7)
  • boxplot → geom_boxplot() (see Section 8.9)
  • line chart → geom_line()
  • scatter plot → geom_point() (see Section 8.8)
  • text labels → geom_text()
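
The line chart and text label layers are not covered in a separate section, so here is a minimal sketch of both. The data frame toy and its columns year and score are hypothetical and not part of the HSB dataset.

```r
library(ggplot2)

# Hypothetical toy data (not part of the HSB dataset)
toy <- data.frame(
  year  = 2020:2024,
  score = c(10, 12, 11, 15, 14)
)

p_line <- ggplot(data = toy,
                 aes(x = year, y = score)) +
  geom_line() +                    # connect the observations along x
  geom_point() +                   # mark each observation
  geom_text(aes(label = score),    # print each value above its point
            vjust = -1)
print(p_line)
```

Each geom_* call adds one layer, and the layers are drawn in the order in which they are added.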

8.6 Bar Chart

8.6.1 geom_bar layer (basic)

p_bp <- ggplot(data = dat,                     # (1)
               aes(x = factor(female))) +
  geom_bar()                                   # (2)

(1) Provide data and aesthetic mappings (see Key Components)
(2) Add the geom_bar() layer

print(p_bp)

8.6.2 geom_bar layer (customized)

p_bp_c <- ggplot(data = dat,
                 aes(x = factor(female))) +
  geom_bar(color = "blue",                     # (1)
           linewidth = 1.5,                    # (2)
           fill = "pink") +                    # (3)
  scale_x_discrete(                            # (4)
    labels = c("0" = "Male",
               "1" = "Female")) +
  labs(y = "Count", x = "Gender")              # (5)

(1) color argument: Color of the bar borders
(2) linewidth argument: Width of the bar borders
(3) fill argument: Fill color of the bars
(4) labels argument of the scale_x_discrete() function: Labels for the categories
(5) labs() function: Text on the x- and y-axes

print(p_bp_c)
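
In the customized bar chart, fill is a constant. To split the bars by a second variable instead, fill can be mapped inside aes(). The sketch below uses a small hypothetical data frame (toy) that only mimics the 0/1 coding of the HSB variables female and minority.

```r
library(ggplot2)

# Hypothetical toy data that only mimics the coding of the HSB variables
toy <- data.frame(
  female   = c(0, 0, 0, 1, 1, 1, 1, 0),
  minority = c(0, 1, 0, 0, 1, 1, 0, 0)
)

p_bp_g <- ggplot(data = toy,
                 aes(x = factor(female),
                     fill = factor(minority))) +   # fill mapped to a variable
  geom_bar(position = "dodge") +                   # bars side by side
  scale_x_discrete(labels = c("0" = "Male", "1" = "Female")) +
  labs(x = "Gender", y = "Count", fill = "Minority")

# layer_data() returns the values ggplot2 computed for the layer
layer_data(p_bp_g)[, c("x", "fill", "count")]
```

With position = "dodge" the bars of each group are placed next to each other; the default for geom_bar() is to stack them.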

8.7 Histogram

8.7.1 geom_histogram layer (basic)

p_hist <- ggplot(data = dat,                   # (1)
                 aes(x = mathach)) +
  geom_histogram()                             # (2)

(1) Provide data and aesthetic mappings (see Key Components)
(2) Add the geom_histogram() layer

print(p_hist)
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

8.7.2 geom_histogram layer (customized)

p_hist_c <- ggplot(data = dat,
                   aes(x = mathach)) +
  geom_histogram(
    color = "black",                           # (1)
    fill = "white",                            # (2)
    bins = 30,                                 # (3)
    binwidth = 1.5                             # (4)
  ) +
  labs(y = "Count",                            # (5)
       x = "Math Achievement",
       title = "My nice histogram")

(1) color argument: Color of the bar borders
(2) fill argument: Fill color of the bars (do not use it together with the fill argument in aes()!)
(3) bins argument: Number of bins (overridden by binwidth if both are given)
(4) binwidth argument: Width of the bins
(5) labs layer: Axis labels and title

print(p_hist_c)
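
The customized histogram sets both bins and binwidth. A quick way to see which one takes effect is to inspect the computed bins with layer_data(). This is a sketch on simulated data; the data frame toy and the column score are hypothetical stand-ins for dat and mathach.

```r
library(ggplot2)

# Hypothetical toy data on a scale similar to mathach
set.seed(1)
toy <- data.frame(score = runif(200, min = 0, max = 25))

p_both <- ggplot(toy, aes(x = score)) +
  geom_histogram(bins = 30, binwidth = 1.5,
                 color = "black", fill = "white")

# Width of each bin as computed by ggplot2
bin_widths <- with(layer_data(p_both), xmax - xmin)
range(bin_widths)
```

All bins have the requested width of 1.5, so the bins = 30 setting is ignored here. In practice it is clearer to specify only one of the two arguments.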

8.8 Scatterplot

8.8.1 geom_point layer (basic)

p_scp <- ggplot(data = dat,                    # (1)
                aes(y = mathach, x = ses)) +
  geom_point()                                 # (2)

(1) Provide data and aesthetic mappings (see Key Components)
(2) Add the geom_point() layer

print(p_scp)

8.8.2 geom_point layer (customized)

p_scp_c <- ggplot(data = dat,
                  aes(y = mathach, x = ses)) +
  geom_point(
    color = "black",                           # (1)
    fill = "white",                            # (2)
    size = 3,                                  # (3)
    shape = 21                                 # (4)
  ) +
  labs(y = "Math Achievement",                 # (5)
       x = "Socioeconomic status",
       title = "A scatterplot")

(1) color argument: Color of the point borders (for shapes with a separate fill, i.e., shapes 21 to 25).
(2) fill argument: Fill color of the points (do not use it together with the fill argument in aes()!).
(3) size argument: Size of the points.
(4) shape argument: Shape of the points (for more see https://ggplot2.tidyverse.org/articles/ggplot2-specs.html).
(5) labs layer: Axis labels and title.

print(p_scp_c)

8.8.3 geom_point layer (colored by group)

p_scp_c2 <- ggplot(data = dat,
                   aes(y = mathach, x = ses)) +
  geom_point(aes(color = factor(female)),      # (1)
             alpha = .5,                       # (2)
             size = 2.5) +                     # (3)
  scale_color_manual(                          # (4)
    values = c("black", "red"),                # (5)
    labels = c("0" = "Male",                   # (6)
               "1" = "Female")) +
  labs(y = "Math Achievement",                 # (7)
       x = "Socioeconomic status",
       color = "Gender",
       title = "A customized scatterplot")

(1) Use an additional aesthetic mapping (aes() function) to specify the variable by which the points are colored.
(2) alpha argument: Opacity of the points.
(3) size argument: Size of the points.
(4) scale_color_manual layer: Customize the discrete color scale.
(5) values argument: Colors assigned to the groups.
(6) labels argument: Labels for the groups (alternatively, you could use the factor() function in advance).
(7) labs layer: Axis labels, legend title, and plot title.

print(p_scp_c2)
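
An unnamed values vector is matched to the factor levels by position. Naming the elements, as is already done for labels, ties each color to a specific level. A minimal sketch with a hypothetical data frame (toy) that mimics the HSB columns:

```r
library(ggplot2)

# Hypothetical toy data mimicking the coding of the HSB variables
toy <- data.frame(
  ses     = c(-1.2, -0.4, 0.1, 0.6, 1.1, -0.8),
  mathach = c(5.9, 11.2, 13.4, 17.8, 20.3, 8.7),
  female  = c(1, 0, 1, 0, 1, 0)
)

p_named <- ggplot(toy, aes(x = ses, y = mathach)) +
  geom_point(aes(color = factor(female)), size = 2.5) +
  scale_color_manual(
    values = c("0" = "black", "1" = "red"),      # color per factor level
    labels = c("0" = "Male", "1" = "Female")) +  # label per factor level
  labs(color = "Gender")

# Colors that ggplot2 assigned to the points
table(layer_data(p_named)$colour)
```

With named values, the assignment does not change if the order of the factor levels changes later.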

8.9 Boxplot

8.9.1 geom_boxplot layer (basic)

p_boxp <- ggplot(data = dat,                   # (1)
                 aes(y = mathach,
                     x = factor(schtype))) +
  geom_boxplot()                               # (2)

(1) Provide data and aesthetic mappings (see Key Components).
(2) Add the geom_boxplot() layer.

print(p_boxp)

8.9.2 geom_boxplot layer (customized)

p_boxp_c <- ggplot(data = dat,
                   aes(y = mathach,
                       x = factor(schtype))) +
  geom_boxplot(
    color = "#00ff00",                         # (1)
    fill = "salmon",                           # (2)
    linewidth = 3,                             # (3)
    width = 0.5,                               # (4)
    staplewidth = 0.25) +                      # (5)
  labs(y = "Math Achievement",                 # (6)
       x = "School Type",
       title = "A customized boxplot")

(1) color argument: Color of the box outlines (borders, whiskers, and median line).
(2) fill argument: Fill color of the boxes (do not use it together with the fill argument in aes()!).
(3) linewidth argument: Thickness of the outlines (replaces the deprecated size argument for lines).
(4) width argument: Width of the boxes.
(5) staplewidth argument: Width of the staples at the ends of the whiskers, relative to the box width.
(6) labs layer: Axis labels and title.

print(p_boxp_c)
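
Since layers can be combined freely, a boxplot can be overlaid with the raw observations. The following sketch uses simulated data; the data frame toy only borrows the column names schtype and mathach, and the values are not from the HSB dataset.

```r
library(ggplot2)

set.seed(123)  # jitter and simulated data are random; the seed makes them reproducible
# Hypothetical toy data: two school types with 20 simulated scores each
toy <- data.frame(
  schtype = rep(c(0, 1), each = 20),
  mathach = c(rnorm(20, mean = 11, sd = 6),
              rnorm(20, mean = 14, sd = 6))
)

p_box_pts <- ggplot(toy, aes(x = factor(schtype), y = mathach)) +
  geom_boxplot(width = 0.5,
               outlier.shape = NA) +     # hide outlier symbols; all points are drawn below
  geom_jitter(width = 0.1, height = 0,   # spread points horizontally only
              alpha = 0.5) +
  labs(x = "School Type", y = "Math Achievement")
print(p_box_pts)
```

Setting height = 0 keeps the y values unchanged, so the points still show the actual scores.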

8.10 More Adjustment Options

8.10.1 Facetting plots

Facetting splits the data into subsets and displays them in separate panels. This can be done with the facet_wrap() or facet_grid() functions. To demonstrate the facet_grid() option, I use the customized histogram (p_hist_c) that was generated in Section 8.7.2.

p_hist_c +
     facet_grid(rows = vars(female)) +
     labs(title = paste(
          "Overwrite the title", 
          "to demonstrate facet_grid",
          "(row dimension)", 
          sep = "\n")
          )

p_hist_c +
     facet_grid(cols = vars(female)) +
     labs(title = paste(
          "Overwrite the title", 
          "to demonstrate facet_grid",
          "(column dimension)", 
          sep = "\n")
          )

p_hist_c +
     facet_grid(cols = vars(female),
                rows = vars(minority)) +
     labs(title = paste(
          "Overwrite the title", 
          "to demonstrate facet_grid",
          "(column and row dimension)", 
          sep = "\n")
          )
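
The facet_wrap() function is mentioned above but not demonstrated. It wraps the panels of a single variable into rows and columns, which is handy when that variable has many levels. A minimal sketch with simulated data; the data frame toy and the columns school and score are hypothetical.

```r
library(ggplot2)

set.seed(42)
# Hypothetical toy data: four schools with 50 simulated scores each
toy <- data.frame(
  school = rep(c("A", "B", "C", "D"), each = 50),
  score  = rnorm(200, mean = 12, sd = 6)
)

p_wrap <- ggplot(toy, aes(x = score)) +
  geom_histogram(binwidth = 2, color = "black", fill = "white") +
  facet_wrap(vars(school), ncol = 2)   # 4 panels arranged in 2 columns
print(p_wrap)
```

In contrast to facet_grid(), the layout is controlled with nrow and ncol rather than by assigning variables to rows and columns.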

8.10.2 Labeling factors

Labeling factors can be helpful for data visualization (not necessarily for data analyses!) because ggplot2 then uses the labels directly. This can be done with the levels and labels arguments of the factor() function.

dat$Gender <- factor(dat$female,               # (1)
                     levels = c(0, 1),         # (2)
                     labels = c("male",        # (3)
                                "female"))

(1) Use the factor() function and access the variable.
(2) Provide the original levels in the levels argument.
(3) Provide “publication-ready” labels in the labels argument.

Check with the str() function.

str(dat$Gender)
 Factor w/ 2 levels "male","female": 2 2 1 1 1 1 2 1 2 1 ...
ggplot(data = dat,
       aes(y = mathach, x = factor(female))) +
     geom_boxplot(width = .2,
                  staplewidth = .2) +
     labs(title = "Boxplot with original female variable")

ggplot(data = dat,
       aes(y = mathach, x = Gender)) +
     geom_boxplot(width = .2,
                  staplewidth = .2) +
     labs(title = "Boxplot with transformed female variable Gender")
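
When recoding a variable into a labeled factor, it is worth checking that every original code ended up with the intended label. One way is to cross-tabulate the original and the new variable. The sketch below uses a short hypothetical vector with the same 0/1 coding as dat$female.

```r
# Hypothetical vector using the same 0/1 coding as dat$female
female <- c(1, 1, 0, 0, 0, 0, 1)

gender <- factor(female,
                 levels = c(0, 1),
                 labels = c("male", "female"))

# Cross-tabulate the original codes against the new labels
table(original = female, labeled = gender)

levels(gender)
```

If the recoding is correct, only the cells 0/male and 1/female contain counts. Values that are not listed in levels would become NA.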

8.11 Some Questions

Note

This is work in progress and needs to be completed.

How many components are required to create a ggplot2 visualization?

How do you add a new layer to a ggplot2 plot?