Graphical Visualization (with ggplot2)

March 4, 2026

Agenda for today

Purpose of Data Visualization

  • facilitates understanding of data (i.e., makes data “accessible”)

  • reveals “hidden” structures, trends and patterns

  • identifies outliers

  • helps to communicate results clearly (storytelling)

(Some) Example Diagrams

There are many more, such as dot plots, Venn diagrams, maps, and networks.

The ggplot2 package

Preface

First of all, there are other graphical R packages, such as base R graphics (R Core Team, 2025), lattice (Sarkar, 2008), and plotly (Sievert et al., 2026).

The ggplot2 package (Wickham et al., 2026)

  • is based on the Grammar of Graphics (Wilkinson & Wills, 2005) → graphs are composed of independent components

  • is highly flexible (and has many add-on packages) → almost any plot can be built

  • has broad community support (e.g., https://stackoverflow.com/), and AI tools can help to generate ggplot2 code

  • belongs to the tidyverse (Wickham, 2023) package collection

Key Components

Every ggplot needs three components to produce a plot:

ggplot(data = diamonds,               # (1)
       aes(x = price, y = carat)) +   # (2)
    geom_point()                      # (3)

(1) the data,
(2) the so-called aesthetic mappings (i.e., which variables from the dataset are used and how they are mapped to the plot), and
(3) at least one layer that defines what kind of visualization is desired (e.g., geom_point() for a scatter plot)

Note

Whereas the data and the aesthetic mappings (aes()) are stated within the ggplot() function, the geom_point layer is added with a +.
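Because layers are added with +, a plot can be stored in an object and extended step by step. The following is a minimal sketch using the diamonds data that ships with ggplot2; the object names p_base and p_points are only illustrative.

```r
library(ggplot2)

# Data and aesthetic mappings only: this draws empty axes, no data yet
p_base <- ggplot(data = diamonds,
                 aes(x = price, y = carat))

# Adding a layer with + returns a new plot object; p_base is unchanged
p_points <- p_base + geom_point()

length(p_base$layers)    # number of layers before adding geom_point()
length(p_points$layers)  # number of layers after adding geom_point()
```

Storing the base plot this way makes it easy to try several geoms on the same data and mappings.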

Getting Started with ggplot2

First, install and load the package.

#install.packages("ggplot2")
library(ggplot2)

Optionally, set a global theme with the theme_set() function. Built-in themes include theme_minimal(), theme_bw(), and theme_classic().

theme_set(theme_classic())
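theme_set() changes the theme for all subsequent plots in the session. It also invisibly returns the theme that was active before, so the previous state can be restored later. A small sketch (the name old_theme is only illustrative):

```r
library(ggplot2)

# theme_set() invisibly returns the theme that was active before the call
old_theme <- theme_set(theme_classic())

# ... plots created here use theme_classic() ...

# Restore the previous theme when done
theme_set(old_theme)
```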

You can further customize the active theme with the theme_update() function, for example to change the text size, font family, and color.

# install.packages("extrafont") # this may take a while
extrafont::loadfonts() 
theme_update(text = element_text(
    size=25,
    family="Helvetica",
    color = "blue"))

Example Data

dat <- merTools::hsb
head(dat[,1:7])
  schid minority female    ses mathach size schtype
1  1224        0      1 -1.528   5.876  842       0
2  1224        0      1 -0.588  19.708  842       0
3  1224        0      0 -0.528  20.349  842       0
4  1224        0      0 -0.668   8.781  842       0
5  1224        0      0 -0.158  17.898  842       0
6  1224        0      0  0.022   4.583  842       0

Mapping Plot Types to geom_* Layers

  • bar chart → geom_bar()
  • histogram → geom_histogram()
  • boxplot → geom_boxplot()
  • line chart → geom_line()
  • scatter plot → geom_point()
  • text labels → geom_text()
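The line chart and text label layers follow the same pattern as the other geoms. The sketch below uses the economics data from ggplot2 and the built-in mtcars data rather than the example data of these slides; the object names are only illustrative.

```r
library(ggplot2)

# Line chart: number of unemployed over time (economics ships with ggplot2)
p_line <- ggplot(data = economics,
                 aes(x = date, y = unemploy)) +
  geom_line()

# Text labels: car names placed at their weight/mpg coordinates
cars_small <- head(mtcars, 10)
cars_small$model <- rownames(cars_small)

p_text <- ggplot(data = cars_small,
                 aes(x = wt, y = mpg, label = model)) +
  geom_text(size = 3)

print(p_line)
print(p_text)
```

Note that geom_text() needs the additional label aesthetic, which tells ggplot2 which variable holds the text to draw.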

Bar Chart

geom_bar layer (basic)

p_bp <- ggplot(data = dat,                    # (0)
               aes(x = factor(female))) +
            geom_bar()                        # (1)

(0) Provide data and aesthetic mappings (see Key Components)
(1) Add the geom_bar() layer
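geom_bar() only needs an x mapping because it counts the rows in each category itself. As a quick check, the computed bar heights can be compared with a plain frequency table. This sketch uses the built-in mtcars data (not the example data of these slides) and the illustrative name p_cyl.

```r
library(ggplot2)

p_cyl <- ggplot(data = mtcars,
                aes(x = factor(cyl))) +
  geom_bar()

# Bar heights computed by ggplot2 ...
layer_data(p_cyl)$count

# ... match a simple frequency table of the same variable
table(mtcars$cyl)
```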

geom_bar layer (basic, print)

print(p_bp)

geom_bar layer (customized)

p_bp_c <- ggplot(data = dat,
                 aes(x = factor(female))) +
            geom_bar(color = "blue",           # (1)
                     linewidth = 1.5,          # (2)
                     fill = "pink") +          # (3)
            scale_x_discrete(                  # (4)
               labels = c("0" = "Male",
                          "1" = "Female")) +
            labs(y = "Count", x = "Gender")    # (5)

(1) color argument: color of the bar borders
(2) linewidth argument: thickness of the bar borders
(3) fill argument: fill color of the bars
(4) labels argument of the scale_x_discrete() function: labels for the categories
(5) labs() function: text on the x- and y-axes

geom_bar layer (customized, print)

print(p_bp_c)

Histogram

geom_histogram layer (basic)

p_hist <- ggplot(data = dat,                  # (0)
                 aes(x = mathach)) +
            geom_histogram()                  # (1)

(0) Provide data and aesthetic mappings (see Key Components)
(1) Add the geom_histogram() layer

geom_histogram layer (basic, print)

print(p_hist)

geom_histogram layer (customized)

p_hist_c <- ggplot(data = dat,
                   aes(x = mathach)) +
                geom_histogram(
                    color = "black",           # (1)
                    fill = "white",            # (2)
                    bins = 30,                 # (3)
                    binwidth = 1.5             # (4)
                    ) +
                labs(y = "Count",              # (5)
                     x = "Math Achievement",
                     title = "My nice histogram")

(1) color argument: color of the bar borders
(2) fill argument: fill color of the bars (do not combine it with a fill mapping in aes()!)
(3) bins argument: number of bins
(4) binwidth argument: width of the bins; when both are given, binwidth overrides bins
(5) labs layer: axis names and title
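The bins and binwidth arguments are two ways to control the same thing, and binwidth takes precedence when both are supplied. A small sketch to verify this, using the built-in faithful data instead of the example data (p_both and bin_widths are illustrative names):

```r
library(ggplot2)

p_both <- ggplot(data = faithful, aes(x = waiting)) +
  geom_histogram(bins = 30, binwidth = 5)

# Width of each computed bin (right edge minus left edge)
bin_widths <- with(layer_data(p_both), xmax - xmin)

# All bins have the width given by binwidth, so bins = 30 was ignored
unique(round(bin_widths, 6))
```

In practice it is clearer to specify only one of the two arguments.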

geom_histogram layer (customized, print)

print(p_hist_c)

Scatterplot

geom_point layer (basic)

p_scp <- ggplot(data = dat,                   # (0)
                aes(y = mathach, x = ses)) +
            geom_point()                      # (1)

(0) Provide data and aesthetic mappings (see Key Components)
(1) Add the geom_point() layer

geom_point layer (basic, print)

print(p_scp)

geom_point layer (customized)

p_scp_c <- ggplot(data = dat,
                  aes(y = mathach, x = ses)) +
            geom_point(
                color = "black",               # (1)
                fill = "white",                # (2)
                size = 3,                      # (3)
                shape = 21                     # (4)
                ) +
            labs(y = "Math Achievement",       # (5)
                 x = "Socioeconomic status",
                 title = "A scatterplot")

(1) color argument: color of the point borders (for fillable shapes such as 21; for other shapes, the color of the whole point).
(2) fill argument: fill color of the points; only shapes 21 to 25 have a fill (do not combine it with a fill mapping in aes()!).
(3) size argument: size of the points.
(4) shape argument: shape of the points (for more, see https://ggplot2.tidyverse.org/articles/ggplot2-specs.html).
(5) labs layer: axis names and title.
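How color and fill interact depends on the shape. The sketch below draws one solid shape (16) next to the fillable shapes 21 to 25 with the same color and fill settings. The data frame shapes and the plot name p_shapes are made up for illustration.

```r
library(ggplot2)

# One row per shape code: 16 is a solid circle, 21 to 25 are fillable
shapes <- data.frame(shape = c(16, 21:25), x = 1:6)

p_shapes <- ggplot(data = shapes, aes(x = x, y = 1, shape = shape)) +
  geom_point(color = "black", fill = "orange", size = 6, stroke = 1.5) +
  scale_shape_identity() +   # use the numbers in 'shape' as shape codes
  scale_x_continuous(breaks = shapes$x,
                     labels = as.character(shapes$shape)) +
  labs(x = "shape", y = NULL)

print(p_shapes)
```

Shape 16 is drawn entirely in the color value, whereas shapes 21 to 25 show the color as border and the fill inside.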

geom_point layer (customized, print)

print(p_scp_c)

geom_point layer (colored by group)

p_scp_c2 <- ggplot(data = dat,
                   aes(y = mathach, x = ses)) +
                geom_point(aes(color = factor(female)),   # (1)
                           alpha = .5,                    # (2)
                           size = 2.5) +                  # (3)
                scale_color_manual(                       # (4)
                  values = c("black", "red"),             # (5)
                  labels = c("0" = "Male",                # (6)
                             "1" = "Female")) +
                labs(y = "Math Achievement",              # (7)
                     x = "Socioeconomic status",
                     color = "Gender",
                     title = "A customized scatterplot")

(1) A further aesthetic mapping (aes() function) inside geom_point() specifies the variable by which the points are colored.
(2) alpha argument: opacity of the points (0 = fully transparent, 1 = fully opaque).
(3) size argument: size of the points.
(4) scale_color_manual layer: customizes the discrete color scale.
(5) values argument: the colors to use, one per group.
(6) labels argument: labels for the groups (alternatively, you could label the variable with the factor() function in advance).
(7) labs layer: axis names, legend title, and plot title.

geom_point layer (colored by group, print)

print(p_scp_c2)

Boxplot

geom_boxplot layer (basic)

p_boxp <- ggplot(data = dat,                  # (0)
                 aes(y = mathach,
                     x = factor(schtype))) +
            geom_boxplot()                    # (1)

(0) Provide data and aesthetic mappings (see Key Components).
(1) Add the geom_boxplot() layer.

geom_boxplot layer (basic, print)

print(p_boxp)

geom_boxplot layer (customized)

p_boxp_c <- ggplot(data = dat,
                   aes(y = mathach,
                       x = factor(schtype))) +
                geom_boxplot(
                    color = "#00ff00",         # (1)
                    fill = "salmon",           # (2)
                    linewidth = 3,             # (3)
                    width = 0.5,               # (4)
                    staplewidth = 0.25) +      # (5)
                labs(y = "Math Achievement",   # (6)
                     x = "School Type",
                     title = "A customized boxplot")

(1) color argument: color of the box outlines and whiskers.
(2) fill argument: fill color of the boxes (do not combine it with a fill mapping in aes()!).
(3) linewidth argument: thickness of the lines (the size argument is deprecated for lines since ggplot2 3.4.0).
(4) width argument: width of the boxes.
(5) staplewidth argument: width of the staples (the lines that mark the ends of the whiskers), relative to the width of the box.
(6) labs layer: axis names and title.

geom_boxplot layer (customized, print)

print(p_boxp_c)

More Adjustment Options

Facetting plots

Facetting splits the data into subsets and displays each subset in its own panel. This can be done with the facet_wrap() or facet_grid() functions. To demonstrate the facet_grid() option, I use the customized histogram (p_hist_c) that was generated in the Histogram section.

p_hist_c +
     facet_grid(rows = vars(female)) +
     labs(title = paste(
          "Overwrite the title", 
          "to demonstrate facet_grid",
          "(row dimension)", 
          sep = "\n")
          )

p_hist_c +
     facet_grid(cols = vars(female)) +
     labs(title = paste(
          "Overwrite the title", 
          "to demonstrate facet_grid",
          "(column dimension)", 
          sep = "\n")
          )

p_hist_c +
     facet_grid(cols = vars(female),
                rows = vars(minority)) +
     labs(title = paste(
          "Overwrite the title", 
          "to demonstrate facet_grid",
          "(column and row dimension)", 
          sep = "\n")
          )
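facet_wrap() is the counterpart for a single grouping variable with several levels: it lays the panels out row by row and wraps them after a given number of columns. A minimal sketch with the mpg data that ships with ggplot2 (not the example data of these slides; p_wrap is an illustrative name):

```r
library(ggplot2)

p_wrap <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2,
                 color = "black",
                 fill = "white") +
  facet_wrap(vars(class), ncol = 3)  # one panel per vehicle class

print(p_wrap)
```

facet_grid() is usually the better choice when panels should be arranged by two variables (rows and columns), as in the examples above.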

Labeling factors

Labeling factors can be helpful for data visualization (not necessarily for data analysis!), because ggplot2 then uses the labels directly. This can be done with the levels and labels arguments of the factor() function.

dat$Gender <- factor(dat$female,              # (1)
                     levels = c(0, 1),        # (2)
                     labels = c("male",       # (3)
                                "female"))

(1) Use the factor() function and access the variable.
(2) Provide the original levels in the levels argument.
(3) Provide “publication-ready” labels in the labels argument.

Check with the str() function.

str(dat$Gender)
 Factor w/ 2 levels "male","female": 2 2 1 1 1 1 2 1 2 1 ...
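A cross-tabulation of the original and the recoded variable is another quick way to confirm that the labels were assigned to the intended levels. The sketch below uses a small made-up data frame (toy) in place of the example data.

```r
# Toy data standing in for dat$female (hypothetical values)
toy <- data.frame(female = c(1, 1, 0, 0, 1))

toy$Gender <- factor(toy$female,
                     levels = c(0, 1),
                     labels = c("male", "female"))

# Every 0 should land in "male" and every 1 in "female";
# the off-diagonal cells should be zero
tab <- table(original = toy$female, recoded = toy$Gender)
tab
```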

Labeling factors: Original female variable

ggplot(data = dat,
       aes(y = mathach, x = factor(female))) +
     geom_boxplot(width = .2,
                  staplewidth = .2) +
     labs(title = "Boxplot with original female variable (that is declared as numeric)")

Labeling factors: Transformed Gender variable

ggplot(data = dat,
       aes(y = mathach, x = Gender)) +
     geom_boxplot(width = .2,
                  staplewidth = .2) +
     labs(title = "Boxplot with transformed female variable Gender (that is a factor)")

Questions?

References

R Core Team. (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Sarkar, D. (2008). Lattice: Multivariate data visualization with R. Springer. http://lmdvr.r-forge.r-project.org
Sievert, C., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M., & Despouy, P. (2026). plotly: Create interactive web graphics via plotly.js. https://plotly-r.com
Wickham, H. (2011). ggplot2. WIREs Computational Statistics, 3(2), 180–185. https://doi.org/10.1002/wics.147
Wickham, H. (2023). tidyverse: Easily install and load the tidyverse. https://tidyverse.tidyverse.org
Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D., & van den Brand, T. (2026). ggplot2: Create elegant data visualisations using the grammar of graphics. https://ggplot2.tidyverse.org
Wilkinson, L., & Wills, G. (2005). The grammar of graphics (2nd ed.). Springer.