TWS 2023

Creating Figures as an Intro to R

Using the ggplot2 package

steffilazerte
@steffilazerte@fosstodon.org
@steffilazerte
steffilazerte.ca

Compiled: 2023-04-17

Preamble

Online workshops can be challenging

Consider keeping your video on (if possible)

  • Kids? Pets? Spouses? No problem
  • But ultimately, you need to be comfortable! (and you absolutely have the right to privacy)

Interrupt me!

  • Generally keep yourself muted but un-mute anytime to ask questions

Ask Questions!

  • Group trouble-shooting is really valuable
  • If you have a problem, others may have it too (or may run into it later)

Screen-sharing

  • I may ask you to share your screen with the group (feel free to decline)
  • For privacy, close your email etc. Or just share your RStudio window

This is me and my creatures

This is my garden

What about you?

  • Name
  • Background (Role, Area of study, etc.)
  • Familiarity with R or Programming
  • Creatures (furry, feathery, scaley, green or otherwise)?

Outline

1. A little about R

2. Creating figures with ggplot2

3. Combining figures with patchwork

4. Saving figures

Taken this or a similar workshop before?

During activities consider…

  • Extra activities labeled “Too Easy?”
  • Using your own data
  • Exploring other aspects of ggplot2 that interest you

Feel free to ask questions even if it’s not the “official” activity!

What is R?

R is a Programming language

A programming language is a way to give instructions in order to get a computer to do something

  • You need to know the language (i.e., the code)
  • Computers don’t know what you mean, only what you type (unfortunately)
  • Spelling, punctuation, and capitalization all matter!

For example

R, what is 56 times 5.8?

56 * 5.8
[1] 324.8

Use code to tell R what to do

R, what is the average of numbers 1, 2, 3, 4?

mean(c(1, 2, 3, 4))
[1] 2.5

R, save this value for later

steffis_mean <- mean(c(1, 2, 3, 4))

R, multiply this value by 6

steffis_mean * 6
[1] 15
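
Capitalization matters here too. The sketch below reuses steffis_mean from above; the misspelled name Steffis_Mean is a deliberate mistake for illustration, and tryCatch() is only used so the example keeps running after the error.

```r
steffis_mean <- mean(c(1, 2, 3, 4))

# R only finds an object if the name matches exactly, capitals included
found_exact <- exists("steffis_mean")
found_wrong_case <- exists("Steffis_Mean")

# Using the wrong name is an error; tryCatch() captures the error message
# instead of stopping the script
problem <- tryCatch(Steffis_Mean * 6,
                    error = function(e) conditionMessage(e))

found_exact
found_wrong_case
problem
```

If R says it cannot find an object, checking the spelling and capitalization of the name is usually the first thing to try.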

Why R?

R is hard

But R is powerful (and reproducible)!

(I made these slides with a mix of R and Quarto)

R is also beautiful

R is affordable (i.e., free!)

ImpostR Syndrome

David Whittaker

Moral of the story?
Make friends, code in groups, learn together and don’t beat yourself up

The Goal

Artwork by @allison_horst

About R

Code, Output, Scripts

Code

  • The actual commands

Output

  • The result of running code or a script

Script

  • A text file full of code that you want to run
  • You should always keep your code in a script

For example:

Code (kept in a script):

mean(c(1, 2, 3, 4))

Output:

[1] 2.5

RStudio vs. R

 

RStudio

 

R

 

  • RStudio is not R
  • RStudio is a User Interface or IDE (integrated development environment)
    • (i.e., Makes coding simpler)

functions() - Do things, Return things

mean(), read_csv(), ggplot(), c(), etc.

  • Always have ()
  • Can take arguments (think ‘options’)
    • mean(x = c(2, 10, 45))
    • mean(x = c(NA, 10, 2, 65), na.rm = TRUE)
  • Arguments defined by name or by position
    • With correct position, do not need to specify by name

By name:

mean(x = c(1, 5, 10))
[1] 5.333333

By position:

mean(c(1, 5, 10))
[1] 5.333333
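
A short sketch of how arguments change what a function returns, using the na.rm argument of mean() mentioned above. The object names (values, mean_with_na, etc.) are made up for this example.

```r
values <- c(NA, 10, 2, 65)

# By default a missing value (NA) makes the result NA
mean_with_na <- mean(values)

# na.rm = TRUE removes missing values before calculating
mean_no_na <- mean(values, na.rm = TRUE)

# Arguments given by name can go in any order
mean_reordered <- mean(na.rm = TRUE, x = values)

mean_with_na
mean_no_na
mean_reordered
```

Naming arguments is optional when the position is correct, but it often makes code easier to read later.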

R documentation

?mean

Data

Generally kept in vectors or data.frames

  • These are objects with names (like functions)
  • We can use <- to assign values to objects (assignment)

Vector (1 dimension)

my_data <- c("a", 100, "c")
my_data
[1] "a"   "100" "c"  
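
Notice that 100 was printed as "100" above. A vector holds only one type of data, so mixing text and numbers turns everything into text. A small sketch to check this (the names mixed, numbers and doubled are just for illustration):

```r
mixed <- c("a", 100, "c")
numbers <- c(1, 100, 3)

# class() reports what type of data a vector holds
class(mixed)
class(numbers)

# Math works on numeric vectors (mixed * 2 would be an error)
doubled <- numbers * 2
doubled
```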

Data frame (2 dimensions)

my_data <- data.frame(site = c("s1", "s2", "s3"),
                      count = c(101, 102, 103),
                      treatment = c("a", "b", "c"))
my_data
  site count treatment
1   s1   101         a
2   s2   102         b
3   s3   103         c

rows x columns
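
A minimal sketch for exploring the data frame above: dim() gives rows then columns, and $ pulls out a single column as a vector.

```r
my_data <- data.frame(site = c("s1", "s2", "s3"),
                      count = c(101, 102, 103),
                      treatment = c("a", "b", "c"))

# Number of rows, then number of columns
size <- dim(my_data)
size

# Pull out one column with $ and use it like any other vector
my_data$count
mean_count <- mean(my_data$count)
mean_count

# Square brackets also follow rows x columns: this is the second row
my_data[2, ]
```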

Your first real code!

First Code

# First load the packages
library(palmerpenguins)
library(ggplot2)

# Now create the figure
ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point()

  1. Copy/paste or type this into the script window in RStudio
    • You may have to go to File > New File > R Script
  2. Click on the first line of code
  3. Run the code
    • Click the ‘Run’ button (upper right of the script window) or
    • Use the shortcut Ctrl-Enter (Cmd-Enter on macOS)
  4. Repeat until all the code has run

First Code

# First load the packages
library(palmerpenguins)
library(ggplot2)

# Now create the figure
ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point()
Warning: Removed 2 rows containing missing values (`geom_point()`).

The parts of this code:

  • Packages: ggplot2 and palmerpenguins
  • Functions: library(), ggplot(), aes(), geom_point()
  • + (Specific to ggplot)
  • Figure! (the plot that appears)
  • Warning (the message about removed rows)
  • Comments (the lines starting with #)
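
About that warning: it is not an error, the figure was still created. A rough sketch of where the "2 rows" come from, counting rows with a missing x or y value. The names missing_xy, n_missing and p are made up for this example.

```r
library(palmerpenguins)
library(ggplot2)

# TRUE for rows where body mass or bill length is missing (NA)
missing_xy <- is.na(penguins$body_mass_g) | is.na(penguins$bill_length_mm)
n_missing <- sum(missing_xy)
n_missing

# If you already know about the missing values, na.rm = TRUE
# drops them without a warning
p <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)
p
```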

Now you know R!

Let’s get started

Our data set: Palmer Penguins!

Artwork by @allison_horst

library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct>  <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750 male    2007
 2 Adelie  Torgersen           39.5          17.4               186        3800 female  2007
 3 Adelie  Torgersen           40.3          18                 195        3250 female  2007
 4 Adelie  Torgersen           NA            NA                  NA          NA <NA>    2007
 5 Adelie  Torgersen           36.7          19.3               193        3450 female  2007
 6 Adelie  Torgersen           39.3          20.6               190        3650 male    2007
 7 Adelie  Torgersen           38.9          17.8               181        3625 female  2007
 8 Adelie  Torgersen           39.2          19.6               195        4675 male    2007
 9 Adelie  Torgersen           34.1          18.1               193        3475 <NA>    2007
10 Adelie  Torgersen           42            20.2               190        4250 <NA>    2007
# ℹ 334 more rows

Your turn!

Run this code and look at the output in the console

A basic plot

library(palmerpenguins)
library(ggplot2)

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
    geom_point()

Break it down

library(palmerpenguins)
library(ggplot2)

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
    geom_point()

library(palmerpenguins)

  • Load the palmerpenguins package
  • Now we have access to penguins data

library(ggplot2)

  • Load the ggplot2 package
  • Now we have access to the ggplot() function
    • (and aes() and geom_point() etc.)

ggplot()

  • Set the attributes of your plot
  • data = Dataset
  • aes = Aesthetics (how the data are used)
  • Think of this as your plot defaults

geom_point()

  • Choose a geom function to display the data
  • Always added to a ggplot() call with +

ggplots are essentially layered objects, starting with a call to ggplot()

Plots are layered

ggplot(data = penguins, aes(x = sex, y = body_mass_g))

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot()

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_point()

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_violin()

Plots are layered

You can add multiple layers

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot()

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot() +
  geom_point(size = 2, colour = "red")

Order matters

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_point(size = 2, colour = "red") +
  geom_boxplot()

Plots are objects

Any ggplot can be saved as an object

g <- ggplot(data = penguins, aes(x = sex, y = body_mass_g))
g

g + geom_boxplot()

More Geoms

(Plot types)

Geoms: Lines

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_line()

Geoms: Histogram

ggplot(data = penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 100)

Note: We only need 1 aesthetic here

Geoms: Barplots

Let ggplot count your data

ggplot(data = penguins, aes(x = sex)) +
  geom_bar()

Geoms: Barplots

You can also provide the counts

# Create our own data frame
species_counts <- data.frame(species = c("Adelie", "Chinstrap", "Gentoo"),
                             n = c(152, 68, 124))

ggplot(data = species_counts, aes(x = species, y = n)) +
  geom_bar(stat = "identity")
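
A sketch of two related shortcuts, under the assumption that you would rather not type counts by hand: table() can count the species for you, and geom_col() is the ggplot2 shortcut for geom_bar(stat = "identity"). The name bar_plot is made up for this example.

```r
library(palmerpenguins)
library(ggplot2)

# Count the rows for each species and store the counts in a column called n
species_counts <- as.data.frame(table(species = penguins$species),
                                responseName = "n")
species_counts

# geom_col() uses the y values as the bar heights
bar_plot <- ggplot(data = species_counts, aes(x = species, y = n)) +
  geom_col()
bar_plot
```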

Your Turn: Create this plot

library(ggplot2)

ggplot(data = ____, aes(x = ____, y = ____)) +
  geom_____(____)

Showing data by group

Mapping aesthetics

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point()

Mapping aesthetics

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = sex)) +
  geom_point()

colour = sex

Mapping aesthetics

ggplot automatically populates the legends (combining where it can)

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = sex, shape = sex)) +
  geom_point()

Faceting: facet_wrap()

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = sex)) +
  geom_point() +
  facet_wrap(~ species)

Split plots by one grouping variable

Faceting: facet_grid()

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = sex)) +
  geom_point() +
  facet_grid(sex ~ species)

Split plots by two grouping variables

Your Turn: Create this plot

ggplot(data = ____, aes(_____________________________________)) +
  ______________ +
  ______________

Hint: colour is for outlining with a colour, fill is for ‘filling’ with a colour
Too Easy? Split boxplots by sex and island

Trendlines / Regression Lines

Trendlines / Regression lines

geom_line() is connect-the-dots, not a trend or linear model

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point() +
  geom_line()

Not what we’re looking for

Trendlines / Regression lines

Let’s add a trend line properly

Start with basic plot:

g <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point()
g

Trendlines / Regression lines

Add the stat_smooth()

  • lm is for “linear model” (i.e. trendline)
  • grey ribbon = 95% confidence interval around the line (based on the standard error)

g + stat_smooth(method = "lm")

Trendlines / Regression lines

Add the stat_smooth()

  • remove the grey ribbon with se = FALSE

g + stat_smooth(method = "lm", se = FALSE)

Trendlines / Regression lines

A line for each group

  • Specify group (here we use colour to specify species)

g <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point()
g

Trendlines / Regression lines

A line for each group

  • stat_smooth() automatically uses the same grouping

g + stat_smooth(method = "lm", se = FALSE)

Trendlines / Regression lines

A line for each group AND overall

g +
  stat_smooth(method = "lm", se = FALSE) +
  stat_smooth(method = "lm", se = FALSE, colour = "black")
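
For the curious: the black overall line is a straight-line fit of y on x. A minimal sketch of the same kind of model fitted directly with lm(), assuming you want to see the actual numbers (the name fit is made up for this example):

```r
library(palmerpenguins)

# Bill length modelled by body mass; rows with missing values are dropped
fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# Intercept and slope of the overall trend line
coef(fit)
```

The figure shows the trend; the model gives you the values to report alongside it.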

Your Turn: Create this plot

  • A scatter plot: Flipper Length by Body Mass grouped by Species
  • With a single regression line for the overall trend

Too Easy? Create a separate plot for each sex as well

Customizing plots

Customizing: Starting plot

Let’s work with this plot

g <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point()

Customizing: Labels

g + labs(title = "Bill Length vs. Body Mass",
         x = "Body Mass (g)",
         y = "Bill Length (mm)",
         colour = "Species", tag = "A")

Practice for later: Add proper labels to some of your previous plots

Customizing: Built-in themes
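
The examples for this slide did not survive in this copy, so here is a rough sketch of the idea using a few of the themes bundled with ggplot2. The choice of themes is mine, not necessarily the ones originally shown.

```r
library(palmerpenguins)
library(ggplot2)

g <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)

# Each built-in theme changes the overall look in one step
g + theme_bw()
g + theme_classic()
g + theme_minimal()

# Add your own theme() tweaks after the built-in theme
g_custom <- g + theme_bw() + theme(legend.position = "bottom")
g_custom
```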

Customizing: Axes

scale_ + (x or y) + type (continuous, discrete, date, datetime)

  • scale_x_continuous()
  • scale_y_discrete()
  • etc.

Common arguments

g + scale_x_continuous(breaks = seq(0, 20, 10)) # Tick breaks
g + scale_x_continuous(limits = c(0, 15))       # xlim() is a shortcut for this
g + scale_x_continuous(expand = c(0, 0))        # Space between axis and data

Customizing: Axes

Breaks

g + scale_x_continuous(breaks = seq(2500, 6500, 500))

Customizing: Axes

Limits

g + scale_x_continuous(limits = c(3000, 4000))
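
A note of caution, sketched below: limits set on a scale remove the data outside the range (with a warning), which also changes any trend lines or boxplots calculated from the data. To zoom in without removing data, coord_cartesian() is an alternative. The names outside, g_limits and g_zoom are made up for this example.

```r
library(palmerpenguins)
library(ggplot2)

g <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)

# How many penguins fall outside 3000-4000 g?
outside <- sum(penguins$body_mass_g < 3000 | penguins$body_mass_g > 4000,
               na.rm = TRUE)
outside

# Scale limits: points outside the range are removed
g_limits <- g + scale_x_continuous(limits = c(3000, 4000))

# Coordinate limits: same view, but all the data are kept
g_zoom <- g + coord_cartesian(xlim = c(3000, 4000))
```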

Customizing: Axes

Space between origin and axis start

g + scale_x_continuous(expand = c(0, 0))

Customizing: Aesthetics

Using scales

scale_ + aesthetic (colour, fill, size, etc.) + type (manual, continuous, datetime, etc.)

g + scale_colour_manual(name = "Type", values = c("green", "purple", "yellow"))

Customizing: Aesthetics

Using scales

Or be very explicit:

g + scale_colour_manual(
  name = "Type",
  values = c("Adelie" = "green", "Gentoo" = "purple", "Chinstrap" = "yellow"),
  na.value = "black")

Customizing: Aesthetics

For colours, consider a colour-blind-friendly scale

viridis_d for “discrete” data

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point() +
  scale_colour_viridis_d(name = "Type")

Customizing: Aesthetics

For colours, consider a colour-blind-friendly scale

viridis_c for “continuous” data

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = flipper_length_mm)) +
  geom_point() +
  scale_colour_viridis_c(name = "Flipper Length (mm)")

Customizing: Aesthetics

Forcing

Remove the association between a variable and an aesthetic

ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = sex)) +
  geom_point(colour = "darkblue", size = 1) +
  stat_smooth(method = "lm", se = FALSE, colour = "lightblue")

Note: When forcing, aesthetic is not inside aes()
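
A common slip, sketched here for comparison: putting a colour name inside aes(). ggplot then treats "darkblue" as if it were data (one category), picks its own default colour and adds a legend. The names mapped and forced are made up for this example.

```r
library(palmerpenguins)
library(ggplot2)

# Inside aes(): "darkblue" is treated as a category, not as a colour
mapped <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point(aes(colour = "darkblue"), na.rm = TRUE)

# Outside aes(): the points really are dark blue
forced <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm)) +
  geom_point(colour = "darkblue", na.rm = TRUE)

mapped
forced
```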

Customizing: Legend placement

At the: top, bottom, left, right

g + theme(legend.position = "top")

Exactly here

g + theme(legend.position = c(0.15, 0.7))
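
Version note, as far as I know: from ggplot2 3.5.0 onwards, giving coordinates directly to legend.position is deprecated in favour of legend.position = "inside" plus legend.position.inside. A hedged sketch that checks the installed version first (g_inside is a made-up name):

```r
library(palmerpenguins)
library(ggplot2)

g <- ggplot(data = penguins, aes(x = body_mass_g, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE)

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer ggplot2: say "inside", then give the coordinates separately
  g_inside <- g + theme(legend.position = "inside",
                        legend.position.inside = c(0.15, 0.7))
} else {
  # Older ggplot2: coordinates go straight into legend.position
  g_inside <- g + theme(legend.position = c(0.15, 0.7))
}
g_inside
```

Coordinates run from 0 to 1, where c(0, 0) is the bottom left of the plot panel and c(1, 1) is the top right.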

Combining plots

Combining plots with patchwork

Setup

  • Load patchwork
  • Create a couple of different plots

library(patchwork)

g1 <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point()

g2 <- ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) +
  geom_boxplot()

g3 <- ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

Combining plots with patchwork

Side-by-Side 2 plots

g1 + g2

Combining plots with patchwork

Side-by-Side 3 plots

g1 + g2 + g3

Combining plots with patchwork

Stacked 2 plots

g1 / g2

Combining plots with patchwork

More complex arrangements

g2 + (g1 / g3)

Combining plots with patchwork

More complex arrangements

g2 / (g1 + g3)

Combining plots with patchwork

“collect” common legends

g2 / (g1 + g3) + plot_layout(guides = "collect")

Combining plots with patchwork

“collect” common legends

g2 / (g1 + g3 + plot_layout(guides = "collect"))

Combining plots with patchwork

Annotate

g2 / (g1 + g3) +
  plot_layout(guides = "collect") +
  plot_annotation(title = "Penguins Data Summary",
                  caption = "Fig 1. Penguins Data Summary",
                  tag_levels = "A",
                  tag_suffix = ")")
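
Beyond + and /, plot_layout() can also control the number of columns and the relative sizes. A small sketch, assuming the same three plots as in the setup (two_cols and wide_left are made-up names):

```r
library(palmerpenguins)
library(ggplot2)
library(patchwork)

g1 <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point(na.rm = TRUE)

g2 <- ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) +
  geom_boxplot(na.rm = TRUE)

g3 <- ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(na.rm = TRUE)

# Three plots wrapped into two columns
two_cols <- g1 + g2 + g3 + plot_layout(ncol = 2)
two_cols

# Two plots side by side, the first twice as wide as the second
wide_left <- g1 + g2 + plot_layout(widths = c(2, 1))
wide_left
```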

Saving plots

Saving plots

RStudio Export

Demo

ggsave()

g <- ggplot(penguins, aes(x = sex, y = bill_length_mm, fill = factor(year))) +
  geom_boxplot()

ggsave(filename = "penguins_mass.png", plot = g)

Saving plots

Publication quality plots

  • Many publications require vector formats (pdf, svg, eps, ps) or high-quality raster formats (tiff, png)
  • Specific sizes corresponding to column widths
  • Minimum resolutions

g <- ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Sex", y = "Body Mass (g)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave(filename = "penguins_mass.pdf", plot = g, dpi = 300,
       height = 80, width = 129, units = "mm")
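
A rough sketch of how size and resolution relate, using the same 129 x 80 mm figure. Resolution (dpi) only matters for pixel-based formats such as png or tiff; the file is saved to a temporary folder here so the example does not clutter your project (out_file and width_px are made-up names).

```r
library(palmerpenguins)
library(ggplot2)

g <- ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE) +
  labs(x = "Sex", y = "Body Mass (g)")

# Width in pixels for a pixel-based format: mm -> inches -> pixels
width_mm <- 129
dpi <- 300
width_px <- width_mm / 25.4 * dpi
width_px

# Save as pdf in a temporary folder and check the file was created
out_file <- file.path(tempdir(), "penguins_mass.pdf")
ggsave(filename = out_file, plot = g,
       height = 80, width = width_mm, units = "mm")
file.exists(out_file)
```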

Wrapping up

Wrapping up: Common mistakes

  • The package is ggplot2, the function is just ggplot()
  • Did you remember to put the + at the end of the line?
  • Order matters!
    • If you’re using custom theme() settings, make sure you put these lines after bundled themes like theme_bw(), or they will be overwritten
  • Variables like ‘year’ are treated as continuous, but are really categories
    • Wrap them in factor()
    • e.g. ggplot(data = penguins, aes(x = factor(year), y = body_mass_g))
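
A minimal sketch of the ‘year’ problem: the column is stored as a number, so it needs factor() before ggplot treats each year as its own category (year_levels, p_number and p_category are made-up names).

```r
library(palmerpenguins)
library(ggplot2)

# year is stored as a number
class(penguins$year)

# factor() turns it into categories
year_levels <- levels(factor(penguins$year))
year_levels

# As a number: the years are not split into separate boxes
p_number <- ggplot(data = penguins, aes(x = year, y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE)

# As a category: one box per year
p_category <- ggplot(data = penguins, aes(x = factor(year), y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE)
p_category
```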

Wrapping up: Further reading (all Free!)

Thank you!

steffilazerte.ca

Slides created with Quarto on 2023-04-17

Extra

Your Turn!

Create a figure with…

  • Custom colour mapping (i.e. scale_...)
  • Clear, human-readable labels
  • More than one graph, each one tagged (e.g., A) or B))
  • With more than one geom type
  • At least one scatterplot with regression line

😁

OR… Load your own data and create a figure of your own!