bquote(R^2) # Expression
R^2
"R^2" # Text
[1] "R^2"
March 15, 2024
Creating ggplot2 figures with special characters such as superscripts (\(R^2\)), math equations (\(\sqrt{x}\)), or Greek letters (\(\omega\), \(\lambda\)) can be a bit of a headache.
I recently created some figures for my mom which required special characters in the axes as well as in annotations, and it reminded me of how much of a pain it can be, especially because depending on what you want to do, you need to use a different process for it.
If you want to create your annotations programmatically (e.g., in a column of your data frame), you need a different process than if you were going to create them directly in the ggplot function calls.
There are also different layers in ggplots which require different inputs. Some can take an expression, and some only text, so you need to remember what to use for each of those.
So I decided to create this note as a reference for my future self, if for no one else 😁.
First we’ll go over when to use text vs. expressions and how to convert between the two for when you use them directly vs. programmatically. Then, because that’s super confusing, we’ll go through a bunch of examples of both.
We’re going to be creating text with special symbols or characters by using plotmath and R expressions. For example, `R^2` gives you \(R^2\). See `?plotmath` or the Appendix table for how to code the symbols or expressions you want to use.
Sometimes ggplot needs this as text, sometimes as an expression.
Further, if you’re creating a label directly, it’s generally easier to create it as an expression and convert it to text if you need to.
On the other hand, if you’re creating labels programmatically, you’ll generally create them as text and will then have to convert to an expression as required.
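To make the two directions concrete, here is a minimal base R sketch of the conversions described above. The variable names (`expr`, `txt`, `parsed`) are mine, chosen only for illustration.

```r
# Direct use: start from an expression
expr <- bquote(R^2)
class(expr)              # "call": an unevaluated R expression, not text

# Expression -> text, for layers that need text
txt <- deparse(expr)
txt                      # "R^2"

# Programmatic use: start from text, then text -> expression
parsed <- parse(text = "R^2")
class(parsed)            # "expression"

# Both routes end up holding the same underlying call
identical(parsed[[1]], expr)
```

Nothing here touches ggplot2 yet; it only shows that `deparse()` and `parse(text = )` are the two bridges between the text and expression forms.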
In a nutshell…
- Labels (the `name` argument in `scale_XXX()` as well as `labs()`) require an expression
  - Direct use: `name = bquote(R^2)`
  - Programmatic use: `name = parse(text = "R^2")`
- Geoms (`geom_text()`, `geom_label()`, `annotate(geom = "text")`, etc.) require text
  - Direct use: `label = deparse(bquote(R^2)), parse = TRUE`¹
  - Programmatic use: `label = "R^2", parse = TRUE`
  - `parse = TRUE` tells the function to turn the text into an expression

To summarize
Layer | Direct Use (create with expression) | Programmatic Use (create with text) |
---|---|---|
label requires expression | Expression: `bquote()` | Parse text to expression: `parse(text = "")` |
geom requires text | Deparse expression to text: `deparse(bquote())` and use `parse = TRUE` | Text (`""`) and use `parse = TRUE` |
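As a quick check of the four cells in the table, here is a small sketch that uses each one once in a single plot. The data frame `df`, the coordinates, and the label values are made up for illustration.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 4, 3))  # toy data, made up for illustration
axis_text <- "'Measurement'~(mu*g/L)"       # a label stored as text
note_text <- "R^2==0.45"                    # an annotation stored as text

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  # Labels need expressions
  labs(x = parse(text = axis_text),         # programmatic: parse text
       y = bquote(M/g)) +                   # direct: expression
  # Geoms need text plus parse = TRUE
  annotate(geom = "text", x = 2, y = 3.5,
           label = deparse(bquote(R[adj]^2 == 0.41)), parse = TRUE) +  # direct
  annotate(geom = "text", x = 2, y = 2.5,
           label = note_text, parse = TRUE)                            # programmatic
p
```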
See `?plotmath` or the Appendix table for how to code other symbols or expressions you want to use.
Here are some suggestions…
- Use `*` to join elements without a space and `~` to join them with a space (e.g., `alpha*","~beta`)
- Use `==` for equals (see Appendix table for more examples)
- Use `''^137*Cs` when you need to put superscript before an element

You can test if you have created a text expression correctly by using `parse(text = XXX)`
parse(text = "R^2")
expression(R^2)
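The same check works for each of the suggestions above. This sketch stores them as text and confirms that every one parses; the vector name `tips` and its element names are just for illustration.

```r
# Each suggestion written as text (outer single quotes so "," can sit inside)
tips <- c(
  juxtapose = 'alpha*","~beta',  # * joins without a space, ~ joins with a space
  equals    = "R^2==0.45",       # == is drawn as an equals sign
  isotope   = "''^137*Cs"        # the empty string '' gives the superscript a base
)

# parse() returns an expression for each valid label
all_ok <- vapply(tips, function(x) is.expression(parse(text = x)), logical(1))
all_ok
```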
Here we create various non-dynamic text labels directly in the `ggplot()` code.
library(ggplot2)
ggplot() +
theme_bw() +
# Use `bquote()` in labels
scale_x_continuous(name = bquote("Measurement"~(mu*g/L))) +
scale_y_continuous(name = bquote(M/g)) +
labs(title = bquote("Use quotes to mark normal text"~(mu*g/L)~(over(mu*g, L))~sqrt(x)),
subtitle = bquote("Use ~ to link elements together with a space (or more!)"~~~~~alpha*","~beta*","~Gamma),
caption = bquote(sum(x[i], i==1, n))) +
# Use `deparse(bquote())` along with `parse = TRUE` in geoms
annotate(geom = "text", x = 0.5, y = 0.5, label = deparse(bquote(P==0.001*";"~R^2==0.45)), parse = TRUE, size = 5) +
geom_text(x = 0.5, y = 0.48, aes(label = deparse(bquote(''^137*Cs))), parse = TRUE, size = 5) +
geom_text(x = 0.5, y = 0.52, aes(label = deparse(bquote(R[adj]^2==0.41))), parse = TRUE, size = 5)
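One thing the code above doesn't use is `.()`, a base R feature of `bquote()` that substitutes the value of a variable into the expression. I'm adding it here only as a side note in case a "direct" label needs a computed number; the variable `r2` and its value are made up.

```r
r2 <- 0.45                        # made-up value standing in for a computed statistic

lab <- bquote(R^2 == .(r2))       # .() is replaced by the value of r2
lab                               # usable as-is for label arguments (expression)

lab_text <- deparse(lab)          # text version for geoms, used with parse = TRUE
lab_text                          # "R^2 == 0.45"
```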
You’ll want to use dynamic or programmatic labels in situations where your labels are created in a data frame (e.g., different annotations for different facets in a plot, such as \(R^2\)s for different models, or special characters in your facet labels). Or perhaps you have a function which creates your plots.
First we’ll create some dynamic content to display. This will be text versions of plotmath expressions.
library(ggplot2)
library(palmerpenguins) # data
library(dplyr) # manipulate the data
p <- mutate(penguins, sp = paste0("'", species, "'[(italic(", island, "))]"))
samples <- count(p, sp, species, island) |>
mutate(label = paste0("n['(", species, ", ", island, ")'] == ", n))
samples
# A tibble: 5 × 5
sp species island n label
<chr> <fct> <fct> <int> <chr>
1 'Adelie'[(italic(Biscoe))] Adelie Biscoe 44 n['(Adelie, Biscoe)']…
2 'Adelie'[(italic(Dream))] Adelie Dream 56 n['(Adelie, Dream)'] …
3 'Adelie'[(italic(Torgersen))] Adelie Torgersen 52 n['(Adelie, Torgersen…
4 'Chinstrap'[(italic(Dream))] Chinstrap Dream 68 n['(Chinstrap, Dream)…
5 'Gentoo'[(italic(Biscoe))] Gentoo Biscoe 124 n['(Gentoo, Biscoe)']…
labels <- list("x" = paste0("'Bill Length'~mm[(", paste0(range(p$year), collapse = "-"), ")]"),
"y" = paste0("'Flipper Length'~mm[(", paste0(range(p$year), collapse = "-"), ")]"))
labels
$x
[1] "'Bill Length'~mm[(2007-2009)]"
$y
[1] "'Flipper Length'~mm[(2007-2009)]"
Now let’s add this content to our plot
ggplot(data = p, aes(x = bill_length_mm, y = flipper_length_mm)) +
geom_point() +
geom_text(data = samples, x = -Inf, y = +Inf, aes(label = label), parse = TRUE,
hjust = -0.1, vjust = 1.5) +
facet_wrap(~ sp, labeller = label_parsed)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
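A detail that is easy to miss in the label strings above is the single quotes around normal text. Here is a small sketch of why they matter, assuming the same 2007-2009 year range shown in the output earlier (hard-coded here as `years` so the example doesn't need the penguins data).

```r
# Unquoted text containing a space is not valid R syntax, so parsing fails
bad <- try(parse(text = "Bill Length~mm"), silent = TRUE)
inherits(bad, "try-error")    # TRUE

# Single quotes inside the string mark 'Bill Length' as plain text
years <- c(2007, 2009)        # stand-in for range(p$year)
good <- paste0("'Bill Length'~mm[(", paste0(years, collapse = "-"), ")]")
good                          # "'Bill Length'~mm[(2007-2009)]"
parse(text = good)            # parses without error
```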
As before, let’s start by creating some dynamic content to add to our plots. We’ll create this by creating text versions of the expressions we want to use.
- We use `{}` around `R^2` to ensure that the `[adj]` is actually subscript to the whole `R^2`, as opposed to just the `2` (otherwise you’d get \(R^{2_{adj}}\)). Above, we just used a different order, `R[adj]^2`, to avoid this.
- We show P-values either as `<0.001` or formatted with `format(nsmall = 3)` to ensure there are always three digits after the decimal, and we then put the P-value in quotes ('') because it is now text, not a number.

library(ggplot2)
library(palmerpenguins) # data
library(dplyr) # manipulate the data
library(tidyr) # unnest() to convert nested data back into a regular data frame
library(purrr) # map() to loop over models and labels
library(broom) # tidy() to extract model information
p <- penguins |>
add_count(species) |>
mutate(sp = paste0("'", species, "'"),
sp = if_else(species == "Adelie", paste0(sp, "^{1}"), sp),
sp = paste0(sp, "~(n == frac(", n, ", ", n(), "))"))
stats <- p |>
nest(data = -"sp") |>
mutate(model = map(data, \(x) lm(flipper_length_mm ~ bill_length_mm, data = x)),
labels = map(model, glance)) |>
unnest(cols = "labels") |>
mutate(p_val = format(round(p.value, 3), nsmall = 3), # rounded P-value as text with three decimals
p_val = if_else(p.value < 0.001, "<0.001", paste0("=='", p_val, "'")),
stats = paste0("P", p_val, "*';'~{R^2}[adj]==", round(adj.r.squared, 2)))
select(stats, sp, stats)
# A tibble: 3 × 2
sp stats
<chr> <chr>
1 'Adelie'^{1}~(n == frac(152, 344)) P<0.001*';'~{R^2}[adj]==0.1
2 'Gentoo'~(n == frac(124, 344)) P<0.001*';'~{R^2}[adj]==0.43
3 'Chinstrap'~(n == frac(68, 344)) P<0.001*';'~{R^2}[adj]==0.21
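All three P-values in this data set are tiny, so the quoted branch never shows up in the output above. Here is a base R sketch with made-up P-values (`p_demo`) to show what that branch produces and why the quotes matter. I use `ifelse()` and round before formatting, which is my own variation rather than the exact code above.

```r
p_demo <- c(0.00004, 0.0312, 0.5)   # made-up P-values for illustration

shown <- ifelse(p_demo < 0.001,
                "<0.001",
                paste0("=='", format(round(p_demo, 3), nsmall = 3), "'"))
p_labels <- paste0("P", shown)
p_labels                             # "P<0.001" "P=='0.031'" "P=='0.500'"

# Without quotes the value is read as a number and trailing zeros are dropped
deparse(parse(text = "P==0.500")[[1]])   # "P == 0.5"
```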
labels <- list("x" = paste0("'Bill Length'~mm[(", paste0(range(p$year), collapse = "-"), ")]"),
"y" = paste0("'Flipper Length'~mm[(", paste0(range(p$year), collapse = "-"), ")]"))
labels
$x
[1] "'Bill Length'~mm[(2007-2009)]"
$y
[1] "'Flipper Length'~mm[(2007-2009)]"
Now let’s add this to our plot
ggplot(data = p, aes(x = bill_length_mm, y = flipper_length_mm)) +
geom_point() +
geom_text(data = stats, x = -Inf, y = +Inf, aes(label = stats), parse = TRUE,
hjust = -0.1, vjust = 1.5) +
labs(x = parse(text = labels$x),
y = parse(text = labels$y),
caption = parse(text = "''^1*'Sampled on all three islands'")) +
scale_y_continuous(limits = \(x) c(x[1], x[2]*1.04)) +
facet_wrap(~ sp, labeller = label_parsed)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
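The note about `{}` comes down to R's operator precedence: `[` binds more tightly than `^`. A short sketch that inspects the parsed structure of the three spellings makes this visible (the variable names are mine).

```r
tight   <- parse(text = "R^2[adj]")[[1]]    # no grouping
grouped <- parse(text = "{R^2}[adj]")[[1]]  # braces around R^2
reorder <- parse(text = "R[adj]^2")[[1]]    # subscript first, then superscript

# Without braces the outermost operation is ^ and the exponent is 2[adj]
as.character(tight[[1]])     # "^"
deparse(tight[[3]])          # "2[adj]"

# With braces the outermost operation is the subscript on the whole {R^2}
as.character(grouped[[1]])   # "["

# Reordering gives ^ applied to R[adj]
deparse(reorder[[2]])        # "R[adj]"
```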
To find errors, test your labels with `parse()`.
parse(text = "R^2")
expression(R^2)
expression(R^2)
parse(text = p$sp[1])
expression("Adelie"^{
1
} ~ (n == frac(152, 344)))
parse(text = labels$x)
expression("Bill Length" ~ mm[(2007 - 2009)])
parse(text = stats$stats[1])
expression(P < 0.001 * ";" ~ {
R^2
}[adj] == 0.1)
You can also test them with bquote()
bquote(R^2)
R^2
If you have an error in your label, `parse()` (or `bquote()`) will fail.
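When there is a whole column of labels, it can help to find out which ones fail instead of stopping at the first error. Here is a minimal sketch; `check_label()` is a hypothetical helper I made up for this note, not part of any package.

```r
# Hypothetical helper: NA if the label parses, otherwise the error message
check_label <- function(x) {
  tryCatch({
    parse(text = x)
    NA_character_
  }, error = function(e) conditionMessage(e))
}

test_labels <- c("R^2", "alpha beta", "'Bill Length'~mm")
problems <- vapply(test_labels, check_label, character(1))

# Keep only the labels that failed, with their messages
problems[!is.na(problems)]
```

Here only `"alpha beta"` should be reported, because there is nothing joining the two elements.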
Is there a `*` or `~` between separate elements?

If you run `demo("plotmath")`, you’ll get a series of tables showing the outputs of the plotmath codes in plots. However, I don’t really like them, so here is my recreation using gt and ggplot2 (and some hacking of the documentation).
Note that there are some symbols that appear as white squares (especially lower in the table). This means that the font I’m using doesn’t support those symbols. If you get the same on a symbol you want to use, see about switching up your fonts. Unfortunately that is non-trivial 😢.
library(showtext)
library(stringr)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(gt)
# Get the table
docs <- tools:::fetchRdDB(file.path(system.file("help", package = "grDevices"), "grDevices"))
docs <- docs$plotmath
docs <- capture.output(docs)
docs <- docs[-seq(1, str_which(docs, "\\\\tabular\\{ll\\}(?s).*"), 1)]
docs <- docs[-seq(str_which(docs, "^( )+\\}$")[1], length(docs), 1)]
# Extract the code and descriptions
labels <- docs |>
str_remove("\\\\cr") |>
str_subset("Syntax", negate = TRUE) |>
str_replace_all("\"", "'") |>
str_squish() |>
tibble(txt = _) |>
filter(txt != "") |>
separate("txt", into = c("code", "meaning"), sep = " \\\\tab ") |>
mutate(code_raw = str_replace_all(code, "(\\\\code\\{)([^\\}]*)(\\})", "\\2"),
code = paste0("`", code_raw, "`"),
plot = 1:n(),
code_raw = if_else(code_raw == "theta1, phi1, sigma1, omega1",
"theta1*phi1*sigma1*omega1", code_raw))
# Create a temp image of each symbol - image to that we get the correct
sysfonts::font_add(family = "dejavu",
regular = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
italic = "/usr/share/fonts/truetype/dejavu/DejaVuSans-Oblique.ttf",
bold = "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
bolditalic = "/usr/share/fonts/truetype/dejavu/DejaVuSans-BoldOblique.ttf"
)
g <- map2(labels$code_raw, labels$plot, \(x, i) {
showtext_auto()
ggplot() +
theme_void() +
geom_text(x = 0.5, y = 0.5, aes(label = x), parse = TRUE, size = 40, family = "dejavu")
})
# Create table of code plus symbol images
gt(labels) |>
text_transform(locations = cells_body(columns = plot),
fn = function(x) ggplot_image(g[as.numeric(x)], height = px(50), aspect_ratio = 2)) |>
cols_label(plot = "Plotted Symbol",
code = "Code",
meaning = "Description") |>
cols_hide(code_raw) |>
fmt_markdown(code)
Code | Description | Plotted Symbol |
---|---|---|
`x + y` | x plus y | |
`x - y` | x minus y | |
`x*y` | juxtapose x and y | |
`x/y` | x forwardslash y | |
`x %+-% y` | x plus or minus y | |
`x %/% y` | x divided by y | |
`x %*% y` | x times y | |
`x %.% y` | x cdot y | |
`x[i]` | x subscript i | |
`x^2` | x superscript 2 | |
`paste(x, y, z)` | juxtapose x, y, and z | |
`sqrt(x)` | square root of x | |
`sqrt(x, y)` | yth root of x | |
`x == y` | x equals y | |
`x != y` | x is not equal to y | |
`x < y` | x is less than y | |
`x <= y` | x is less than or equal to y | |
`x > y` | x is greater than y | |
`x >= y` | x is greater than or equal to y | |
`!x` | not x | |
`x %~~% y` | x is approximately equal to y | |
`x %=~% y` | x and y are congruent | |
`x %==% y` | x is defined as y | |
`x %prop% y` | x is proportional to y | |
`x %~% y` | x is distributed as y | |
`plain(x)` | draw x in normal font | |
`bold(x)` | draw x in bold font | |
`italic(x)` | draw x in italic font | |
`bolditalic(x)` | draw x in bolditalic font | |
`symbol(x)` | draw x in symbol font | |
`list(x, y, z)` | comma-separated list | |
`...` | ellipsis (height varies) | |
`cdots` | ellipsis (vertically centred) | |
`ldots` | ellipsis (at baseline) | |
`x %subset% y` | x is a proper subset of y | |
`x %subseteq% y` | x is a subset of y | |
`x %notsubset% y` | x is not a subset of y | |
`x %supset% y` | x is a proper superset of y | |
`x %supseteq% y` | x is a superset of y | |
`x %in% y` | x is an element of y | |
`x %notin% y` | x is not an element of y | |
`hat(x)` | x with a circumflex | |
`tilde(x)` | x with a tilde | |
`dot(x)` | x with a dot | |
`ring(x)` | x with a ring | |
`bar(xy)` | xy with bar | |
`widehat(xy)` | xy with a wide circumflex | |
`widetilde(xy)` | xy with a wide tilde | |
`x %<->% y` | x double-arrow y | |
`x %->% y` | x right-arrow y | |
`x %<-% y` | x left-arrow y | |
`x %up% y` | x up-arrow y | |
`x %down% y` | x down-arrow y | |
`x %<=>% y` | x is equivalent to y | |
`x %=>% y` | x implies y | |
`x %<=% y` | y implies x | |
`x %dblup% y` | x double-up-arrow y | |
`x %dbldown% y` | x double-down-arrow y | |
`alpha -- omega` | Greek symbols | |
`Alpha -- Omega` | uppercase Greek symbols | |
`theta1, phi1, sigma1, omega1` | cursive Greek symbols | |
`Upsilon1` | capital upsilon with hook | |
`aleph` | first letter of Hebrew alphabet | |
`infinity` | infinity symbol | |
`partialdiff` | partial differential symbol | |
`nabla` | nabla, gradient symbol | |
`32*degree` | 32 degrees | |
`60*minute` | 60 minutes of angle | |
`30*second` | 30 seconds of angle | |
`displaystyle(x)` | draw x in normal size (extra spacing) | |
`textstyle(x)` | draw x in normal size | |
`scriptstyle(x)` | draw x in small size | |
`scriptscriptstyle(x)` | draw x in very small size | |
`underline(x)` | draw x underlined | |
`x ~~ y` | put extra space between x and y | |
`x + phantom(0) + y` | leave gap for '0', but don't draw it | |
`x + over(1, phantom(0))` | leave vertical gap for '0' (don't draw) | |
`frac(x, y)` | x over y | |
`over(x, y)` | x over y | |
`atop(x, y)` | x over y (no horizontal bar) | |
`sum(x[i], i==1, n)` | sum x[i] for i equals 1 to n | |
`prod(plain(P)(X==x), x)` | product of P(X=x) for all values of x | |
`integral(f(x)*dx, a, b)` | definite integral of f(x) wrt x | |
`union(A[i], i==1, n)` | union of A[i] for i equals 1 to n | |
`intersect(A[i], i==1, n)` | intersection of A[i] | |
`lim(f(x), x %->% 0)` | limit of f(x) as x tends to 0 | |
`min(g(x), x > 0)` | minimum of g(x) for x greater than 0 | |
`inf(S)` | infimum of S | |
`sup(S)` | supremum of S | |
`x^y + z` | normal operator precedence | |
`x^(y + z)` | visible grouping of operands | |
`x^{y + z}` | invisible grouping of operands | |
`group('(',list(a, b),']')` | specify left and right delimiters | |
`bgroup('(',atop(x,y),')')` | use scalable delimiters | |
`group(lceil, x, rceil)` | special delimiters | |
`group(lfloor, x, rfloor)` | special delimiters | |
`group(langle, list(x, y), rangle)` | special delimiters | |
¹ You could also use straight text “R^2” without `deparse(bquote())` if you wanted to work with text.