Let’s get to coding!
We will start with an example where we take the mpg data and make a scatterplot.
How do we get from this:
The data
head(mpg)
to this:
The plot
Let’s plot!
Load the ggplot2 library:
library(ggplot2)
ggplot is the base layer.
ggplot()
This is a good place for us to define our plot’s default data.
ggplot(data = mpg)
There isn’t much being plotted yet… Let’s tell our plot about our mapping, where our x will be cty and our y will be hwy.
ggplot(
data = mpg,
mapping = aes(x = cty, y = hwy)
)
Our plot now has enough information to set some automatic axis limits based on the data. To draw the observations themselves, we add a point layer with geom_point():
ggplot(
data = mpg,
mapping = aes(x = cty, y = hwy)
) +
geom_point()
We can map the number of cylinders as a set of discrete values to color.
ggplot(
data = mpg,
mapping = aes(x = cty, y = hwy)
) +
geom_point(
mapping = aes(color = factor(cyl))
)
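A quick check of why factor() is used here (a minimal sketch, assuming the mpg data set that ships with ggplot2; the name cyl_levels is only for illustration): cyl is stored as a number, so without the conversion ggplot2 would treat it as continuous and pick a colour gradient instead of distinct colours.

```r
library(ggplot2)

# cyl is stored as an integer column, which ggplot2 treats as continuous
class(mpg$cyl)

# Wrapping it in factor() yields one discrete level per distinct value
cyl_levels <- levels(factor(mpg$cyl))
cyl_levels
```

Each level gets its own colour and its own entry in the legend.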
At this point, we have the base of the plot. We can save this to a variable, mpg_base_plot, and make additional adjustments.
mpg_base_plot <- ggplot(
data = mpg,
mapping = aes(x = cty, y = hwy)
) +
geom_point(
mapping = aes(color = factor(cyl))
)
mpg_base_plot
To add the reference line, we add a layer like so:
mpg_base_plot +
geom_abline(
slope = 1,
intercept = 0,
color = 'black',
linetype = 'dotted'
)
Note that this geom layer can also be added on top of a blank plot. All geom_-prefixed functions return a layer object (in ggplot2, a ggproto object built on an environment) that holds onto the information we call it with.
ggplot() +
geom_abline(
slope = 1,
intercept = 0,
color = 'black',
linetype = 'dotted'
)
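Because the layer is an ordinary object, it can also be built and inspected without any plot at all. The following is a rough sketch; ref_line is a name chosen for illustration, and aes_params is an internal field of the layer that may differ between ggplot2 versions.

```r
library(ggplot2)

# Build the layer on its own, without attaching it to any plot
ref_line <- geom_abline(
  slope = 1,
  intercept = 0,
  color = 'black',
  linetype = 'dotted'
)

class(ref_line)           # the layer's class vector
is.environment(ref_line)  # ggproto objects are built on environments
ref_line$aes_params       # fixed aesthetics passed in (internal field, an assumption)
```

Storing a layer in a variable like this makes it easy to reuse the same reference line across several plots.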
Let’s continue. Our plot looks a bit funny – the x and y grids are not scaled 1 to 1. Let’s use a coord_ function to adjust this:
mpg_base_plot +
geom_abline(
slope = 1,
intercept = 0,
color = 'black',
linetype = 'dotted'
) +
coord_fixed(ratio = 1)
We can clean up the plot like so:
mpg_plot <- mpg_base_plot +
geom_abline(
slope = 1,
intercept = 0,
color = 'black',
linetype = 'dotted'
) +
coord_fixed(ratio = 1) +
scale_color_manual(
name = '# of Cyl',
values = c('skyblue','royalblue', 'blue', 'navy')
) +
ggtitle('Miles per gallon') +
theme_light()
mpg_plot
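An unnamed values vector is matched to the factor levels by position. As a hedged alternative, scale_color_manual() also accepts a named vector, which ties each colour to a specific level. The sketch below rebuilds a reduced version of the plot; cyl_colors and named_scale_plot are names made up for this example.

```r
library(ggplot2)

# Names match the levels of factor(cyl), so each colour is tied to a value
cyl_colors <- c('4' = 'skyblue', '5' = 'royalblue', '6' = 'blue', '8' = 'navy')

named_scale_plot <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(mapping = aes(color = factor(cyl))) +
  scale_color_manual(name = '# of Cyl', values = cyl_colors)

named_scale_plot
```

With names in place, reordering the vector does not change which cylinder count gets which colour.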
We can further manipulate the plot using things like facet_ functions.
mpg_plot +
facet_grid(.~class)
Let’s experiment
We can get the base plots from the ggplot cheatsheets on our computer by sourcing this R script.
source('https://goo.gl/3EQSXt')
The magic behind the +
We can discover how the + is helping us construct these plots by running:
methods('+')
[1] +.Date +.gg* +.POSIXt
see '?methods' for accessing help and source code
From the result, we can see that + has an S3 method, +.gg, for objects of class gg. If we look at the code for this function, we learn more about how it works.
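The effect of + can also be observed from the outside, without reading its source. In this small sketch (empty_plot and with_points are illustrative names), adding a layer returns a new plot object and leaves the original one unchanged.

```r
library(ggplot2)

empty_plot <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))
with_points <- empty_plot + geom_point()

length(empty_plot$layers)   # the original plot still has no layers
length(with_points$layers)  # the result carries the added layer
```

This is why a base plot can be saved once and then extended in several different directions.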
Object-oriented
We can explore more of how ggplot works underneath by looking at:
base_plot <- ggplot()
names(base_plot)
[1] "data" "layers" "scales" "mapping" "theme" "coordinates" "facet" "plot_env"
[9] "labels"
It’s an object that can be mutated. Other functions output objects as well.
mpg_mapping <- aes(x = cty, y = hwy)
mpg_mapping
* x -> cty
* y -> hwy
Let’s mutate the base_plot:
base_plot$data <- mpg
base_plot$mapping <- mpg_mapping
base_plot$layers <- c(base_plot$layers, geom_point())
base_plot
Extensibility
You can also create custom stat_s, geom_s, and theme_s as needed. The Extending ggplot2 vignette explains the object-oriented patterns for ggplot2, and how to extend the library for your needs.
There’s actually a lot we can learn from the source code, such as all possible aes parameters.
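Short of reading the source, the exported geom objects can be queried directly. This is one possible approach for a single geom rather than a complete list of every aesthetic in the library; point_aes is an illustrative name.

```r
library(ggplot2)

# Aesthetics that must be supplied for a point layer
GeomPoint$required_aes

# All aesthetics the point geom understands
point_aes <- GeomPoint$aesthetics()
point_aes
```

The same two calls should work on other exported geom objects, such as GeomBar or GeomLine.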
The makings of a pie
Grammar-based graphics lends flexibility
Back to Leland Wilkinson’s point about making a pie – that a pie chart is really a stacked bar chart that has been transformed with polar coordinates where the y is mapped to the angle, a.k.a. theta.
We start with a bar chart:
diamonds_by_cut_base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = cut))
diamonds_by_cut_base +
geom_bar()
Here, we have a stacked bar chart of diamonds grouped by cut.
diamonds_by_cut_stacked <- diamonds_by_cut_base +
geom_bar(width = 1, aes(x=factor("")))
diamonds_by_cut_stacked
Taking the bar chart, we simply “add” a coord_polar to it.
diamonds_by_cut_stacked +
coord_polar() +
scale_x_discrete("")
By default, coord_polar maps the theta from x and the radius from y.
bullseye_coords <- coord_polar()
bullseye_coords$theta
[1] "x"
bullseye_coords$r
[1] "y"
We need a coord_polar where the theta maps from y:
pie_coords <- coord_polar(theta = "y")
pie_coords$theta
[1] "y"
Now, we can transform the stacked bar chart to a pie chart:
diamonds_by_cut_stacked +
pie_coords
?coord_polar
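Putting the steps together, the whole pie can be written as a single expression. This is a condensed sketch of the same idea, with diamonds_pie as an illustrative name.

```r
library(ggplot2)

diamonds_pie <- ggplot(
  data = diamonds,
  mapping = aes(x = factor(""), fill = cut)
) +
  geom_bar(width = 1) +        # one stacked bar of counts, split by cut
  coord_polar(theta = "y") +   # map the stacked counts to the angle
  scale_x_discrete("")         # drop the empty x-axis title

diamonds_pie
```

Swapping coord_polar(theta = "y") back to coord_polar() turns the same data into the bullseye chart.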
A layered grammar is powerful
The full grammar consists of:
- The base plot – ggplot – with default data and mappings
- Any number of layers – geom_ or stat_ – each with
  - data
  - mapping
    - plot’s mappings by default
    - defined by aesthetics (aes)
  - geom_
    - Sometimes defined by default by stat_ if defined
  - stat_
    - Identity by default
    - Sometimes defined by default by geom_ if defined
  - position
    - Optional, any adjustments necessary to geom positioning
Additionally, the grammar includes control over the whole plot’s
- scales
  - to define how data values map to aesthetic values
- coordinates
- facets
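To make the components concrete, the scatterplot from earlier can be written with every piece of the grammar spelled out. This is a sketch using ggplot2's lower-level layer() function. The scale, coordinate, and facet calls shown are the defaults ggplot2 would normally pick on its own, and explicit_plot is an illustrative name.

```r
library(ggplot2)

explicit_plot <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  layer(
    geom = "point",                      # how each row is drawn
    stat = "identity",                   # no statistical transformation
    position = "identity",               # no position adjustment
    mapping = aes(color = factor(cyl))   # layer mapping, added to the plot's
  ) +
  scale_x_continuous() +                 # how data values map to x positions
  scale_y_continuous() +                 # how data values map to y positions
  coord_cartesian() +                    # the coordinate system
  facet_null()                           # a single panel, no faceting

explicit_plot
```

In everyday use, geom_point() fills in the stat and position for us, which is why the shorter form is preferred.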
