Grammar of Graphics in R

Semiology of Graphics

Jacques Bertin, 1967

Visual Variables

  • Position
  • Size
  • Shape
  • Value
  • Color
  • Orientation
  • Texture

Types of Imposition

  • Arrangement
  • Rectilinear
  • Circular
  • Orthogonal
  • Polar

The Grammar of Graphics

Leland Wilkinson, 1999

Why a grammar?

If we endeavor to develop a charting instead of a graphing program, we will accomplish two things. First, we inevitably will offer fewer charts than people want. Second, our package will have no deep structure. Our computer program will be unnecessarily complex, because we will fail to reuse objects or routines that function similarly in different charts. And we will have no way to add new charts to our system without generating complex new code. Elegant design requires us to think about a theory of graphics, not charts.

Pie

How do we transform data to graphics?

A Layered Grammar of Graphics

Hadley Wickham, begun in 2006

A layered grammar

The essence of a graphic

  • the data
  • the mapping of that data to aesthetic properties
  • the visual display of that data as geometric elements

Basic plot

library(ggplot2)  # provides ggplot() and the mpg dataset

ggplot(
  data = mpg,
  mapping = aes(x = cty, y = hwy, color = factor(cyl))
) +
  geom_point()

Layers enable more complex plots

These elements form a layer:

  • data
  • how data maps to aesthetic properties
    • e.g. x, y, color, shape, and size
  • the geometric element that represents the data
  • any statistical transformations of the data
    • e.g. counting observations for bar charts
  • position adjustments
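As a rough sketch, assuming ggplot2 is installed, the layer components listed above can be spelled out explicitly with ggplot2's lower-level `layer()` function. The first plot should be equivalent to `geom_point()`. The second uses `geom_bar()`, whose default statistic counts observations, and `layer_data()` shows the counts that the statistic computed. The object names `p_points`, `p_bars`, and `counts` are only illustrative.

```r
library(ggplot2)

# A point layer with its geom, stat, and position adjustment named explicitly
p_points <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  layer(geom = "point", stat = "identity", position = "identity",
        params = list(na.rm = FALSE))

# A bar layer: geom_bar() applies the "count" statistic to the data
p_bars <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# layer_data() returns the layer's data after the statistic has been applied
counts <- layer_data(p_bars)
counts[, c("x", "count")]
```

Here the raw data never contains a count column. The statistical transformation creates it, and the geom then draws it.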

Layered plot

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "black", linetype = "dotted")
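To see that each layer keeps its own data, a minimal sketch (assuming ggplot2 is loaded) can inspect both layers of the plot with `layer_data()`. The point layer inherits the `mpg` rows. Given a fixed slope and intercept, `geom_abline()` builds a one-row data frame and does not inherit the plot's aesthetics. The names `p`, `points`, and `line` are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "black", linetype = "dotted")

# Layer 1: one row per car in mpg
points <- layer_data(p, i = 1)
# Layer 2: a single (intercept, slope) row for the reference line
line <- layer_data(p, i = 2)

nrow(points)
line[, c("intercept", "slope")]
```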

Additional properties of the plot for flexibility

  • scales
  • coordinates
  • facets
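The coordinate system is a separate component from the layer. A classic illustration in ggplot2 is the pie chart: a single stacked bar drawn in polar coordinates. The sketch below assumes ggplot2 is loaded and uses `coord_polar(theta = "y")` so that bar height becomes angle. The names `stacked`, `pie`, and `segments` are illustrative.

```r
library(ggplot2)

# One stacked bar: every car shares the same x position,
# and the bar is split into segments by vehicle class
stacked <- ggplot(data = mpg, mapping = aes(x = factor(1), fill = class)) +
  geom_bar(width = 1)

# Same data, geom, and stat, drawn in polar coordinates: a pie chart
pie <- stacked + coord_polar(theta = "y")

# The computed layer data still holds one count per class
segments <- layer_data(pie)
segments[, c("fill", "count")]
```

Only the coordinate system changes between `stacked` and `pie`. The data, mapping, geom, and statistic are the same.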

The possibilities are endless

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "black", linetype = "dotted") +
  scale_color_manual(name = "# of Cyl", values = c("skyblue", "royalblue", "blue", "navy")) +
  coord_fixed(ratio = 1) +
  facet_grid(. ~ class)

Let’s get coding