ggplotHelper: easy and beautiful plots in R

Visual expression is key in transmitting information between producers and users of research regardless of the scientific field. After many painful hours of re-writing code for plots in R based on the extensive ggplot2 package we finally created an easy-to-use overlay for making beautiful plots in R.

Following our two latest posts on multivariate stochastic volatility (part 1 and part 2) we bring an intermezzo, not related to trading, on how to make beautiful plots in R.

The gold standard in the R community is to create plots with the extensive ggplot2 package, which is built on the principles of the grammar of graphics. It has many settings and can create practically any plot you desire. However, as other R users may have noticed, it often gets cumbersome to rewrite code for your next ggplot because of all the code needed for the customization you want. Or you keep forgetting which functions create a specific plot.

After many painful hours in this trap we finally liberated ourselves by creating a package called ggplotHelper, hosted on GitHub, which is essentially a series of overlay functions for the ggplot function. It reduces the time spent on creating plots significantly, but it comes with certain constraints. We have restricted the colour palette to grey and a few easy-to-read colours. However, these can easily be modified in the colour functions. In addition, our package does not have an overlay for every plot function in ggplot2.

If you like our design template you can now easily create similar charts yourselves. Before you enter the land of ggplotHelper, we have to warn that the package has not yet entered a stable version 1.0.0, so we will make many commits as the package evolves. The package does not come with unit tests at this stage so sometimes new code may create bugs elsewhere in the code.

How to get ggplotHelper

Our ggplotHelper package is published as open source and hosted on GitHub. With the devtools package it is very easy to install ggplotHelper using the install_github function.


# install.packages("devtools")  # run once if devtools is not yet installed
library(devtools)

install_github("pgarnry/ggplotHelper")

Now ggplotHelper is installed. Today we will only discuss three functions: bar_chart, line_chart and save_png. For a full explanation of the package you will have to wait for the vignette or consult the function documentation, which we try to make self-explanatory.

If you experience issues with our functions you are welcome to raise tickets in our GitHub repository. You can also contribute code to make it better.

Colour palette

Colours are controlled through two functions, grey_theme and chart_colours. The grey_theme function allows customization of the plot margin and legend position (see the ggplot2 documentation for available options). These features can be set through the ellipsis, as will become clear shortly.

Below is a small snippet of the grey_theme function showing the only available options.

grey_theme <- function(legend.position = "bottom",
                       plot.margin = c(0.7, 1.2, 0.5, 0.5)) {

If you want to change the colour scheme you simply clone the repository and change the colours in the grey_theme function.
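
For readers who want a feel for what such a theme function might look like before cloning the repository, here is a minimal sketch built directly on ggplot2. It is not the package's actual source: the name grey_theme_sketch and the colour values are our own illustrative assumptions. Only the two arguments and their defaults are taken from the snippet above, and the margins are assumed to be in centimeters.

```r
library(ggplot2)

# Hypothetical stand-in for grey_theme(); NOT the ggplotHelper source.
grey_theme_sketch <- function(legend.position = "bottom",
                              plot.margin = c(0.7, 1.2, 0.5, 0.5)) {
  theme_grey() +
    theme(
      legend.position = legend.position,
      # margins are given as top, right, bottom, left (assumed centimeters)
      plot.margin = unit(plot.margin, "cm"),
      # illustrative colours only; these are the values you would change
      panel.background = element_rect(fill = "grey90"),
      plot.background = element_rect(fill = "grey95", colour = NA)
    )
}

# the sketch is added to a plot like any other ggplot2 theme
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  grey_theme_sketch(legend.position = "none")
```

Keeping all colour choices inside one function like this means a new colour scheme only requires edits in a single place.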

Bar plots

The bar plot is a toolbox staple for any researcher. With ggplotHelper it only takes one line of code to create a bar plot.

library(ggplotHelper)

data(mtcars)
mtcars$name <- rownames(mtcars)

# not a pretty bar plot because of overlapping x-axis names
bar_chart(mtcars, y = "mpg", x = "name")

[Figure: bar1]

This bar plot would require around 50 lines of code to create the design, colours and data handling with the ggplot2 functions. To get it to one line of code, we have simplified and made choices for the user.

The x-axis names are overlapping because of the large number of bars. This problem is easily fixed by flipping the data.

# the flip option makes the plot prettier
bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE)

[Figure: bar2]

Flipping the data looks easy, but behind the scenes the function handles the messy details. Next we add a title.

# add a title
bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE,
          title = "Miles per gallon across different car models")

[Figure: bar3]

Again we make choices for the user by adding a line break in the title to get the proper distance to the plot window. Next we change the y-axis scale.

# change the y scale
bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE,
          title = "Miles per gallon across different car models",
          scale.y = c(0, 40, 5))

[Figure: bar4]

Unordered data can be difficult to interpret so we change the order of our data from highest to lowest values (decreasing).

# now we want to order the values decreasing
bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE,
          title = "Miles per gallon across different car models",
          scale.y = c(0, 40, 5),
          decreasing = TRUE)

[Figure: bar5]

Finally, we can highlight a specific data point (bar) using the bar.colour.name argument.

# finally we highlight a data point
bar_chart(mtcars, y = "mpg", x = "name", flip = TRUE,
          title = "Miles per gallon across different car models",
          scale.y = c(0, 40, 5),
          decreasing = TRUE,
          bar.colour.name = "Merc 280C")

[Figure: bar6]
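
For comparison, the sketch below builds a roughly similar chart with plain ggplot2 calls, leaving out all of the design work. It is our own approximation and not what bar_chart does internally. The colours are illustrative, and we assume that scale.y = c(0, 40, 5) means minimum, maximum and step.

```r
library(ggplot2)

data(mtcars)
mtcars$name <- rownames(mtcars)

# flag the single bar that should stand out
mtcars$highlight <- mtcars$name == "Merc 280C"

# reorder() sorts the bars by mpg; after the flip the highest value is on top
p <- ggplot(mtcars, aes(x = reorder(name, mpg), y = mpg, fill = highlight)) +
  geom_bar(stat = "identity") +
  # grey bars with one coloured bar (colours are illustrative)
  scale_fill_manual(values = c("FALSE" = "grey60", "TRUE" = "firebrick"),
                    guide = "none") +
  # y-axis from 0 to 40 in steps of 5
  scale_y_continuous(limits = c(0, 40), breaks = seq(0, 40, 5)) +
  # horizontal bars so the car names stay readable
  coord_flip() +
  labs(title = "Miles per gallon across different car models",
       x = NULL, y = "mpg")
```

Even without any theming this already takes a handful of layers, which is the repetition the overlay is meant to remove.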

We hope this has given you an idea of the powerful functions available in ggplotHelper.

Line plots

The trickiest part of ggplot2 is making time-series plots, but with ggplotHelper we aimed to make it very easy through the line_chart function. Line plots can either be time series (often the case) or not. The line_chart function provides options for both, and for time-series plots it supports four different date/time classes: POSIXct, Date, yearmon and yearqtr.

Let us start with creating two random processes.

set.seed(5)

# create two random processes
rand.ts <- data.frame(rp = c(cumprod(rnorm(500, 0.0004, 0.0016) + 1),
                             cumprod(rnorm(500, 0.0002, 0.0016) + 1)),
                      date = rep(seq.Date(Sys.Date() - 499, Sys.Date(), by = "days"), 2),
                      name = rep(c("rp1", "rp2"), each = 500))

# quick line plot
line_chart(rand.ts, y = "rp", x = "date", group = "name")

[Figure: line1]

In order to create a line plot with multiple lines, the grouping variable has to be specified. However, the method for multiple lines will likely change in the future. As you can see in the data.frame object, the date column contains duplicates. Ideally we want to get rid of this input design and allow only unique dates with multiple y columns.

In the current version the grouping names are used as the default legend names. The function automatically detects that the x variable is of class Date and enables certain options for manipulating dates (date format and scaling).
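
Until that change arrives, data stored with one row per date and one column per series has to be stacked into the long layout first. The snippet below is a small base R sketch of that step. The object names wide and long and the fixed start date are our own assumptions for illustration.

```r
set.seed(5)

# wide format: one row per date, one column per series
wide <- data.frame(
  date = seq.Date(as.Date("2016-01-01"), by = "days", length.out = 500),
  rp1 = cumprod(rnorm(500, 0.0004, 0.0016) + 1),
  rp2 = cumprod(rnorm(500, 0.0002, 0.0016) + 1)
)

# stack the value columns into the long layout used in the example above
series <- c("rp1", "rp2")
long <- data.frame(
  rp = unlist(wide[series], use.names = FALSE),
  date = rep(wide$date, times = length(series)),
  name = rep(series, each = nrow(wide))
)
```

The resulting long data frame has the same three columns as rand.ts, so it can be passed to line_chart with y = "rp", x = "date" and group = "name".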

Now we will change the y-axis scale and change the legend names. We will also add a title.

# add a title, change the y-axis scale and add custom legend names
line_chart(rand.ts, y = "rp", x = "date",
           group = "name", title = "Random processes",
           legend.names = c("Random process 1", "Random process 2"),
           y.min = .95, y.max = 1.25)

[Figure: line2]

As one quickly realizes, the two legend names stand next to each other. Can we add some space and make it prettier?

# add some extra space between legend names by padding the first name
# with trailing whitespace
line_chart(rand.ts, y = "rp", x = "date",
           group = "name", title = "Random processes",
           legend.names = c("Random process 1 ", "Random process 2"),
           y.min = .95, y.max = 1.25)

[Figure: line3]

Extra space added. Sometimes you want a different date format than the ISO standard. This can easily be changed through the date.format argument, while the x.interval argument sets the increment between the x-axis values. See the example below.

# change the date format and interval
line_chart(rand.ts, y = "rp", x = "date",
           group = "name", title = "Random processes",
           legend.names = c("Random process 1 ", "Random process 2"),
           y.min = .95, y.max = 1.25, date.format = "%Y %b %d",
           x.interval = 60)

[Figure: line4]
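
The format string used above follows base R's date conversion syntax, so it can be tried out on its own with format() before it is passed to line_chart. The short sketch below does just that. The example date is arbitrary, month names depend on your locale, and reading x.interval = 60 as a 60-day step is our assumption for daily data.

```r
d <- as.Date("2016-03-01")

format(d, "%Y %b %d")   # year, abbreviated month name, day
format(d, "%d/%m/%Y")   # day/month/year with a numeric month
format(d, "%B %Y")      # full month name and year

# a 60-day step, comparable in spirit to x.interval = 60
breaks <- seq(d, by = "60 days", length.out = 4)
format(breaks, "%Y %b %d")
```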

The date.format argument accepts all the conversion specifications documented for the strptime function. Finally we want to show how to send arguments to the grey_theme function through the line_chart function.

# remove legend
line_chart(rand.ts, y = "rp", x = "date",
           group = "name", title = "Random processes",
           legend.names = c("Random process 1 ", "Random process 2"),
           y.min = .95, y.max = 1.25, date.format = "%Y %b %d",
           x.interval = 60, legend.position = "none")

[Figure: line5]

The legend.position argument belongs to the grey_theme function but is passed on through the ellipsis. The only other available argument is plot.margin, which changes the plot margins in centimeters.
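
For readers less familiar with the ellipsis, the mechanism can be sketched in a few lines of base R. Both functions below are hypothetical stand-ins that we made up for illustration. They only mimic how an outer chart function can forward arguments to an inner theme function and are not the ggplotHelper code.

```r
# Hypothetical inner function with the two documented defaults
theme_settings <- function(legend.position = "bottom",
                           plot.margin = c(0.7, 1.2, 0.5, 0.5)) {
  list(legend.position = legend.position, plot.margin = plot.margin)
}

# Hypothetical outer function: any argument it does not name itself
# is collected in ... and handed on unchanged
chart_sketch <- function(data, title = "", ...) {
  settings <- theme_settings(...)
  list(title = title, n = nrow(data), settings = settings)
}

res <- chart_sketch(mtcars, title = "Demo", legend.position = "none")
res$settings$legend.position  # "none"
res$settings$plot.margin      # the default margins are kept
```

The outer function never has to list the theme arguments itself, which keeps its signature short while still exposing the theme options.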

The line_chart function also supports ribbons as well as vertical and horizontal lines (used to highlight a critical level in the data). The ribbon in particular is a more advanced feature and currently works with only one time series.
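
We do not show the line_chart arguments for these features here. To illustrate what the three elements look like, the sketch below draws them with plain ggplot2 instead. The data, the two percent band and the marked date are made up for illustration.

```r
library(ggplot2)

set.seed(5)
series <- data.frame(
  date = seq.Date(as.Date("2016-01-01"), by = "days", length.out = 500),
  rp = cumprod(rnorm(500, 0.0004, 0.0016) + 1)
)
# illustrative band of plus/minus 2% around the series
series$lower <- series$rp * 0.98
series$upper <- series$rp * 1.02

p <- ggplot(series, aes(x = date, y = rp)) +
  # ribbon drawn first so the line sits on top of it
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") +
  geom_line() +
  # horizontal line at the starting level
  geom_hline(yintercept = 1, linetype = "dashed") +
  # vertical line marking an arbitrary date
  geom_vline(xintercept = as.numeric(as.Date("2016-07-01")),
             linetype = "dotted")
```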

Save plots

Often we want our plots saved outside our R environment. The ggplotHelper package has a quick save_png function that takes the ggplot object names (plot names) as input and stores the plot objects as .png files with a height of 480 pixels and a width of 777 pixels (the height scaled by the golden ratio). A future feature will allow the user to specify the height and width.

line1 <- line_chart(rand.ts, y = "rp", x = "date", group = "name")

line3 <- line_chart(rand.ts, y = "rp", x = "date", group = "name",
                    title = "Random processes",
                    legend.names = c("Random process 1 ", "Random process 2"),
                    y.min = .95, y.max = 1.25)

# save the two ggplots as .png files
save_png(line1, line3)

It is that easy to save ggplots as .png files.
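
If you need the same dimensions outside the package, the arithmetic is easy to reproduce. The sketch below derives the width from the golden ratio and writes a plot with base R's png device. It is our own illustration of the idea and not the save_png implementation, and the temporary file name is arbitrary.

```r
library(ggplot2)

height <- 480
golden_ratio <- (1 + sqrt(5)) / 2
width <- round(height * golden_ratio)  # 777

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# write the plot to a temporary .png file with base R's png device
file <- tempfile(fileext = ".png")
png(file, width = width, height = height)
print(p)
dev.off()
```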

Other functions

The ggplotHelper package also supports density, box and scatter plots, and more will likely be implemented in the future. The density plot is very powerful and is one of our favourite plots for showing the results of bootstrapped trading strategies.
