ggplot2 is an R library that makes it easy to produce publication-quality plots. It is based on the “Grammar of Graphics”, which describes a vocabulary for creating plots from data.
Start by loading the ggplot2 library.
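ggplot2 has to be installed once per machine and loaded in every new R session. A minimal sketch, assuming ggplot2 is installed from CRAN:

```r
# Run once if ggplot2 is not installed yet (uncomment the next line)
# install.packages("ggplot2")

# Load ggplot2 at the start of every R session
library(ggplot2)
```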
ggplot2 works by translating your data into a visualisation one layer at a time. What do we mean by layers? Any data set (provided it is in a “tidy” form) can be represented as a stack of layers, three of which are mandatory: the data, the aesthetics and the geoms.
Let’s look at how we would build these up in a worked example, using the data we’ve been working with so far in the course. The breast RCT data is already loaded into our workspace from before, as the object rct_data. Let’s take a look at it to remind ourselves.
> head(rct_data)
# A tibble: 6 x 21
  Pt_No   Age Gender Radnomisation PS_Prior_op PS_3hrs PS_12_16hrs PS_24_28hrs PS_2wks
  <int> <int>  <int>         <int>       <int>   <int>       <int>       <int>   <int>
1     1    80      1             1           0       0           0           2       2
2     2    72      1             1           1       2           2           1       2
3     3    72      1             1           0       1           0           0       0
4     4    55      1             1           0       0           0           0       1
5     5    84      1             2           0       0           2           3      NA
6     6    72      1             2           1       2           2           2      NA
# ... with 12 more variables: Mov_12_16hrs <int>, Mov_24_28hrs <int>, Mov_2wks <int>,
#   Paracetamol_g <int>, Ibuprofen_mg <int>, Diclofenac_mg <int>, Oromorph_mg <dbl>,
#   Amitryptaline_mg <dbl>, Pregabalin_mg <int>, Days_in_hosp <int>,
#   Reason_hosp_stay <int>, Overall_satisfaction <int>
Before doing anything, make sure that ggplot2 is loaded and that rct_data is available in your workspace.
To build the “data” layer we use the following syntax:
ggplot(data = rct_data)
This returns an empty grey plotting area.
That blank image is exactly what we are looking for. The data has been successfully loaded into ggplot, but ggplot has no idea yet how to display that information. We need to move on to the “aesthetic” layer.
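Because each layer is added with +, the same plot can also be built up step by step and stored in a variable. A minimal sketch, using a small made-up data frame called toy in place of rct_data (the ages are taken from the head() output above; the Days_in_hosp values are invented for illustration):

```r
library(ggplot2)

# Hypothetical stand-in for rct_data; Days_in_hosp values are made up
toy <- data.frame(
  Age = c(80, 72, 72, 55, 84, 72),
  Days_in_hosp = c(3, 1, 2, 1, 5, 2)
)

p <- ggplot(data = toy)                    # data layer only: a blank plot
p <- p + aes(x = Age, y = Days_in_hosp)    # add the aesthetic mapping
p <- p + geom_point()                      # add the geom layer

# layer_data() returns the data that the first layer will actually draw
head(layer_data(p))
```

layer_data() is a ggplot2 helper that is handy for checking that a mapping did what you expected without having to look at the rendered plot.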
The aesthetic layer takes a column of our data and maps it to a particular visual dimension, such as the x or y position, colour, shape or size.
There are others, but these are the basics. Let’s add the aesthetic mappings to our data. We’ll start with the variables Age and Days_in_hosp. Both are continuous numerical data, so they lend themselves well to a scatter plot, with one mapped to each axis.
ggplot(data = rct_data, mapping = aes(x = Age, y = Days_in_hosp))
Perfect, things are starting to take shape. The data has been mapped to visual dimensions, in this case the x and y axes of the plot. We still aren’t seeing any data points, because we need to add our last mandatory layer: the geom layer.
The geom layer gives the final instructions on how we want the data to be displayed. Do we want points, crosses or something else entirely? Let’s add the geom layer:
ggplot(data = rct_data, mapping = aes(x = Age, y = Days_in_hosp)) + geom_point()
So we finally have a plot. It doesn’t look like there’s a very clear association between the two variables.
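To make the impression of “no clear association” easier to judge, a second geom can be layered on top of the points. The sketch below adds geom_smooth() with a straight-line (linear model) fit and its shaded confidence band; the choice of a linear fit and the toy values are assumptions for illustration only, not part of the original analysis.

```r
library(ggplot2)

# Made-up toy data standing in for rct_data (not the trial's real values)
toy <- data.frame(
  Age = c(80, 72, 72, 55, 84, 72, 61, 68),
  Days_in_hosp = c(3, 1, 2, 1, 5, 2, 1, 4)
)

p <- ggplot(data = toy, mapping = aes(x = Age, y = Days_in_hosp)) +
  geom_point() +
  # second geom layer: linear fit with a shaded confidence band
  geom_smooth(method = "lm", formula = y ~ x)
```

A nearly flat line with a wide band would support the visual impression; a steep line would suggest there is more of a trend than the points alone make obvious.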
Below, there are a few more examples where we have mapped different aesthetics and used different geoms.
Let’s start by adding a shape for the randomisation:
ggplot(data = rct_data, mapping = aes(x = Age, y = Days_in_hosp, shape = factor(Radnomisation))) + geom_point()
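ggplot2 can only map a discrete (categorical) variable to shape; an integer column is treated as continuous and building the plot fails with an error. In the head() output above, the randomisation column (printed as Radnomisation) holds the integer codes 1 and 2, so it needs converting to a factor first. A minimal sketch on made-up toy data; the labels "Arm 1" and "Arm 2" are hypothetical, since the meaning of each code is not given here:

```r
library(ggplot2)

# Toy data: Radnomisation uses integer codes 1/2 as in head(rct_data);
# the Days_in_hosp values are made up
toy <- data.frame(
  Age = c(80, 72, 72, 55, 84, 72),
  Days_in_hosp = c(3, 1, 2, 1, 5, 2),
  Radnomisation = c(1L, 1L, 1L, 1L, 2L, 2L)
)

# Convert the integer codes to a factor with readable (hypothetical) labels
toy$group <- factor(toy$Radnomisation, levels = 1:2,
                    labels = c("Arm 1", "Arm 2"))

p <- ggplot(data = toy,
            mapping = aes(x = Age, y = Days_in_hosp, shape = group)) +
  geom_point()
```

Doing the conversion in the data, rather than inside aes(), also gives the legend a tidier title than factor(Radnomisation).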
ggplot2 has created a legend for us. Let’s go the whole hog and add overall satisfaction as a color mapping.
ggplot(data = rct_data,
       mapping = aes(x = Age, y = Days_in_hosp, shape = factor(Radnomisation), color = Overall_satisfaction)) +
  geom_point()
Notice how quickly we’ve produced a series of publication-quality graphics, each essentially a single line of code.
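Before a figure goes into a paper, the raw column names used as axis and legend titles usually need replacing. A minimal sketch using labs() and the built-in theme_bw() theme on made-up toy data; the label wording, including “Age (years)”, is an assumption rather than something taken from the trial:

```r
library(ggplot2)

# Made-up toy data with the same column names as rct_data
toy <- data.frame(
  Age = c(80, 72, 72, 55, 84, 72),
  Days_in_hosp = c(3, 1, 2, 1, 5, 2),
  Radnomisation = factor(c(1, 1, 1, 1, 2, 2)),
  Overall_satisfaction = c(4, 5, 3, 4, 2, 5)
)

p <- ggplot(data = toy,
            mapping = aes(x = Age, y = Days_in_hosp,
                          shape = Radnomisation,
                          color = Overall_satisfaction)) +
  geom_point(size = 3) +                        # fixed size, not a mapping
  labs(x = "Age (years)", y = "Days in hospital",
       shape = "Randomisation group",
       color = "Overall satisfaction") +        # readable axis/legend titles
  theme_bw()                                    # plain white background
```

Because Overall_satisfaction is numeric, ggplot2 maps it to a continuous colour gradient; wrapping it in factor() would give one distinct colour per score instead.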
What about comparing pain scores?
ggplot(data = rct_data, mapping = aes(x = factor(Radnomisation), y = PS_3hrs)) + geom_boxplot()
Here we mapped randomisation to the x axis as a categorical variable, which makes geom_boxplot() a natural choice: it draws one box summarising the pain scores for each group.
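Each box is drawn from summary statistics that ggplot2 computes for every group. A minimal sketch on made-up pain scores (not the trial’s data) that uses layer_data() to print those statistics, so you can see exactly what the box, line and whiskers represent:

```r
library(ggplot2)

# Made-up pain scores for two groups (not the trial's data)
toy <- data.frame(
  Radnomisation = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  PS_3hrs = c(0, 2, 1, 0, 0, 2, 3, 1)
)

p <- ggplot(data = toy, mapping = aes(x = Radnomisation, y = PS_3hrs)) +
  geom_boxplot()

# One row per box: lower/upper are the quartiles, middle is the median,
# ymin/ymax are the ends of the whiskers
box_stats <- layer_data(p)[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
print(box_stats)
```

With these toy values the medians are 0.5 for group 1 (scores 0, 0, 1, 2) and 1.5 for group 2 (scores 0, 1, 2, 3).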
Can you work out what geom_jitter() does and why?
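One way to investigate is to layer geom_jitter() over the boxplot and compare it with plain geom_point(). Pain scores are small whole numbers, so many patients share identical values and their points would sit exactly on top of each other. A minimal sketch on made-up data; the width and height values are illustrative choices:

```r
library(ggplot2)

# Made-up pain scores for two groups (not the trial's data)
toy <- data.frame(
  Radnomisation = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  PS_3hrs = c(0, 2, 1, 0, 0, 2, 3, 1)
)

p <- ggplot(data = toy, mapping = aes(x = Radnomisation, y = PS_3hrs)) +
  geom_boxplot(outlier.shape = NA) +  # hide outlier points so none are drawn twice
  # shift each point sideways by a small random amount; height = 0 leaves
  # the pain scores themselves unchanged
  geom_jitter(width = 0.2, height = 0)

jittered <- layer_data(p, 2)
```

Setting height = 0 matters here: vertical jitter would move points away from the scores that were actually recorded.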