TL;DR
Creating a visualization in R often takes only two lines of code, but the initial output is not that visually appealing. In this post we cover how to style your plots and provide a code template that should make producing beautiful plots in R a bit easier. We also cover how to make panel plots and how to export your files to comply with journal dpi requirements.
Introduction
For most, graphing in R can be a bit daunting. Why would you spend time learning something that can be completed in Excel in a few clicks and with far fewer error messages? For me, the true benefit of using R for data exploration is that you can set up your graphing code for one plot and then reuse it for other purposes. Whilst making a single plot in Excel is far quicker, many of you will know the frustration of wanting to apply the same theme to a different set of data.
We covered the basics of plotting in the previous post, but now we want to share a basic set of code that will allow you to create publication-ready plots and that you can reuse time and time again. Combining plots is also a useful way to reduce the number of figures, a need that often arises when trying to fit more data (you worked hard on it, so why shouldn’t it be included!) into journal or submission guidelines, so learning how to produce panel plots is worthwhile.
Finally, we will be cross-posting some of the code from the exporting items from R post to show you how to make sure your figures conform to journal submission requirements. No more "your figures are not high enough resolution" comments!
ggplot2 code
We will assume you are already familiar with our data visualisations in R post, so the first thing we will do is graph some data using the health_stack dataset. For this example, we will plot the relationship between height and grip strength. If you haven’t already saved the dataset on your computer, you can download it here: HStack_data.csv.
db <- read.csv("./static/files/HStack_data.csv", header = TRUE)
#this is the dataset we will be using within this tutorial
#you will need to change the file path to point to the data file on your machine
library(ggplot2)
#we require the ggplot2 package for this post - remember to install it using install.packages("ggplot2")
#the next two lines of code define which data to use, what the x and y axes are and then what type of data to plot (scatter = geom_point)
ggplot(data = db, aes(x = height, y = grip_strength)) +
  geom_point()
As we said earlier, the plot can be generated with only two lines of code but it is far from ready to be submitted for publication. Putting aside the somewhat sparse data set that we are using, there are many options we can change to tidy up the plot.
#creating a scatter chart
ggplot(data = db, aes(x = height, y = grip_strength)) + #the main plot code
  geom_point(shape = 16, #sets the shape type
             size = 2, #sets the size of the circles
             colour = "#2e6f8e") + #sets the circle colour to the hex colour 2e6f8e
  geom_smooth(method = "lm", #adds a linear regression line to the plot
              colour = "#000000", #colours the regression line
              se = FALSE) + #adds (TRUE) or removes (FALSE) the standard error band around the line
  labs(title = "A plot of height versus grip strength", #adds a title to the plot
       y = "Grip strength (kg)", #adds a y axis title
       x = "Height (m)") + #adds an x axis title
  ylim(0, 120) + #sets the y axis limits
  xlim(1.5, 1.9) #sets the x axis limits
Hopefully you agree that the plot looks much better than the base plot in ggplot2 but there are a few more styling options that we can tweak to make it fully finished.
#reusing some of the code from above
p1 <- ggplot(data = db, aes(x = height, y = grip_strength)) +
  geom_point(shape = 16, size = 2, colour = "#29af7f") +
  geom_smooth(method = "lm", colour = "#000000", se = FALSE) +
  labs(title = "A plot of height versus grip strength", #changes the plot title
       y = "Grip strength (kg)", #changes the y axis title
       x = "Height (m)") + #changes the x axis title
  scale_y_continuous(limits = c(0, 120), breaks = seq(0, 120, 20)) + #sets the y axis limits and tick sequence
  scale_x_continuous(limits = c(1.5, 1.9), breaks = c(1.5, 1.6, 1.7, 1.8, 1.9)) + #sets the x axis limits and tick sequence
  coord_cartesian(expand = FALSE) + #removes the extra space ggplot2 adds at the edges of the axes (expand takes TRUE or FALSE)
  theme(plot.title = element_text(hjust = 0.5), #justifies the plot title to the centre
        axis.line = element_line(), #adds axis lines
        panel.background = element_blank()) #removes the grey plot background
#we have assigned the ggplot to the object p1 to use later on in the tutorial and calling p1 displays the plot within the viewer
p1
The last two lines of code are a matter of preference, but the output is clean and understandable. The ggplot2 package is very customisable and one post alone won’t do it justice… there are literally books written just about the package. But by using the above code, you will be able to start producing plots that can be used for a variety of purposes, including submitted articles.
You will notice in the above example we use the theme() command to set a number of aesthetic options for the plot. There are approximately 100 options within the theme function which can be found here. If there is anything you would like to change aesthetically this is a good place to start!
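If you find yourself repeating the same theme() settings, one option is to wrap them in a small function and add it to each plot. The sketch below is only an illustration: theme_publication is a hypothetical helper name (it is not part of ggplot2 or of the original code), and the built-in mtcars data stands in for your own dataset.

```r
library(ggplot2)

# Hypothetical helper (an assumption, not from the original post):
# bundles the theme() settings used in this tutorial so they can be reused
theme_publication <- function() {
  theme(plot.title = element_text(hjust = 0.5), # centres the plot title
        axis.line = element_line(),             # adds axis lines
        panel.background = element_blank())     # removes the grey background
}

# The built-in mtcars data stands in for your own dataset
p_demo <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Weight versus fuel economy") +
  theme_publication() # applies the bundled settings

p_demo
```

Any plot that ends with + theme_publication() then picks up the same look, and a change made inside the function carries through to every plot that uses it.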
In the previous example we have only covered the scatter plot, but we will now show you how to create two other types of figures, a bar chart and a line graph. The first chart will aim to visually investigate if the average grip strength is different between those younger than 65 and those 65 or above. There is a little bit of code that is required to make a new grouping variable, but this can always be completed before the graphing process.
#creating a bar chart
#db <- read.csv("HStack_data_bar.csv", header = TRUE)
#remove the # if you haven't already loaded the data
#this is the dataset we will be using within this tutorial
#you will need to change the file path to point to the data file on your machine
library(dplyr)
#required to use the pipes and to calculate group averages
db_bar <- db %>% #assigns the results of the code to the dataframe db_bar
  mutate(Above65 = ifelse(age >= 65, 1, 0)) %>% #calculates a binary age variable (1 = aged 65 or above, 0 = below 65)
  group_by(Above65) %>% #groups by the age category variable
  summarise(grip_mean = mean(grip_strength), grip_sd = sd(grip_strength)) %>% #calculates the mean and standard deviation of grip strength per group
  ungroup() #ungroups the data (always wise, otherwise it can cause errors)
p2 <- ggplot(data = db_bar, aes(x = as.factor(Above65), y = grip_mean, #the main plot code and the as.factor tells r to treat Above65 as a grouping variable
                                fill = as.factor(Above65))) + #tells r to fill per the grouping variable
  geom_bar(stat = "identity", #sets the type of bar chart to use the values in the data
           width = 0.8) + #changes the bar width
  labs(title = "A plot of age versus grip strength", #changes the plot title
       y = "Grip strength (kg)", #changes the y axis title
       x = "Age category (1 = 65 years of age or above)") + #changes the x axis title
  scale_y_continuous(limits = c(0, 140), #sets the y axis limits
                     breaks = seq(0, 140, 20), #sets the y axis tick intervals
                     expand = c(0, 0)) + #ensures the bars start from the x axis instead of floating
  theme(plot.title = element_text(hjust = 0.5), #justifies the plot title to the centre
        axis.line = element_line(), #adds axis lines
        panel.background = element_blank(), #removes the grey plot background
        legend.title = element_blank()) + #removes the legend title
  scale_fill_manual(labels = c("Below 65", "65 or above"), #changes the grouping labels for the legend
                    values = c("0" = "#FDE725FF",
                               "1" = "#440154FF")) + #manually changes the colours of the bars
  geom_errorbar(aes(x = as.factor(Above65), ymin = grip_mean - grip_sd, ymax = grip_mean + grip_sd),
                width = 0.2, colour = "black", alpha = 1, size = 0.5) #adds error bars (mean +/- SD) using the summary variables from summarise(); from ggplot2 3.4.0 onwards, linewidth replaces size for lines
#we have assigned the ggplot to the object p2 to use later on in the tutorial and calling p2 displays the plot within the viewer
p2
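The grouping step can also be written so that the groups carry readable labels from the start, which means the legend and axis need no relabelling later. This is a minimal sketch under stated assumptions: toy is a small made-up data frame standing in for the health_stack data, and the age_group column name is ours rather than something from the original code.

```r
library(dplyr)

# Made-up illustrative values (an assumption, not the tutorial data)
toy <- data.frame(age = c(50, 60, 70, 80),
                  grip_strength = c(40, 36, 30, 26))

toy_summary <- toy %>%
  # labelled factor instead of a 0/1 variable; levels fixes the display order
  mutate(age_group = factor(ifelse(age >= 65, "65 or above", "Below 65"),
                            levels = c("Below 65", "65 or above"))) %>%
  group_by(age_group) %>%
  # mean and standard deviation of grip strength per age group;
  # .groups = "drop" returns an ungrouped result, so ungroup() is not needed
  summarise(grip_mean = mean(grip_strength),
            grip_sd = sd(grip_strength),
            .groups = "drop")

toy_summary
```

With a labelled factor like this, aes(x = age_group, fill = age_group) can be used directly in the plot without as.factor().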
The next example we want to show you is a line chart, which can be useful for time series analysis or for showing changes in a group over time. For this example, we will be using data from the iglu package, which includes example glucose monitoring data. This topic is covered in greater detail in the working with glucose data post if you are interested in this type of data.
In this example, we will plot a line chart using time series data and reuse the same formatting code as before.
#creating a line chart
library(iglu)
library(ggplot2)
library(dplyr)
#load the necessary packages for the example - remember to install the package if you don't already have it using install.packages("iglu")
#dplyr is required for the pipe and filter()
db_line <- example_data_1_subject %>% #assigning the example data 1 to the object db_line
  filter(as.Date(time) >= as.Date("2015-06-14"), as.Date(time) < as.Date("2015-06-16")) #filter to 2 days' worth of data; time is converted with as.Date() so that dates are compared with dates
ggplot(aes(x = time, y = gl), data = db_line) + #the main plot code
  geom_line(size = 0.75, #changes the line size (from ggplot2 3.4.0 onwards, linewidth replaces size for lines)
            colour = "#29af7f") + #changes the line colour
  labs(title = "Glucose values over two days", #changes the plot title
       y = "Glucose (mg/dl)", #changes the y axis title
       x = "Date & time") + #changes the x axis title
  scale_y_continuous(limits = c(0, 260), breaks = seq(0, 260, 20)) + #sets the y axis limits and tick sequence
  scale_x_datetime(date_breaks = "3 hour", date_labels = "%d/%m - %H:%M", expand = c(0, 0)) + #sets the x axis labels to every 3 hours and then formats the text to day/month - hour:minute
  coord_cartesian(expand = FALSE) + #removes the extra space ggplot2 adds at the edges of the axes (expand takes TRUE or FALSE)
  geom_hline(yintercept = 70) + #adds a horizontal line for time in range y = 70 mg/dl
  geom_hline(yintercept = 180) + #adds a horizontal line for time in range y = 180 mg/dl
  theme(axis.text.x = element_text(angle = 40, vjust = 1.0, hjust = 1.0)) + #rotates the x axis labels for readability
  theme(plot.title = element_text(hjust = 0.5), #justifies the plot title to the centre
        axis.line = element_line(), #adds axis lines
        panel.background = element_blank()) #removes the grey plot background
We have shown you how to generate a few graphing types using some simple data. Graphing may be a little tricky with your own data but use our examples and the annotated code to help you.
Other aspects of visualization you should take into account are:
- Shape type: you can alter the shape of the data points within ggplot2. This page describes how: https://ggplot2.tidyverse.org/reference/aes_linetype_size_shape.html
- You can choose whatever colours you would like for your plot and you can choose them using this handy tool: https://www.color-hex.com/
- When choosing colours, be mindful of journal requirements: can you publish in colour and, if not, are the colours distinguishable in greyscale? Also, be mindful of choosing colours that won’t be hard for individuals with colour blindness to interpret.
- We have included most of the aspects you may want to use when plotting but you can rearrange or delete or add as you see fit.
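As one way of acting on the colour points above, the sketch below uses the viridis colour scale that ships with ggplot2 (scale_colour_viridis_d()), which was designed to be readable with common forms of colour blindness and in greyscale, and also maps shape to the group so the groups differ in more than colour. The built-in mtcars data and the "Cylinders" legend title are stand-ins for illustration, not part of the original tutorial.

```r
library(ggplot2)

# The built-in mtcars data stands in for your own dataset
p_colour <- ggplot(data = mtcars,
                   aes(x = wt, y = mpg,
                       colour = factor(cyl),   # colour per group
                       shape = factor(cyl))) + # shape per group as a second cue
  geom_point(size = 2) +
  scale_colour_viridis_d(end = 0.9) + # viridis palette; end = 0.9 avoids the palest yellow
  labs(colour = "Cylinders", shape = "Cylinders") # same title so the two legends merge

p_colour
```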
Panel plots
Sometimes presenting plots separately can be an inefficient use of space or of the figure quota within your journal article. In these cases panel plots may be useful, and we will quickly show you how to make one now, using some of the code from above. If you haven’t already completed the earlier parts of the tutorial, you will need to do so before continuing, because the panel plot uses the p1 and p2 objects.
#creating a panel plot (using p1 and p2 from above - ensure these are objects in your environment)
library(ggpubr)
#we require the ggpubr package for this post - remember to install it using install.packages("ggpubr")
p1 <- p1 + labs(title = "A") + #changes the plot title to A for the panel plot
  theme(plot.title = element_text(hjust = 0, #ensures the title is left aligned
                                  vjust = 3)) #raises the title a little away from the plot area
p2 <- p2 + labs(title = "B") + #changes the plot title to B for the panel plot
  theme(plot.title = element_text(hjust = 0, #ensures the title is left aligned
                                  vjust = 3)) #raises the title a little away from the plot area
panel <- ggarrange(p1, p2, #insert all of the plots you would like to include in the panel plot, separated by commas
                   nrow = 1) #sets the number of rows you would like, you can also use ncol = 2 to set the number of columns for the plot
#calls the completed panel plot
panel
Using a small bit of code saves you the hassle of arranging your plots in Word/PowerPoint and then having to group the added A and B labels. There are other options you can include using ggpubr and you can see them here: https://rpkgs.datanovia.com/ggpubr/reference/ggarrange.html
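One of those other options is the labels argument of ggarrange(), which adds the panel letters for you instead of reusing each plot's title. A minimal sketch, assuming two simple plots built from the built-in mtcars data (pa and pb are illustrative names, not objects from this tutorial):

```r
library(ggplot2)
library(ggpubr)

# Two stand-in plots built from the built-in mtcars data
pa <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()
pb <- ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# labels adds "A" and "B" to the panels; ncol and nrow set the layout
panel_demo <- ggarrange(pa, pb,
                        labels = c("A", "B"),
                        ncol = 2, nrow = 1)

panel_demo
```

This leaves each plot's own title free for other uses. Whether you prefer it to the title-based approach is a matter of taste.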
Export and dpi
The final aspect of this tutorial is to show you how to export your plots to be used elsewhere. This task will be covered in more detail in a future post on exporting items from R, but we will quickly include the code below so you can complete the tutorial.
#exporting the panel plot
ggsave("panel_plot.jpeg", #the file name needs to be stated after the opening bracket and you can choose to save as an eps, pdf, jpeg, png etc by changing the suffix
       plot = panel, #this command saves the last plot by default but if you have assigned the plot to an object, you can specify this here
       width = 40, #width, height and units specify the size of the plot
       height = 20,
       units = "cm",
       dpi = 600) #dpi sets the plot resolution
#your file will save to wherever your working directory is and this should have been set by starting a new project
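Journals sometimes state their requirements in pixels rather than in centimetres, so it can help to check what a given size and dpi work out to. The arithmetic is size in inches (cm divided by 2.54) multiplied by dpi. The sketch below applies it to the 40 x 20 cm, 600 dpi settings above; cm_to_px is a hypothetical helper name of ours, not a ggplot2 function.

```r
# Hypothetical helper (an assumption, not part of ggplot2):
# converts a length in cm to pixels at a given dpi (1 inch = 2.54 cm)
cm_to_px <- function(cm, dpi) {
  round(cm / 2.54 * dpi)
}

cm_to_px(40, 600) # width of the saved image in pixels: 9449
cm_to_px(20, 600) # height of the saved image in pixels: 4724
```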
Conclusion
This post has run through how to create a few common graph types in R. We have tried to include all of the options you may want to tweak so you can reuse the code straight away for your own data. Like everything in R, there are extensions to what we have just described and we have included a few links in the above text. But if there is something you would like us to cover, then please let us know via the contact page.
Complete code
db <- read.csv("HStack_data.csv", header = TRUE)
#this is the dataset we will be using within this tutorial
#you will need to change the file path to point to the data file on your machine
library(ggplot2)
#we require the ggplot2 package for this post - remember to install it using install.packages("ggplot2")
#the next two lines of code define which data to use, what the x and y axes are and then what type of data to plot (scatter = geom_point)
ggplot(data = db, aes(x = height, y = grip_strength)) +
  geom_point()
#creating a scatter chart
ggplot(data = db, aes(x = height, y = grip_strength)) + #the main plot code
  geom_point(shape = 16, #sets the shape type
             size = 2, #sets the size of the circles
             colour = "#2e6f8e") + #sets the circle colour to the hex colour 2e6f8e
  geom_smooth(method = "lm", #adds a linear regression line to the plot
              colour = "#000000", #colours the regression line
              se = FALSE) + #adds (TRUE) or removes (FALSE) the standard error band around the line
  labs(title = "A plot of height versus grip strength", #adds a title to the plot
       y = "Grip strength (kg)", #adds a y axis title
       x = "Height (m)") + #adds an x axis title
  ylim(0, 120) + #sets the y axis limits
  xlim(1.5, 1.9) #sets the x axis limits
#reusing some of the code from above
p1 <- ggplot(data = db, aes(x = height, y = grip_strength)) +
  geom_point(shape = 16, size = 2, colour = "#29af7f") +
  geom_smooth(method = "lm", colour = "#000000", se = FALSE) +
  labs(title = "A plot of height versus grip strength", #changes the plot title
       y = "Grip strength (kg)", #changes the y axis title
       x = "Height (m)") + #changes the x axis title
  scale_y_continuous(limits = c(0, 120), breaks = seq(0, 120, 20)) + #sets the y axis limits and tick sequence
  scale_x_continuous(limits = c(1.5, 1.9), breaks = c(1.5, 1.6, 1.7, 1.8, 1.9)) + #sets the x axis limits and tick sequence
  coord_cartesian(expand = FALSE) + #removes the extra space ggplot2 adds at the edges of the axes (expand takes TRUE or FALSE)
  theme(plot.title = element_text(hjust = 0.5), #justifies the plot title to the centre
        axis.line = element_line(), #adds axis lines
        panel.background = element_blank()) #removes the grey plot background
#we have assigned the ggplot to the object p1 to use later on in the tutorial and calling p1 displays the plot within the viewer
p1
#creating a bar chart
#db <- read.csv("HStack_data_bar.csv", header = TRUE)
#remove the # if you haven't already loaded the data
#this is the dataset we will be using within this tutorial
#you will need to change the file path to point to the data file on your machine
library(dplyr)
#required to use the pipes and to calculate group averages
db_bar <- db %>% #assigns the results of the code to the dataframe db_bar
  mutate(Above65 = ifelse(age >= 65, 1, 0)) %>% #calculates a binary age variable (1 = aged 65 or above, 0 = below 65)
  group_by(Above65) %>% #groups by the age category variable
  summarise(grip_mean = mean(grip_strength), grip_sd = sd(grip_strength)) %>% #calculates the mean and standard deviation of grip strength per group
  ungroup() #ungroups the data (always wise, otherwise it can cause errors)
p2 <- ggplot(data = db_bar, aes(x = as.factor(Above65), y = grip_mean, #the main plot code and the as.factor tells r to treat Above65 as a grouping variable
                                fill = as.factor(Above65))) + #tells r to fill per the grouping variable
  geom_bar(stat = "identity", #sets the type of bar chart to use the values in the data
           width = 0.8) + #changes the bar width
  labs(title = "A plot of age versus grip strength", #changes the plot title
       y = "Grip strength (kg)", #changes the y axis title
       x = "Age category (1 = 65 years of age or above)") + #changes the x axis title
  scale_y_continuous(limits = c(0, 140), #sets the y axis limits
                     breaks = seq(0, 140, 20), #sets the y axis tick intervals
                     expand = c(0, 0)) + #ensures the bars start from the x axis instead of floating
  theme(plot.title = element_text(hjust = 0.5), #justifies the plot title to the centre
        axis.line = element_line(), #adds axis lines
        panel.background = element_blank(), #removes the grey plot background
        legend.title = element_blank()) + #removes the legend title
  scale_fill_manual(labels = c("Below 65", "65 or above"), #changes the grouping labels for the legend
                    values = c("0" = "#FDE725FF",
                               "1" = "#440154FF")) + #manually changes the colours of the bars
  geom_errorbar(aes(x = as.factor(Above65), ymin = grip_mean - grip_sd, ymax = grip_mean + grip_sd),
                width = 0.2, colour = "black", alpha = 1, size = 0.5) #adds error bars (mean +/- SD) using the summary variables from summarise(); from ggplot2 3.4.0 onwards, linewidth replaces size for lines
#we have assigned the ggplot to the object p2 to use later on in the tutorial and calling p2 displays the plot within the viewer
p2
#creating a line chart
library(iglu)
library(ggplot2)
library(dplyr)
#load the necessary packages for the example - remember to install the package if you don't already have it using install.packages("iglu")
#dplyr is required for the pipe and filter()
db_line <- example_data_1_subject %>% #assigning the example data 1 to the object db_line
  filter(as.Date(time) >= as.Date("2015-06-14"), as.Date(time) < as.Date("2015-06-16")) #filter to 2 days' worth of data; time is converted with as.Date() so that dates are compared with dates
ggplot(aes(x = time, y = gl), data = db_line) + #the main plot code
  geom_line(size = 0.75, #changes the line size (from ggplot2 3.4.0 onwards, linewidth replaces size for lines)
            colour = "#29af7f") + #changes the line colour
  labs(title = "Glucose values over two days", #changes the plot title
       y = "Glucose (mg/dl)", #changes the y axis title
       x = "Date & time") + #changes the x axis title
  scale_y_continuous(limits = c(0, 260), breaks = seq(0, 260, 20)) + #sets the y axis limits and tick sequence
  scale_x_datetime(date_breaks = "3 hour", date_labels = "%d/%m - %H:%M", expand = c(0, 0)) + #sets the x axis labels to every 3 hours and then formats the text to day/month - hour:minute
  coord_cartesian(expand = FALSE) + #removes the extra space ggplot2 adds at the edges of the axes (expand takes TRUE or FALSE)
  geom_hline(yintercept = 70) + #adds a horizontal line for time in range y = 70 mg/dl
  geom_hline(yintercept = 180) + #adds a horizontal line for time in range y = 180 mg/dl
  theme(axis.text.x = element_text(angle = 40, vjust = 1.0, hjust = 1.0)) + #rotates the x axis labels for readability
  theme(plot.title = element_text(hjust = 0.5), #justifies the plot title to the centre
        axis.line = element_line(), #adds axis lines
        panel.background = element_blank()) #removes the grey plot background
#creating a panel plot (using p1 and p2 from above - ensure these are objects in your environment)
library(ggpubr)
#we require the ggpubr package for this post - remember to install it using install.packages("ggpubr")
p1 <- p1 + labs(title = "A") + #changes the plot title to A for the panel plot
  theme(plot.title = element_text(hjust = 0, #ensures the title is left aligned
                                  vjust = 3)) #raises the title a little away from the plot area
p2 <- p2 + labs(title = "B") + #changes the plot title to B for the panel plot
  theme(plot.title = element_text(hjust = 0, #ensures the title is left aligned
                                  vjust = 3)) #raises the title a little away from the plot area
panel <- ggarrange(p1, p2, #insert all of the plots you would like to include in the panel plot, separated by commas
                   nrow = 1) #sets the number of rows you would like, you can also use ncol = 2 to set the number of columns for the plot
#calls the completed panel plot
panel
#exporting the panel plot
ggsave("panel_plot.jpeg", #the file name needs to be stated after the opening bracket and you can choose to save as an eps, pdf, jpeg, png etc by changing the suffix
       plot = panel, #this command saves the last plot by default but if you have assigned the plot to an object, you can specify this here
       width = 40, #width, height and units specify the size of the plot
       height = 20,
       units = "cm",
       dpi = 600) #dpi sets the plot resolution
#your file will save to wherever your working directory is and this should have been set by starting a new project