TL;DR

To make data crunching a bit easier, you can install packages developed by other people that contain ready-made functions. You will most likely use the same handful of common packages for your work, but first you need to know how to install them from within RStudio. By the end of this post you will know how to install and load packages and which ones are the most useful.

Introduction

R is a flexible and powerful tool that can complete practically any action required to import, process and analyse most data types. If you are proficient in R, you can write the code for these actions yourself, but there is a vibrant community of developers who have already created functions and data sets that you can take advantage of.

Throughout these posts we will provide the code you need so that you can replicate the actions within RStudio. The code appears within the posts and also in one annotated chunk at the end, to make it easier to copy into RStudio. Hopefully this will make more sense as you start using R more. Now let's get started!

Installing packages

These packages are developed and uploaded to CRAN (the Comprehensive R Archive Network) and can be installed from within RStudio. To do this, you only need to enter the code below. In this example, we will install the ggplot2 package.

install.packages("ggplot2")
#Replace the package name as needed e.g. install.packages("tidyverse")

If you forget the above code, you can install packages from the Packages window by pressing ‘Install’. However, using code is much faster, and it helps others who use your code to know which packages are required. For this reason, it is usually the first thing included in your R script.
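
One way to keep an install line at the top of a script without reinstalling on every run is to check first whether the package is already available. The sketch below is a minimal illustration rather than part of the original code: install_if_missing is a hypothetical helper name, and requireNamespace() is the base R function used for the check.

```r
# Hypothetical helper: install a package only when R cannot already find it.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(pkg)
}

# "stats" ships with R, so this call returns without installing anything.
# Swap in "ggplot2" to use it for a CRAN package.
install_if_missing("stats")
```

Used this way, the first run on a new machine installs the package and later runs skip straight past it.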

Packages are usually developed to conduct a set of discrete actions and therefore you may need multiple packages to complete a certain task. If you ever need to do this then you can use the following code to install multiple packages at the same time. Here we will install ggplot2 and dplyr.

install.packages(c("ggplot2", "dplyr"))
#c() combines the package names into a character vector

The packages may take a while to install, but you will know the process has finished when the console returns to the > prompt.
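
If you are unsure whether an install completed, one option is to compare the names you asked for against what R reports as installed. This is only a sketch: wanted and missing_pkgs are illustrative names, and the code reports on packages without installing anything.

```r
# The packages we expect to have after running install.packages().
wanted <- c("ggplot2", "dplyr")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
installed_names <- rownames(installed.packages())

# Anything in 'wanted' that is not installed yet.
missing_pkgs <- setdiff(wanted, installed_names)

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All requested packages are installed.")
}
```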

Loading packages

Once installed, a package needs to be loaded before you can use the functions within it. To do this, you use the following code:

library("ggplot2")
#e.g. library("tidyverse")

The important bit to remember is that you only have to install a package once, but you have to load it in every new R session in which you would like to use it. However, you may want to periodically reinstall packages to receive the latest updates as they are released on CRAN.
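
To decide whether a package is due a refresh, it can help to see which version is currently installed. The sketch below assumes you only want to inspect a version. It uses "stats" because that package ships with R, and installed_version is an illustrative name. The commented-out update.packages() line is one built-in way to update from CRAN and needs an internet connection.

```r
# packageVersion() reports the version installed on this machine.
# "stats" ships with R; swap in "ggplot2" once it is installed.
installed_version <- packageVersion("stats")
print(installed_version)

# To update all installed packages from CRAN without being prompted
# for each one, you could run (requires an internet connection):
# update.packages(ask = FALSE)
```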

Loading multiple packages at once is a little more complicated than installing them, but you can do it with the following code:

lapply(c("ggplot2", "dplyr"), library, character.only = TRUE)
#lapply applies the function library to each name in the vector
#character.only = TRUE tells library that the package names are given as character strings
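
When run in the console, lapply() also prints the list it returns, which can be noisy. A small variation, sketched here, wraps the call in invisible() and then checks that each package is attached. The vector name pkgs is illustrative, and "stats" and "utils" are used as placeholders because they ship with R.

```r
# Placeholder names; use c("ggplot2", "dplyr") once those are installed.
pkgs <- c("stats", "utils")

# invisible() suppresses the list that lapply() would otherwise print.
invisible(lapply(pkgs, library, character.only = TRUE))

# search() lists what is attached; loaded packages appear as "package:<name>".
attached <- paste0("package:", pkgs) %in% search()
print(attached)
```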

Useful packages

At the time of writing there are over 18,000 packages available on CRAN. This makes it very difficult to know which packages to start with, but you will find that you use a small number of packages on a regular basis.

Throughout the tutorials we will list which packages to use for certain tasks, but RStudio also has a good list of packages on its website:

https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages

From this list, we would recommend the following:

Manipulating data

dplyr - for creating subset datasets, joining and rearranging datasets, and for summarising data.

lubridate - to work with dates and times.

tidyr - to change the format of datasets.

tidyverse - a collection of packages bundled together, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr and forcats.

To visualise data

ggplot2 - to make all kinds of plots, from basic scatter plots to spatial maps that integrate data from the Google Maps API (the latter with the help of extension packages such as ggmap).

ggpubr - for creating publication ready plots including arranging multiple plots on the same page.

Working with time series data

zoo - for working with time series data.

GGIR - for processing data collected by accelerometers.

iglu - for processing data collected by glucose monitors.

The more you use R to analyse your own data, the more you will come to learn which packages are most useful to you. However, if you work with unusual data or datasets, you may not find a single package that does everything you need, or even any existing package that helps. In this case, you may need a combination of packages, or you may even want to write your own once you understand how to process your data.

Conclusion

Packages are a key component of the R infrastructure and allow us to benefit from useful code written by other developers. They help make our code highly reproducible whilst saving us from having to solve the same common problems that many others have solved before.

Complete Code

install.packages("ggplot2")
#This code can be used to install any specified package e.g. install.packages("tidyverse")

install.packages(c("ggplot2", "dplyr"))
#Use this code to install more than one package at a time.
#c() combines the package names into a character vector

library("ggplot2")
#How to load a package e.g. library("tidyverse")

lapply(c("ggplot2", "dplyr"), library, character.only = TRUE)
#How to load multiple packages at once
#lapply applies the function library to each name in the vector
#character.only = TRUE tells library that the package names are given as character strings