TL;DR

Starting to use R can be confusing. By the end of this post you should understand the common terms used in R and how data objects are referred to within it.

Introduction

Before you start reading any tutorials about R (including ours!), you should familiarise yourself with what R is and with the most common data structures you will encounter.

What is R?

R is both a programming language and a piece of open-source software for processing data, conducting statistical analyses and graphing data.

Unlike software such as SPSS or Stata, R does not require you to buy a user licence and can be installed for free on Windows, macOS and Linux machines.

The software is developed by the R Core Team and the R Foundation for Statistical Computing. You can download the latest copy of the software from here (please also see our install page for more info).

You can use R by typing commands into the console that comes with the R software, but most users now work in an Integrated Development Environment (IDE), such as RStudio, which makes it easier for new coders to write, run and organise their code.

Much of what you want to accomplish in R can be done with the base language by writing the code yourself. However, the R community also writes packages of pre-built functions that help you clean, process and analyse your data.
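As a small illustration (a sketch in base R, not taken from any particular tutorial), here is the same task done both ways: averaging some numbers by writing the arithmetic yourself, and with the pre-built mean() function that comes with R. The name my_mean is hypothetical, and dplyr is only an example of a community package; the install lines are commented out because they need an internet connection.

```r
values <- c(2, 4, 6, 8)

# Writing the code yourself: add up the values and divide by how many there are.
# 'my_mean' is a hypothetical name used only for this example.
my_mean <- function(x) {
  sum(x) / length(x)
}
my_mean(values)   # [1] 5

# Using the pre-built function that ships with base R
mean(values)      # [1] 5

# Functions written by the R community come in packages downloaded from CRAN.
# 'dplyr' is only an example; uncomment to install (needs internet) and load it.
# install.packages("dplyr")
# library(dplyr)
```

Both approaches give the same answer here; the pre-built version simply saves you writing (and checking) the code yourself.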

Terminology

With any new language and software, there are many abbreviations and terms that we wish we had known when first starting out. These will become second nature as you work in R, but we have also included a glossary below that we hope will be useful:

  • CRAN (The Comprehensive R Archive Network): A network of servers that store the R software and the packages contributed by the R community.
  • Vignette: A long-form guide to a particular package, usually with worked examples.
  • Console: The console is where you can type and run code in real time.
  • Script: A script is a file where you can write and save code, including complex functions, without running it straight away. Scripts are also where you store your analyses so that you can rerun them later.
  • Terminal: The terminal provides you access to your computer’s operating system. We won’t be using this within our tutorials.
  • Functions: A function is a set of statements or code that is organised together to perform a task.
  • Pipes: These are tools for chaining multiple operations together in R. The most common pipe is the symbol ‘%>%’ (from the magrittr package, and also available when dplyr is loaded), and we will be covering how to use it in another post.
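To make the function and pipe entries a little more concrete, here is a minimal sketch. The function celsius_to_fahrenheit is a hypothetical example, not something built into R. The pipe lines are commented out because %>% only works once the magrittr package (or dplyr) is installed and loaded; the nested call above them does the same job in base R.

```r
# A function groups statements together to perform one task.
# 'celsius_to_fahrenheit' is a hypothetical example name.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

celsius_to_fahrenheit(100)       # [1] 212
celsius_to_fahrenheit(c(0, 37))  # [1] 32.0 98.6

# Without a pipe, nested calls are read from the inside out:
# square-root each value, then add the results together.
sum(sqrt(c(4, 9, 16)))           # [1] 9

# With magrittr loaded, %>% passes the result of each step into the next,
# so the same operation reads left to right. Uncomment if it is installed:
# library(magrittr)
# c(4, 9, 16) %>% sqrt() %>% sum()
```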

The above is not an exhaustive list, and we will introduce more terms throughout the tutorials. But it should get you going.

Variable types

As in most statistical software, there are a few variable types in R that you will need to be familiar with:

  • Numeric (num): Numbers stored with decimal precision, e.g. 3.14. R also stores a whole number such as 5 as numeric unless you mark it as an integer.
  • Integer (int): A whole number, usually written with an L suffix in R (e.g. 5L).
  • Character (chr): Text (a string) such as letters, words or special characters, written inside quotes (e.g. "hello").
  • Complex (cplx): A number with an imaginary component (e.g. 3+2i).
  • Logical (logi): A value of either TRUE or FALSE.

Mostly, we will be encountering the num, int and chr types throughout our posts, but it's useful to know what the others mean as you start to use R more.
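You can check a value's type yourself. In this sketch (the variable names are hypothetical), class() reports the full type name, and str() prints the same short labels used above (num, int, chr, cplx, logi).

```r
x_num  <- 3.14   # numeric (decimal)
x_int  <- 5L     # the L suffix makes a whole number an integer
x_chr  <- "R"    # character strings go inside quotes
x_cplx <- 3+2i   # complex: a real part plus an imaginary part
x_logi <- TRUE   # logical: TRUE or FALSE

class(x_num)   # [1] "numeric"
class(x_int)   # [1] "integer"
class(x_chr)   # [1] "character"
class(x_cplx)  # [1] "complex"
class(x_logi)  # [1] "logical"

# A whole number typed without L is still stored as numeric
class(5)       # [1] "numeric"

# str() prints the short label for each element
str(list(x_num, x_int, x_chr, x_cplx, x_logi))
```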

Data structures

The final thing you need to know before you start is how data is organised within R and RStudio. If you are used to Excel, the way R uses, stores and refers to data can be confusing at first, but it does get easier the more you use it.

  • Vector: A one-dimensional data set in which every element is the same type (e.g. all numeric or all character).
  • Matrix: A two-dimensional data set (rows and columns) in which every element is the same type.
  • Array: A data set with any number of dimensions in which every element is the same type.
  • Object: The general term for anything you create and store in R, from a single vector to a whole dataframe.
  • Dataframe: A two-dimensional data structure in which each column is a vector of the same length, and different columns can hold different types of data. Dataframes closely resemble a spreadsheet and are the data structure we will be using the most (and look the most like Excel!).
  • Lists: An ordered collection of objects which, like a dataframe, can be composed of different types of data.
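The sketch below creates one of each structure in base R so you can inspect them in the console. The variable names and the small people dataset are made up for illustration; dim() shows the size of each dimension of a matrix or array, and str() summarises each column of a dataframe.

```r
# Vector: one dimension, every element the same type
v <- c(1.5, 2.5, 3.5)

# Mixing types in a vector forces a common type: here everything becomes character
mixed <- c(1, "a")
class(mixed)   # [1] "character"

# Matrix: two dimensions (rows x columns), filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)         # [1] 2 3

# Array: any number of dimensions, still one type
a <- array(1:24, dim = c(2, 3, 4))
dim(a)         # [1] 2 3 4

# Dataframe: columns of equal length, and each column can have its own type
# (made-up example data)
df <- data.frame(
  name  = c("Ann", "Ben", "Cat"),
  age   = c(34L, 28L, 45L),
  score = c(7.5, 8.1, 6.9)
)
str(df)        # one line per column, showing its type

# List: an ordered collection of objects of any type and length
l <- list(numbers = v, people = df, flag = TRUE)
length(l)      # [1] 3
```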

We have inserted a visual representation of the above terms to hopefully make them a bit clearer.

Credit: Ceballos & Cardiel (2003)

Conclusion

There is a lot to learn when you first open R, but with time and experience you won’t look back. I still forget what everything means, which is why we started this site… so we can look it up when we forget :).