Outline

Handling datasets is a skill that is central to many research areas. Managing, analysing and displaying data are challenging tasks that require learning specific software. In this workshop, you will be introduced to R and RStudio, one of the most versatile languages and software environments for data analysis. Being open source has helped make R one of the most widely used statistical programming languages in scientific research at the moment. The workshop teaches you, hands-on, to take your first steps in R and RStudio and prepares you to work with them in a professional research context.

Description of session contents

The training consists of four parts:

  • R: You get to know the program R and its language by means of a number of examples. You practice with data management and datasets: reading data from other software (MS Excel, SPSS, …), changing/re-coding variables and other data management steps using the intuitive ‘tidyverse’ approach. All of this is done within RStudio projects, which keep your files, data and code together.
  • Graphs: You will learn to use R as a powerful tool to create different graphs via the package ‘ggplot2’.
  • R Markdown/Quarto: You will learn to create reproducible documents (such as HTML reports, Word documents and slides) that integrate the content (the explanations of the analyses and results), the analyses (the way in which the results are produced, i.e. the underlying code) and the results themselves, nicely and easily, into one overall document.
  • Basic analyses: Finally, you will learn how to perform some of the most common analyses in R (correlations, t-tests and linear regression analyses).

During the course we will schedule some time to work with your own research data (or with one of our datasets) so that you can immediately transfer what you have learned to your own projects.

Rooms
When?
Tuesday July 2 (10:00 - 17:00)
Wednesday July 3 (10:00 - 17:00)
Thursday July 4 (10:00 - 17:00)

Vergadercentrum Vredenburg, Vredenburg 19, Utrecht

Slides

If you would like to have all the slides as PDF files, you can download them here as a zip file:

Slides As Pdf

Outline

In the first part, we will introduce the basics of the R language and the RStudio environment. The following topics will be discussed:

  • What is R and why should you use R?
  • Install R & RStudio
  • RStudio interface
  • The basics in R
  • Installing and using packages
  • Importing data
  • Working with R-projects
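
As a rough illustration of the topics above, the sketch below walks through a few basics in base R: creating objects, building a data frame, and importing data from a file. The object names and the CSV file written to a temporary folder are made up for this example; this is not the workshop's Fruit.R script.

```r
# Basic objects: assign values with <-
fruit <- c("apple", "banana", "kiwi")  # character vector
price <- c(1.20, 0.80, 2.50)           # numeric vector

# Combine the vectors into a data frame (hypothetical example data)
fruit_data <- data.frame(fruit = fruit, price = price)

# Packages are installed once and then loaded in every new session, e.g.:
# install.packages("tidyverse")
# library(tidyverse)

# Write the data to a temporary CSV file and import it again
csv_file <- file.path(tempdir(), "fruit_example.csv")
write.csv(fruit_data, csv_file, row.names = FALSE)
imported <- read.csv(csv_file)

str(imported)         # inspect the structure of the imported data
mean(imported$price)  # average price
```

Inside an RStudio project, relative file paths are resolved from the project folder, so an import call would typically point to a file stored there rather than to a temporary folder.
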
Materials

Slides

The HTML version of the slides for this first part can be found here

R script

There is an R-script (Fruit.R) that has all the code we used in this first part. Download it here (Right-click and Save as)

References and resources
Outline

The second part of this workshop taps into the use of Quarto documents! Using Quarto you will be able to integrate your analyses (R-code & results) in a document (pdf, html, word, ppt, …), unleashing the power to create reproducible lab-reports and slideshows.

These topics will be covered:

  • What is Markdown?
  • Integrating Markdown and R code
  • Creating a basic Quarto document
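
To give an idea of what a basic Quarto document contains, the sketch below uses R to write a minimal .qmd file: a YAML header, some Markdown text and one R code chunk. The file name, title and contents are hypothetical. Rendering is left as a comment because it assumes that both Quarto and the 'quarto' R package are installed.

```r
# Three backticks open and close a code chunk in a Quarto document
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",                                # YAML header: document settings
  "title: \"My first report\"",
  "format: html",
  "---",
  "",
  "## Fruit prices",                    # Markdown heading
  "",
  "Plain text written in *Markdown*.",  # Markdown text
  "",
  paste0(fence, "{r}"),                 # start of an R code chunk
  "price <- c(1.20, 0.80, 2.50)",
  "mean(price)",
  fence                                 # end of the code chunk
)

# Save the document to a temporary folder (hypothetical file name)
qmd_file <- file.path(tempdir(), "first_report.qmd")
writeLines(qmd_lines, qmd_file)

# Assumption: Quarto and the 'quarto' R package are installed
# quarto::quarto_render(qmd_file)
```

In practice you would usually create such a file through the RStudio menus and render it with the Render button; the script above only makes the structure of the file explicit.
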

Slides

The HTML version of the slides for this part can be found here

Data

For this part, we will use a very small dataset Fruit.RData. The data can be downloaded here (right-click to save as).

Example application

Quarto allows you to create nice documents such as lab notes or walk-throughs. For instance, I created this document as an appendix to a paper. It is made with Quarto and is an HTML page that lives on the internet:

Walk-through example 1

Walk-through example 2

Outline

In this part of the workshop we introduce dplyr verbs. dplyr is a package from the tidyverse with powerful functions that do the heavy lifting in data wrangling. We will show how this package can be used to create new variables, select variables, filter cases, re-arrange datasets, merge datasets, …
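
The operations listed above could look roughly like the sketch below. It assumes the dplyr package is installed and uses two small made-up tables, not the workshop's Friends.RData; all variable names are hypothetical.

```r
library(dplyr)

# Hypothetical example data (not the workshop dataset)
scores <- tibble(
  id    = 1:4,
  group = c("A", "A", "B", "B"),
  pre   = c(10, 12, 9, 14),
  post  = c(13, 15, 9, 18)
)
groups <- tibble(
  group     = c("A", "B"),
  condition = c("control", "treatment")
)

result <- scores %>%
  mutate(gain = post - pre) %>%     # create a new variable
  select(id, group, gain) %>%       # keep only some variables
  filter(gain > 0) %>%              # keep the cases that improved
  arrange(desc(gain)) %>%           # sort from largest to smallest gain
  left_join(groups, by = "group")   # merge in information per group

result
```

Each verb takes a data frame and returns a new one, which is why the steps can be chained with the pipe operator.
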

Slides

The HTML version of the slides for this part can be found here

Data

For this slideshow, we used a dataset called Friends.RData. The data can be downloaded here (right-click to save as).

References and resources

An online interactive course on using dplyr can be found here:

https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome

Outline

Now everything is set to learn more about creating powerful visualizations in R. To this end, we will introduce you to the go-to package for visualizations in R: ggplot2.

These are the topics covered in this part:

  • Simple plots in R
  • Grammar of graphics
  • How ggplot2 works (in a nutshell)
  • Visualising a categorical variable
  • Visualising a quantitative variable
  • Visualising more than one variable
  • More about visualisation?

Slides

The HTML version of the slides for this part can be found here

Data

In this slideshow we make use of the penguins dataset that is part of the palmerpenguins package. So, the palmerpenguins package has to be installed and loaded to reproduce the commands in the slides.
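
As a minimal sketch of how ggplot2 builds a plot from data, aesthetic mappings and geometric layers, the example below draws three plots from the penguins data. It assumes that the ggplot2 and palmerpenguins packages are installed; the specific plots are chosen for illustration and are not necessarily the ones shown in the slides.

```r
library(ggplot2)
library(palmerpenguins)

# One categorical variable: bar chart of the number of penguins per species
p_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# One quantitative variable: histogram of body mass
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(bins = 30, na.rm = TRUE)

# More than one variable: scatter plot coloured by species
p_scatter <- ggplot(penguins,
                    aes(x = flipper_length_mm, y = body_mass_g,
                        colour = species)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", colour = "Species")

p_scatter  # printing a plot object draws it
```

The argument na.rm = TRUE drops rows with missing measurements without a warning; leaving it out gives the same plot plus a warning message.
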

References and resources
Outline

Of course, R is not just a tool for data wrangling and data visualization. It is also a language for performing statistical analyses. In this final part we will show how some typical basic statistical analyses can be performed in R.

These are the topics covered in this part:

  • Correlation
  • t-test
  • Linear regression
  • Outro
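
The three analyses above can be sketched with functions from base R. The example uses the built-in mtcars dataset purely as a stand-in; it is not the data used in the slides.

```r
# Correlation between car weight and fuel efficiency
cor_result <- cor.test(mtcars$wt, mtcars$mpg)
cor_result

# t-test comparing fuel efficiency of automatic (am = 0) and manual (am = 1) cars
t_result <- t.test(mpg ~ am, data = mtcars)
t_result

# Linear regression: predict fuel efficiency from weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```

Note that t.test() performs Welch's t-test by default, which does not assume equal variances in the two groups; adding var.equal = TRUE gives the classical Student's t-test.
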

Slides

The HTML version of the slides for this part can be found here

References and resources
Info

During the workshop we have some hands-on exercises. Here you can find links to the datasets and the Quarto files with the instructions (right-click and save as), as well as Quarto files that present solutions for the exercises.

Exercise 1: Create a project and import data

The instructions for this exercise:

The solutions for this exercise:

Exercise 2: Data Wrangling with dplyr

The instructions for this exercise:

The solutions for this exercise:

Exercise 3: Visualizations with ggplot2

The instructions for this exercise:

The solutions for this exercise:

Exercise 4: Integrated exercise

The instructions for this exercise:

The solutions for this exercise:

References and resources

The dataset Friends.sav, used in some of the exercises, is a simulated dataset based on the following study:

Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. “Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles.” Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.

Now that you know R and Quarto, you can create lab reports that even integrate R code within the text itself. For instance, if you want to include the result of a calculation, such as a mean, in the text itself, that is possible. We created a document that showcases these functionalities.
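
A minimal sketch of the idea, using made-up numbers: the R code computes a mean, and the comment shows how an inline expression could be written in the Quarto source (assuming the default knitr engine for R). The last lines build the same sentence by hand to show what the rendered text would contain.

```r
# Hypothetical example data
price <- c(1.20, 0.80, 2.50)
mean_price <- round(mean(price), 2)

# In the .qmd source, the sentence would be written as:
#   The average price was `r mean_price` euro.
# When the document is rendered, the inline expression is replaced by its
# value, which is equivalent to building the sentence by hand:
sentence <- paste0("The average price was ", mean_price, " euro.")
sentence
```

Because the number is computed when the document is rendered, the text updates automatically whenever the data change.
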