R and RStudio (ICO - Utrecht)

Outline

Handling datasets is a skill that is central to many research areas. Managing, analysing and displaying data are challenging tasks that require learning specific software. In this workshop, you will be introduced to R and RStudio, one of the most versatile languages and software environments for data analysis. Being open source has helped make R one of the most used statistical programming languages in scientific research at the moment. The workshop will teach you, hands-on, to take your first steps in R and RStudio and prepare you to work with them in a professional research context.

Description of session contents

The training consists of four parts:

R: You get to know the program R and its language through a number of examples. You practise data management with datasets: reading data from other software (MS Excel, SPSS, …), changing and recoding variables, and other data management steps using the intuitive ‘tidyverse’ approach. All of this will be organised in RStudio projects.
Graphs: You will learn to use R as a powerful tool to create different graphs via the package ‘ggplot2’.
R Markdown/Quarto: You will learn to create reproducible documents (such as HTML reports, Word documents and slides) so that you can integrate content (the explanations of the analyses and results), analyses (the way in which the results are produced, i.e. the underlying code) and the results themselves, nicely and easily, into one overall document.
Basic analyses: finally, you will learn how to perform some of the most common analyses in R (correlations; t-test; linear regression analyses).

During the course we will schedule some time to work with your own research data (or with one of our datasets) so that you can immediately transfer what you have learned to your own projects.

Rooms

When?
Tuesday July 2 (10:00 - 17:00)
Wednesday July 3 (10:00 - 17:00)
Thursday July 4 (10:00 - 17:00)

Vergadercentrum Vredenburg, Vredenburg 19, Utrecht

Slides

If you would like to have all the slides as PDF files, you can download them here as a zip file:

Slides As Pdf

Outline

In the first part, we will introduce the basics of the R language and the RStudio environment. The following topics will be discussed:

What is R and why should you use R?
Install R & RStudio
RStudio interface
The basics in R
Installing and using packages
Importing data
Working with R-projects
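As a minimal sketch of the last few topics (installing and using packages, importing data), the example below loads the readr package and reads a CSV file. The file name and its columns are invented for illustration; the file is first written to a temporary folder so the example runs on any computer with the tidyverse installed.

```r
# Install once per computer (commented out so the script can be re-run safely)
# install.packages("tidyverse")

# Load a package in every new R session before using its functions
library(readr)

# Hypothetical example data, written to a temporary CSV file so that
# there is something to import
fruit_path <- file.path(tempdir(), "fruit_example.csv")
write_csv(
  data.frame(fruit = c("apple", "banana", "cherry"),
             weight_g = c(150, 120, 8)),
  fruit_path
)

# Import the CSV file into a data frame (tibble)
fruit <- read_csv(fruit_path, show_col_types = FALSE)

# Inspect the result
str(fruit)
mean(fruit$weight_g)

# Other file types follow the same pattern with other packages, e.g.
# readxl::read_excel("file.xlsx") for Excel and haven::read_sav("file.sav") for SPSS
```

When you work inside an RStudio project, the working directory is the project folder, so a relative path such as "data/fruit_example.csv" (a hypothetical location) can replace the temporary path used here.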

Materials

Slides

The HTML version of the slides for this first part can be found here

R script

There is an R-script (Fruit.R) that has all the code we used in this first part. Download it here (Right-click and Save as)

References and resources

Outline

The second part of this workshop taps into the use of Quarto documents! Using Quarto you will be able to integrate your analyses (R-code & results) in a document (pdf, html, word, ppt, …), unleashing the power to create reproducible lab-reports and slideshows.

These topics will be covered:

What is Markdown?
Integrating Markdown and R code
Creating a basic Quarto document
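To give an idea of what a basic Quarto document looks like, the sketch below uses R to write a small .qmd file: a YAML header, some Markdown text and one R code chunk. The title, file name and chunk content are made up for this illustration and are not taken from the workshop slides.

```r
# Build the text of a minimal Quarto document line by line.
# The code fence is assembled with strrep() only so that this example
# can itself be shown inside a code block.
fence <- strrep("`", 3)

qmd_lines <- c(
  "---",
  "title: \"My first Quarto report\"",
  "format: html",
  "---",
  "",
  "## Results",
  "",
  "Plain Markdown text, with *emphasis* and **bold**.",
  "",
  paste0(fence, "{r}"),
  "summary(cars$speed)",
  fence
)

# Save it as a .qmd file (here in a temporary folder)
qmd_path <- file.path(tempdir(), "first_report.qmd")
writeLines(qmd_lines, qmd_path)

# Show the document source
cat(readLines(qmd_path), sep = "\n")

# Rendering needs Quarto installed; in RStudio you would click "Render",
# or call the quarto R package:
# quarto::quarto_render(qmd_path)
```

In practice you would normally create the .qmd file through the RStudio menus and type the Markdown and code chunks directly; writing it from R is only a way to show the file contents here.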

Slides

The HTML version of the slides for this part can be found here

Data

For this part, we will use a very small dataset Fruit.RData. The data can be downloaded here (right-click to save as).

Example application

Quarto allows you to create nice documents like lab notes or walk-throughs. For instance, I created this document as an appendix to a paper. It is made with Quarto and is an HTML page that lives on the internet:

Walk-through example 1

Walk-through example 2

Outline

In this part of the workshop we introduce the dplyr verbs. dplyr is a tidyverse package with powerful functions for heavy-lifting data wrangling operations. We will show how this package can be used to create new variables, select variables, filter cases, re-arrange datasets, merge datasets, …
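The operations listed above can be sketched in one short pipeline. The two small data frames and their column names are invented for this example (they are not the workshop data), and the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical data, invented for this example
students <- data.frame(
  id = 1:5,
  group = c("A", "A", "B", "B", "B"),
  pretest = c(10, 14, 9, 16, 12),
  posttest = c(16, 13, 14, 20, 12)
)
classes <- data.frame(
  group = c("A", "B"),
  teacher = c("Kim", "Alex")
)

result <- students |>
  mutate(gain = posttest - pretest) |>   # create a new variable
  filter(gain > 0) |>                    # keep only cases that improved
  select(id, group, gain) |>             # keep only some variables
  arrange(desc(gain)) |>                 # re-arrange: largest gain first
  left_join(classes, by = "group")       # merge in a second dataset

result
```

Each verb takes a data frame and returns a data frame, which is what makes it possible to chain them with the pipe.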

Slides

The HTML version of the slides for this part can be found here

Data

For this slideshow, we used a dataset called Friends.RData. The data can be downloaded here (right-click to save as).
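An .RData file is read into R with `load()`. The sketch below first saves a small, invented data frame as an .RData file in a temporary folder and then loads it again; with the downloaded file you would instead pass its path to `load()`. The name of the object stored inside Friends.RData is not stated here, so the example prints the names that `load()` returns.

```r
# Hypothetical stand-in for a downloaded file: save a small data frame
# as an .RData file in a temporary folder
demo_data <- data.frame(id = 1:3, score = c(7, 9, 8))
rdata_path <- file.path(tempdir(), "demo.RData")
save(demo_data, file = rdata_path)
rm(demo_data)   # remove it from the workspace to show that load() restores it

# load() restores the saved object(s) under their original names
# and (invisibly) returns those names
loaded_names <- load(rdata_path)
loaded_names
head(demo_data)
```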

References and resources

An online interactive course on using dplyr can be found here:

https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome

Outline

Now everything is set to learn more about creating powerful visualizations in R. To this end, we will introduce you to the go-to package for visualizations in R: ggplot2.

These are the topics covered in this part:

Simple plots in R
Grammar of graphics
How ggplot2 works (in a nutshell)
Visualising a categorical variable
Visualising a quantitative variable
Visualising more than one variable
More about visualisation?

Slides

The HTML version of the slides for this part can be found here

Data

In this slideshow we make use of the penguins dataset that is part of the palmerpenguins package. So, the palmerpenguins package has to be installed and loaded to reproduce the commands in the slides.
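As a rough sketch of how ggplot2 builds a plot from data, aesthetic mappings and geometric layers, the example below makes three plots with the penguins data: one for a categorical variable, one for a quantitative variable and one combining several variables. The choice of variables and geoms is an assumption made for illustration and may differ from the slides; both ggplot2 and palmerpenguins need to be installed.

```r
library(ggplot2)

# Use the data from the palmerpenguins package explicitly
penguins <- palmerpenguins::penguins

# One categorical variable: bar chart of counts per species
p_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# One quantitative variable: histogram of body mass
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(bins = 30)

# More than one variable: scatter plot coloured by species
p_scatter <- ggplot(penguins,
                    aes(x = flipper_length_mm, y = body_mass_g,
                        colour = species)) +
  geom_point() +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", colour = "Species")

# Printing a plot object draws it
p_scatter
```

R warns that a few rows with missing values are dropped when the histogram and scatter plot are drawn; that is expected with this dataset.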

References and resources

Outline

Of course, R is not just a tool for data wrangling and data visualization. It is a language for performing statistical analyses. In this final part we will show how some typical basic statistical analyses can be performed in R.

These are the topics covered in this part:

Correlation
t-test
linear regression
outro
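The three analyses listed above are all available in base R. The sketch below uses `mtcars`, a dataset that ships with R, purely as a stand-in; the variables chosen are an assumption for illustration and not the workshop example.

```r
# mtcars ships with R; it is only a stand-in for your own data

# Correlation between car weight and fuel efficiency, with a significance test
cor_result <- cor.test(mtcars$wt, mtcars$mpg)
cor_result$estimate

# Independent-samples t-test (Welch version by default): mpg for
# automatic (am = 0) versus manual (am = 1) cars
t_result <- t.test(mpg ~ am, data = mtcars)
t_result$p.value

# Linear regression: predict mpg from weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
coef(fit)
```

Each function returns an object that can be stored and inspected, which is convenient when the results need to be reported in a Quarto document.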

Slides

The HTML version of the slides for this part can be found here

References and resources

Info

During the workshop we have some hands-on exercises. Here you can find links to the datasets, the Quarto files with the instructions (right-click and save as), as well as Quarto files that present solutions for the exercises.

Datasets for the exercises

Exercise 1: Create a project and import data

The instructions for this exercise:

The solutions for this exercise:

Exercise 2: Data Wrangling with dplyr

The instructions for this exercise:

The solutions for this exercise:

Exercise 3: Visualizations with ggplot2

The instructions for this exercise:

The solutions for this exercise:

Exercise 4: Integrated exercise

The instructions for this exercise:

The solutions for this exercise:

References and resources

The dataset Friends.sav, used in some of the exercises, is a simulated dataset based on the following study:

Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. “Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles.” Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.

Now that you know R and Quarto, you can create lab reports that even integrate R code within the text itself. For instance, you can insert the result of calculating a mean directly into a sentence. We created a document that showcases these functionalities.
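A minimal sketch of the idea, using invented scores: the value is computed in R and then referred to inside a sentence. The comment shows the inline syntax as it is commonly written in a Quarto document rendered with knitr; the last lines build the same sentence by hand so the result can be checked in the console.

```r
# Hypothetical scores, invented for this example
scores <- c(12, 15, 9, 18)
mean_score <- mean(scores)

# In the text of a .qmd file you would write something like:
#   The mean score was `r round(mean_score, 1)`.
# When the document is rendered, the inline code is replaced by its value.

# The same sentence assembled by hand in R
sentence <- paste0("The mean score was ", round(mean_score, 1), ".")
sentence
```

Because the number is computed at render time, the sentence updates automatically when the data change.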

Materials on visualisations

Great(est) blog-site about ggplot2: https://www.cedricscherer.com/

Tine created an on-line tutorial on ggplot2 (in Dutch): https://datavisualisatie-met-ggplot2.netlify.app/

Site with all the cheatsheets

In need of cheatsheets? You can find some here: https://rstudio.github.io/cheatsheets/

Some free online books

Great book by one of the founders of the tidyverse, Hadley Wickham. R for Data Science (2e): https://r4ds.hadley.nz/

Fundamentals of Data Visualization by Claus Wilke: https://clauswilke.com/dataviz/index.html

About Quarto

Quarto can be used for much more than just writing single documents. More info on all possibilities of Quarto can be found here: https://quarto.org/docs/guide/

If you create a file (or a website) that you want to publish on the internet, there are many ways to do so. More information here: https://quarto.org/docs/publishing/

You can even create a personal (academic) blog website with Quarto. A great tutorial can be found here: https://www.marvinschmitt.com/blog/website-tutorial-quarto/

Sven’s website

When I find the time I write short posts about (Bayesian) statistics. Some of the topics touched upon in the workshop are also documented in these posts, where I try to explain certain concepts or workflows. These posts also contain the necessary code!

https://sdemaeyer.quarto.pub/posts.html

Nice communities on R

The R Ladies community: https://rladies.org/

Tidy Tuesday challenges on YouTube: https://www.youtube.com/playlist?list=PL19ev-r1GBwkuyiwnxoHTRC8TTqP8OEi8