Handling datasets is a skill that is central to many research areas. Managing, analysing and displaying data are challenging tasks that require learning specific software. In this workshop, you will be introduced to R and RStudio, one of the most versatile combinations of language and software environment for data analysis. Because this software is open source, R is currently one of the most widely used statistical programming languages in scientific research. The workshop will teach you ‘hands-on’ to take your first steps in R and RStudio and prepare you to work with them in a professional research context.
Description of session contents
The training consists of four parts:
During the course we will schedule some time to work with your own research data (or with one of our datasets) so that you can immediately transfer what you have learned to your own projects.
When? |
---|
Tuesday July 2 (10:00 - 17:00) |
Wednesday July 3 (10:00 - 17:00) |
Thursday July 4 (10:00 - 17:00) |
Vergadercentrum Vredenburg, Vredenburg 19, Utrecht
If you would like to have all the slides as PDFs, you can download them here as a zip file:
In the first part, we will introduce the basics of the R language and the RStudio environment. The following topics will be discussed:
The second part of this workshop taps into the use of Quarto documents! Using Quarto you will be able to integrate your analyses (R code and results) in a document (PDF, HTML, Word, PowerPoint, …), unleashing the power to create reproducible lab reports and slideshows.
These topics will be covered:
The HTML version of the slides for this part can be found here
For this part, we will use a very small dataset, Fruit.RData. The data can be downloaded here (right-click to save as).
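Once an .RData file is saved on your computer, it can be read into R with load(). The sketch below is self-contained, so it first writes a stand-in file to a temporary folder. The data frame fruit_demo and its columns are invented for illustration and are not the contents of the real Fruit.RData.

```r
# Hypothetical stand-in data: NOT the real contents of Fruit.RData
fruit_demo <- data.frame(
  fruit  = c("apple", "banana", "cherry"),
  weight = c(150, 120, 8)
)

# Save the data frame as an .RData file in a temporary folder
demo_path <- file.path(tempdir(), "Fruit_demo.RData")
save(fruit_demo, file = demo_path)

# Remove the object from the workspace, then restore it from the file
rm(fruit_demo)
loaded_names <- load(demo_path)  # load() returns the names of the restored objects

loaded_names     # which objects were restored
str(fruit_demo)  # inspect the structure of the restored data frame
```

For the workshop file, the equivalent call would be load("Fruit.RData"), assuming the file sits in your working directory.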
Quarto allows you to create nice documents like lab-notes or walk-throughs. For instance, I have created this document as an appendix of a paper. This is made with Quarto and is an html page that lives on the internet:
In this part of the workshop we introduce dplyr verbs. dplyr is a package that lives in the tidyverse and has powerful functions for heavy-lifting data wrangling operations. We will show how this package can be used to create new variables, select variables, filter cases, re-arrange datasets, merge datasets, …
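As a rough idea of what these operations look like, here is a minimal sketch that chains one dplyr verb per operation. The data frames scores and background are made up for this example (they are not the workshop's Friends dataset), and the native pipe |> assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical example data (not a workshop dataset)
scores <- data.frame(
  id       = 1:5,
  group    = c("A", "B", "A", "B", "A"),
  pretest  = c(10, 12, 9, 15, 11),
  posttest = c(14, 13, 12, 18, 11)
)
background <- data.frame(
  id  = 1:5,
  age = c(21, 23, 22, 20, 24)
)

result <- scores |>
  mutate(gain = posttest - pretest) |>  # create a new variable
  select(id, group, gain) |>            # keep only some variables
  filter(gain > 0) |>                   # keep only the cases that improved
  arrange(desc(gain)) |>                # re-arrange rows, largest gain first
  left_join(background, by = "id")      # merge with a second dataset

result
```

Each verb takes a data frame and returns a data frame, which is what makes it possible to chain the steps with the pipe.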
The HTML version of the slides for this part can be found here
For this slideshow, we used a dataset called Friends.RData. The data can be downloaded here (right-click to save as).
An online interactive course on using dplyr can be found here: https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome
Now everything is set to learn more about creating powerful visualizations in R. To this end, we will introduce you to the go-to package for visualizations in R: ggplot2.
These are the topics covered in this part:
How ggplot2 works (in a nutshell)
The HTML version of the slides for this part can be found here
In this slideshow we make use of the penguins dataset that is part of the palmerpenguins package. So, the palmerpenguins package has to be installed and loaded to reproduce the commands in the slides.
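A minimal sketch of that setup, followed by a first plot, could look as follows. The plot itself is only an illustration and is not necessarily one of the plots from the slides. The install line is commented out and only needs to be run once.

```r
# install.packages(c("ggplot2", "palmerpenguins"))  # run once if needed
library(ggplot2)
library(palmerpenguins)

# Scatter plot of body mass against flipper length, coloured by species
p <- ggplot(penguins,
            aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(na.rm = TRUE) +  # silently drop rows with missing measurements
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    colour = "Species"
  ) +
  theme_minimal()

p  # printing the object draws the plot
```

The plot is built up in layers joined by +: the data and aesthetic mapping first, then a geom that draws the points, then labels and a theme.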
Of course, R is not just a tool for data wrangling and data visualization. It is a language for performing statistical analyses. In this final part we will show how some typical basic statistical analyses can be performed using R.
These are the topics covered in this part:
The HTML version of the slides for this part can be found here
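The exact analyses covered are listed in the slides. As a rough idea of what basic statistical analyses look like in R, the sketch below uses mtcars, a dataset that ships with R; the choice of dataset and of these particular analyses is an assumption made for illustration.

```r
# Built-in example data shipped with R (not a workshop dataset)
data(mtcars)

# Descriptive statistics for fuel consumption (miles per gallon)
mean(mtcars$mpg)
sd(mtcars$mpg)

# Independent-samples t-test: mpg for automatic (am = 0) vs manual (am = 1) cars
t_result <- t.test(mpg ~ am, data = mtcars)
t_result

# Simple linear regression: mpg predicted by car weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```

Both t.test() and lm() accept a formula of the form outcome ~ predictor together with a data argument, so the same notation carries over from one analysis to the next.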
During the workshop we have some hands-on exercises. Here you can find links to the datasets and the Quarto files with the instructions (right-click and save as), as well as Quarto files that present solutions for the exercises.
The instructions for this exercise:
The solutions for this exercise:
dplyr
The instructions for this exercise:
The solutions for this exercise:
ggplot2
The instructions for this exercise:
The solutions for this exercise:
The instructions for this exercise:
The solutions for this exercise:
The dataset Friends.sav, used in some of the exercises, is a simulated dataset based on the following study:
Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. “Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles.” Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.
Now that you know R and Quarto, we can create lab reports that even integrate R code within the text itself. For instance, you can compute a mean and have the result appear directly in a sentence of your text. We created a document that showcases these functionalities.
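To give an impression of how inline code works, the sketch below writes a tiny Quarto file from R. The file name, the title and the scores vector are invented for this example. Rendering is left commented out because it needs the Quarto command-line tool and the quarto R package.

```r
# Three backticks, built here so the chunk fence can be written from R code
fence <- strrep("`", 3)

# Text of a minimal Quarto document (hypothetical example content)
qmd_lines <- c(
  "---",
  'title: "Inline code demo"',
  "format: html",
  "---",
  "",
  paste0(fence, "{r}"),                   # start of an R code chunk
  "scores <- c(12, 15, 9, 18)",
  fence,                                  # end of the chunk
  "",
  "The mean score is `r mean(scores)`."   # inline R code inside a sentence
)

# Write the document to a temporary folder
qmd_path <- file.path(tempdir(), "inline_demo.qmd")
writeLines(qmd_lines, qmd_path)

# Rendering requires the Quarto CLI and the 'quarto' R package:
# quarto::quarto_render(qmd_path)
```

When the document is rendered, the inline expression is replaced by its value, so the last sentence reads "The mean score is 13.5." If the data change, the sentence updates on the next render without any manual editing.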
Great(est) blog site about ggplot2: https://www.cedricscherer.com/
Tine created an online tutorial on ggplot2 (in Dutch): https://datavisualisatie-met-ggplot2.netlify.app/
In need of cheatsheets? You can find some here: https://rstudio.github.io/cheatsheets/
Great book by one of the founders of the tidyverse, Hadley Wickham. R for Data Science (2e): https://r4ds.hadley.nz/
Fundamentals of Data Visualization by Claus Wilke: https://clauswilke.com/dataviz/index.html
Quarto can be used for much more than just writing single documents. More info on all possibilities of Quarto can be found here: https://quarto.org/docs/guide/
If you create a file (or a website) that you want to publish on the internet, there are many ways to do so. More information here: https://quarto.org/docs/publishing/
You can even create a personal (academic) blog website with Quarto. A great tutorial can be found here: https://www.marvinschmitt.com/blog/website-tutorial-quarto/
When I find the time I write short posts about (Bayesian) statistics. Some of the topics touched upon in the workshop are also documented in these posts, where I try to explain certain concepts or workflows. These posts also contain the necessary code!
R
The R Ladies community: https://rladies.org/
Tidy Tuesday challenges on YouTube: https://www.youtube.com/playlist?list=PL19ev-r1GBwkuyiwnxoHTRC8TTqP8OEi8