library(tidyverse) # you know why ;-)
library(broom) # for tidying model output
library(glue) # for pasting character values
library(sjPlot) # for creating tables of models
Using R in your text
Loading penguins data
library(palmerpenguins)
data("penguins")
Some information about the data
The penguins data contains information of 344 penguins. The number of penguins is ‘calculated’ by R in the text. Hence, if you would go back to the islands and measure some additional penguins. You can just rerun the script and the count would adjust itself…
The penguins live on three islands. We want to know how many penguins were observed on each island. So, we apply some of our tidyverse skills.
## Count by group
<- penguins %>%
islands group_by(island) %>%
count()
islands
# A tibble: 3 × 2
# Groups: island [3]
island n
<fct> <int>
1 Biscoe 168
2 Dream 124
3 Torgersen 52
Now, we can refer to the values in this data set by indexing! For example, there live 168 penguins on Biscoe. This is easy in this case (because the data set is small), but becomes difficult with larger data sets and you can easily make a mistake.
To make youR life easier, the package printy
can be used. (More information can be found at https://www.tjmahr.com/lists-knitr-secret-weapon/)
# install.packages("remotes")
# remotes::install_github("tjmahr/printy")
library(printy)
## Create a list
<- super_split(islands, island)
islands_list
# Also possible
#islands <- penguins %>%
# group_by(island) %>%
# count() %>%
# super_split(island)
Doing this, we converted the data set to a list of values for each island.
islands_list
$Biscoe
# A tibble: 1 × 2
# Groups: island [1]
island n
<fct> <int>
1 Biscoe 168
$Dream
# A tibble: 1 × 2
# Groups: island [1]
island n
<fct> <int>
1 Dream 124
$Torgersen
# A tibble: 1 × 2
# Groups: island [1]
island n
<fct> <int>
1 Torgersen 52
Now, we can refer to the values by using $
and the name of the island. For example, the number of penguins on Torgersen can be found by using the following code: islands_list$Torgersen$n
. Reporting on the number of penguins on each island now become easy.
The penguins live on three islands: Biscoe (n = 168), Dream (n = 124), and Torgersen (n = 52).
Fitting a linear model
We fit a linear model with body_mass_g
as dependent variable and species
and flipper_length_mm
as independent variable
## Fit a linear model
<- lm(body_mass_g ~ flipper_length_mm + species, data = penguins) Model_lm
Here is a nice table with the estimates of the model we fitted.
tab_model(Model_lm)
body mass g | |||
Predictors | Estimates | CI | p |
(Intercept) | -4031.48 | -5180.51 – -2882.45 | <0.001 |
flipper length mm | 40.71 | 34.66 – 46.75 | <0.001 |
species [Chinstrap] | -206.51 | -320.07 – -92.95 | <0.001 |
species [Gentoo] | 266.81 | 79.43 – 454.19 | 0.005 |
Observations | 342 | ||
R2 / R2 adjusted | 0.783 / 0.781 |
If we discuss the results of this model we want to refer to the estimates and confidence intervals. To do so, we create a list that contains 4 elements (one for each estimate):
- Intercept
- flipper_length_mm
- speciesChinstrap
- speciesGentoo
<- tidy(Model_lm, conf.int = TRUE) %>%
list_lm ## Remove the brackets around the Intercept-term
mutate(term = ifelse(term == "(Intercept)", yes = "Intercept", no = term)) %>%
## Round the values to 2 decimals
mutate_at(vars(estimate, conf.low, conf.high), fmt_fix_digits, 2) %>%
## Create a new variable 'ci' by glueing the lower and upper limit
mutate(ci = glue("[{conf.low}", "{conf.high}]")) %>%
select(term, estimate, ci) %>%
## Apply the super_split function
super_split(term)
Now, we can refer to the estimate of the intercept by using list_lm$Intercept$estimate
and to the confidence interval by using list_lm$Intercept$ci
.
The intercept of the model is -4031.48 (95% CI = [-5180.51-2882.45])