Part 4
Powerful visualisations with ggplot2
2nd - 4th July, 2024
How ggplot2 works (in a nutshell)
Base R: the generic function plot() "knows" what to do (plot) with the input it receives.
# One quantitative variable
plot(Friends$fluency)
# One qualitative variable
plot(table(Friends$condition))
boxplot( Friends$fluency, main = "Boxplot of the variable 'fluency'", col = "steelblue" )
hist( Friends$fluency, main = "Histogram of the variable 'fluency'", freq = FALSE, ylim = c(0, .08) )
The Grammar of Graphics: a theoretical 'breakdown' of a visualisation into components (called layers)
One system to create different visualisations
At the heart of several modern graphical applications:
Slide taken from slide show by Thomas Lin Pedersen
Layers of a visualisation:
data
aesthetics
geometries
facets
statistics
coordinates
themes
Animation by Thomas de Beus
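A minimal sketch of how the seven layers listed above map onto ggplot2 code, assuming the palmerpenguins package (introduced below) is installed. Some calls, such as `stat = "identity"` and `coord_cartesian()`, only restate what ggplot2 would use by default anyway; the object name `p` is illustrative.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; `::` avoids a clash with
# datasets::penguins (different column names) in R >= 4.5.
# `p` is an illustrative object name.
penguins <- palmerpenguins::penguins

p <- ggplot(
  data = penguins,                       # data
  mapping = aes(x = flipper_length_mm,   # aesthetics
                y = body_mass_g)
) +
  geom_point(                            # geometries
    stat = "identity"                    # statistics: plot values as they are (default for geom_point)
  ) +
  facet_wrap(~species) +                 # facets: one panel per species
  coord_cartesian() +                    # coordinates: the default system
  theme_minimal()                        # themes
```

Printing `p` draws the plot: three panels of flipper length against body mass.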
Start by setting up the foundation with ggplot()
Specify ingredients (variables) with aes() and a flavour with scales
Create layers to plot with geoms
Style the cake with theme
Slide by Tanya Shapiro
group | country | gender | mean_age | SD_age | CI_lower | CI_upper |
---|---|---|---|---|---|---|
BE Male | BE | Male | 39 | 11.0 | 17.44 | 60.56 |
BE Female | BE | Female | 41 | 13.2 | 15.13 | 66.87 |
BE Other | BE | Other | 36 | 8.2 | 19.93 | 52.07 |
NL Male | NL | Male | 37 | 12.0 | 13.48 | 60.52 |
NL Female | NL | Female | 36 | 14.0 | 8.56 | 63.44 |
NL Other | NL | Other | 31 | 7.2 | 16.89 | 45.11 |
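The table above can be typed into R as a small data frame. In this sketch (the name `ages` is illustrative) the CI columns are recomputed as mean_age ± 1.96 × SD_age, which reproduces every row of the table; note that an interval built from the SD rather than the standard error describes the spread of individual ages, not the uncertainty of the mean. The aesthetics then map gender to x, mean_age to y and country to colour.

```r
library(ggplot2)

# The six rows of the table, typed in by hand (`ages` and `p` are illustrative names)
ages <- data.frame(
  country  = rep(c("BE", "NL"), each = 3),
  gender   = rep(c("Male", "Female", "Other"), times = 2),
  mean_age = c(39, 41, 36, 37, 36, 31),
  SD_age   = c(11.0, 13.2, 8.2, 12.0, 14.0, 7.2)
)

# The interval columns: mean +/- 1.96 standard deviations
ages$CI_lower <- ages$mean_age - 1.96 * ages$SD_age
ages$CI_upper <- ages$mean_age + 1.96 * ages$SD_age

# Map variables in the data to visual properties (aesthetics)
p <- ggplot(ages, aes(x = gender, y = mean_age, colour = country)) +
  geom_pointrange(
    aes(ymin = CI_lower, ymax = CI_upper),
    position = position_dodge(width = 0.4)   # put BE and NL side by side
  )
```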
Describe how variables in data are mapped to visual properties. For example:
Change the appearance of aesthetics using scales
Also called 'small multiples'
Define how many panels are shown and how they are arranged
Data might be tidy, but still in need of some statistical calculations. For example:
Often done implicitly by ggplot2
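As an illustration of such an implicit calculation, a sketch assuming the palmerpenguins and dplyr packages are installed: `stat_summary()` lets ggplot2 compute the mean body mass per species, while the second plot uses means computed by hand beforehand. The object names are illustrative.

```r
library(ggplot2)
library(dplyr)

# Assumption: palmerpenguins is installed; object names below are illustrative
penguins <- palmerpenguins::penguins

# 1. Let ggplot2 do the calculation: stat_summary() computes the mean per species
p_stat <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  stat_summary(fun = mean, geom = "point", na.rm = TRUE)

# 2. Do the calculation yourself, then plot the values as they are
means <- penguins %>%
  group_by(species) %>%
  summarise(mean_mass = mean(body_mass_g, na.rm = TRUE))

p_manual <- ggplot(means, aes(x = species, y = mean_mass)) +
  geom_point()

# Both plots show the same three points
layer_data(p_stat)$y
means$mean_mass
```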
Palmer penguins: a nice data set that can be used within R (Source: https://allisonhorst.github.io/palmerpenguins/articles/intro.html)
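install.packages("palmerpenguins")  # only needed once
library(tidyverse)  # ggplot2, dplyr (count(), %>%) and forcats (fct_reorder()) are used in this part
library(palmerpenguins)
data("penguins", package = "palmerpenguins")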
Artwork by @allison_horst
Table: Table 1. Random sample of 10 observations from the Palmer Penguins dataset
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Dream | 39.6 | 18.8 | 190 | 4600 | male | 2007 |
Gentoo | Biscoe | 51.1 | 16.5 | 225 | 5250 | male | 2009 |
Gentoo | Biscoe | 45.2 | 15.8 | 215 | 5300 | male | 2008 |
Gentoo | Biscoe | 47.5 | 15.0 | 218 | 4950 | female | 2009 |
Adelie | Dream | 36.0 | 17.1 | 187 | 3700 | female | 2009 |
Adelie | Dream | 36.0 | 17.8 | 195 | 3450 | female | 2009 |
Gentoo | Biscoe | 43.8 | 13.9 | 208 | 4300 | female | 2008 |
Adelie | Biscoe | 39.6 | 20.7 | 191 | 3900 | female | 2009 |
Adelie | Torgersen | 46.0 | 21.5 | 194 | 4200 | male | 2007 |
Adelie | Dream | 36.8 | 18.5 | 193 | 3500 | female | 2009 |
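A table like Table 1 can be drawn with dplyr's `slice_sample()`. The seed used for Table 1 is not known, so this sketch (with an arbitrary seed and the illustrative name `sample_10`) will not reproduce exactly these ten rows.

```r
library(dplyr)

# Assumption: palmerpenguins is installed
penguins <- palmerpenguins::penguins

set.seed(2024)  # arbitrary seed, only to make this sketch reproducible
sample_10 <- penguins %>% slice_sample(n = 10)  # illustrative object name
sample_10
```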
ggplot2: data & aesthetics
Plot <- ggplot(
  ## Step 1: data
  data = penguins,
  ## Step 2: specify aesthetics (mapping)
  aes(x = flipper_length_mm, y = body_mass_g)
)
Plot
ggplot2: geometry
Plot <- ggplot(
  ## Step 1: data
  data = penguins,
  ## Step 2: specify aesthetics (mapping)
  aes(x = flipper_length_mm, y = body_mass_g)
) +
  ## Step 3: add geometry
  geom_point()
Plot
Not every component of the Grammar of Graphics needs to be defined. The other components have default values that are automatically applied.
ggplot2: facets
Plot <- ggplot(
  ## Step 1: data
  data = penguins,
  ## Step 2: specify aesthetics (mapping)
  aes(x = flipper_length_mm, y = body_mass_g)
) +
  ## Step 3: add geometry
  geom_point() +
  ## Step 4: define facets
  facet_wrap(~species)
Plot
ggplot2: theme
Plot <- ggplot(
  ## Step 1: data
  data = penguins,
  ## Step 2: specify aesthetics (mapping)
  aes(x = flipper_length_mm, y = body_mass_g)
) +
  ## Step 3: add geometry
  geom_point() +
  ## Step 4: define facets
  facet_wrap(~species) +
  ## Step 5: set theme
  theme_minimal()
Plot
geom_* options... "The bar is open... Let's have a lollipop?"
There are many ways to visualise a categorical variable...
Can you think of some?
Have a look at data-to-viz.com
geom_bar() or geom_col()
# geom_bar(): stat_count() does the counting automatically
ggplot(penguins, aes(x = species)) +
  geom_bar()
# geom_col(): count first, then use the counts as bar heights
count_data <- penguins %>% count(species, name = "count")
ggplot(count_data, aes(x = species, y = count)) +
  geom_col()
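To check that both approaches draw the same bars, this sketch compares the bar heights ggplot2 computes, using `layer_data()`; `p_bar`, `p_col` and `count_data` are illustrative names. Both give 152 Adelie, 68 Chinstrap and 124 Gentoo penguins.

```r
library(ggplot2)
library(dplyr)

# Assumption: palmerpenguins is installed; object names are illustrative
penguins <- palmerpenguins::penguins

p_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar()                                  # stat_count() counts the rows

count_data <- penguins %>% count(species, name = "count")
p_col <- ggplot(count_data, aes(x = species, y = count)) +
  geom_col()                                  # bar heights are used as given

# Bar heights computed by ggplot2 for both plots
layer_data(p_bar)$y
layer_data(p_col)$y
```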
geom_bar() + fill
ggplot(penguins, aes(x = species)) +
  geom_bar(
    ## Additional aesthetic: "fill"-scale
    aes(fill = species)
  )
geom_bar() + fill: set the fill colors by adding scale_fill_manual()
ggplot(penguins, aes(x = species)) +
  geom_bar(aes(fill = species)) +
  ## Specify fill colors
  scale_fill_manual(values = c("darkorange", "purple", "cyan4"))
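An unnamed `values` vector is matched to the factor levels in order. A named vector ties each colour to a species explicitly, which keeps the colours stable if the levels are reordered (for instance with `fct_reorder()`) or a species is filtered out. A sketch; `species_colours` and `p` are illustrative names.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; `species_colours` and `p` are illustrative names
penguins <- palmerpenguins::penguins

species_colours <- c(
  Adelie    = "darkorange",
  Chinstrap = "purple",
  Gentoo    = "cyan4"
)

p <- ggplot(penguins, aes(x = species)) +
  geom_bar(aes(fill = species)) +
  # Colours are matched to the factor levels by name, not by position
  scale_fill_manual(values = species_colours)
```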
geom_bar() + labs()
ggplot(penguins, aes(x = species)) +
  geom_bar(aes(fill = species)) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  ## Add title, subtitle and label x-axis
  labs(title = "Palmer penguins", subtitle = "n observations for species", x = "")
geom_bar() + theme_minimal()
ggplot(penguins, aes(x = species)) +
  geom_bar(aes(fill = species)) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "n observations for species", x = "") +
  ## Choose theme
  theme_minimal()
geom_bar() + coord_flip() + theme(legend.position = "none")
ggplot(penguins, aes(x = species)) +
  geom_bar(aes(fill = species)) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "n observations for species", x = "") +
  ## Flip x- and y-axis
  coord_flip() +
  theme_minimal() +
  ## Remove legend
  theme(legend.position = "none")
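Since ggplot2 3.3.0, many geoms can also be drawn horizontally by mapping the categorical variable to `y`, which makes `coord_flip()` unnecessary. A sketch of the same bar chart in that style, assuming palmerpenguins is installed (`p` is an illustrative name):

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; `p` is an illustrative name
penguins <- palmerpenguins::penguins

# species on the y-axis: geom_bar() detects the orientation and counts along x
p <- ggplot(penguins, aes(y = species)) +
  geom_bar(aes(fill = species)) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "n observations for species", y = "") +
  theme_minimal() +
  theme(legend.position = "none")
```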
Use two geoms: geom_point() and geom_segment()
penguins %>%
  count(species) %>%
  ggplot(aes(x = species, y = n)) +
  geom_point(aes(col = species)) +
  geom_segment(aes(x = species, xend = species, y = 0, yend = n, col = species)) +
  ## Re-use the remainder of the code!
  scale_colour_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "n observations for species", x = "") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none")
fct_reorder()
penguins %>%
  count(species) %>%
  ## Create an ordered factor
  mutate(species_ord = fct_reorder(species, n)) %>%
  ggplot(aes(x = species_ord, y = n)) +
  geom_point(aes(col = species)) +
  ## Use the ordered factor in both geoms
  geom_segment(aes(x = species_ord, xend = species_ord, y = 0, yend = n, col = species)) +
  scale_colour_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "n observations for species", x = "") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none")
[ggplot2]: part 1
You can find the qmd-file Exercises_ggplot2.qmd at the course website.
Download the qmd-file Exercises_ggplot2.qmd to your laptop
Open the file in RStudio
The file contains a set of coding assignments with empty code blocks
Now, we focus on part 1 of the exercises
Write the code (and test it by running it)
Stuck? No worries! See Exercises_ggplot2_solutions.qmd
There are many ways to visualise a quantitative variable...
Can you think of some?
Taken from Fundamentals of Data Visualization by Claus Wilke
geom_histogram()
ggplot( penguins, aes( x = flipper_length_mm ) ) + geom_histogram()
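Without a `bins` or `binwidth` argument, `geom_histogram()` uses 30 bins and prints a message suggesting a better `binwidth`. This sketch sets the bin width explicitly (5 mm here, an arbitrary choice) and uses `layer_data()` to check that the bars together count every penguin with a flipper measurement; `p` and `d` are illustrative names.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; `p` and `d` are illustrative names
penguins <- palmerpenguins::penguins

p <- ggplot(penguins, aes(x = flipper_length_mm)) +
  # Each bar covers 5 mm; na.rm = TRUE drops the two missing values silently
  geom_histogram(binwidth = 5, na.rm = TRUE)

d <- layer_data(p)   # one row per bin: count, xmin, xmax, ...
sum(d$count)         # number of penguins with a flipper measurement
```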
geom_histogram() + scale_y_continuous()
ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram() +
  ## Specify breaks
  scale_y_continuous(breaks = c(seq(0, 25, 5), 29))
geom_histogram() + alpha + panel.grid.minor = element_blank()
ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  ## Change transparency of the bars
  geom_histogram(alpha = .7) +
  ## Specify breaks
  scale_y_continuous(breaks = c(seq(0, 25, 5), 29)) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "Histogram of flipper length", x = "Flipper length") +
  theme_minimal() +
  theme(
    ## Remove minor grid lines
    panel.grid.minor = element_blank()
  )
geom_density()
Replace geom_histogram() with geom_density()
ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  ## Use geom_density
  geom_density(alpha = .7) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "Density plot of flipper length", x = "Flipper length") +
  theme_minimal()
° The area under each curve equals 100%
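The footnote can be checked numerically. `geom_density()` draws a kernel density estimate; base R's `density()` computes the same kind of estimate, and summing its values times the grid step approximates the area under the curve. A sketch for all penguins together, assuming palmerpenguins is installed (`d`, `step` and `area` are illustrative names):

```r
# Assumption: palmerpenguins is installed; `d`, `step` and `area` are illustrative names
penguins <- palmerpenguins::penguins

# Gaussian kernel density estimate on an equally spaced grid (512 points by default)
d <- density(penguins$flipper_length_mm, na.rm = TRUE)

step <- diff(d$x[1:2])   # distance between neighbouring grid points
area <- sum(d$y) * step  # Riemann-sum approximation of the area under the curve
area                     # close to 1, i.e. 100%
```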
geom_density(): replace fill with color
ggplot(penguins, aes(x = flipper_length_mm, color = species)) +
  geom_density() +
  ## Replace scale_fill_manual() with scale_color_manual()
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(title = "Palmer penguins", subtitle = "Density plot of flipper length", x = "Flipper length") +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
geom_violin() + geom_jitter()
ggplot(
  penguins,
  aes(
    x = species,
    y = flipper_length_mm,
    ## Aesthetics fill and color applied to EVERY geom
    fill = species,
    color = species
  )
) +
  ## Use geom_violin and geom_jitter
  geom_violin(alpha = .65) +
  geom_jitter(alpha = .7) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Palmer penguins",
    subtitle = "Density plot of flipper length",
    y = "Flipper length",
    x = ""
  ) +
  theme_minimal() +
  theme(legend.position = "none")
geom_violin() + geom_jitter(), with position = "top" and panel.grid.major = element_blank()
ggplot(penguins, aes(x = species, y = flipper_length_mm, fill = species, color = species)) +
  geom_violin(alpha = .65) +
  geom_jitter(alpha = .7) +
  ## Move the x-axis labels to the top
  scale_x_discrete(position = "top") +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Palmer penguins",
    subtitle = "Density plot of flipper length",
    y = "Flipper length",
    x = ""
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    panel.grid.minor = element_blank(),
    ## Remove major grid lines
    panel.grid.major.x = element_blank()
  )
Additional package required: ggdist
library(ggdist)
ggplot(penguins, aes(x = species, y = flipper_length_mm)) +
  stat_halfeye(adjust = .5, width = .6, .width = 0, justification = -.2, point_colour = NA) +
  geom_boxplot(width = .15, outlier.shape = NA) +
  geom_point(size = 1.3, alpha = .3, position = position_jitter(seed = 1, width = .1)) +
  labs(
    title = "Palmer penguins",
    subtitle = "Distribution of flipper length by species",
    x = "",
    y = "Flipper length"
  ) +
  ## A second coord_*() would replace the first, so limits and clipping go into coord_flip()
  coord_flip(xlim = c(1.2, NA), clip = "off") +
  theme_minimal()
[ggplot2]: part 2
You can find the qmd-file Exercises_ggplot2.qmd at the course website.
Download this file to your laptop
Open the file in RStudio
The file contains a set of coding assignments with empty code blocks
Now, we focus on part 2 of the exercises
Write the code (and test it by running it)
Stuck? No worries! See Exercises_ggplot2_solutions.qmd
There are many ways to visualise more than one variable... The choice depends (among other things) on the type of variables you want to plot:
Can you think of an example of each?
Have a look at data-to-viz.com
Today we focus on:
geom_point()
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  labs(
    title = "Palmer penguins",
    subtitle = "Relation of flipper length with body mass",
    y = "Flipper length",
    x = "Body mass"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
geom_point() and geom_smooth()
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  ## Add a trend line
  geom_smooth() +
  labs(
    title = "Palmer penguins",
    subtitle = "Relation of flipper length with body mass",
    y = "Flipper length",
    x = "Body mass"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
geom_point() and geom_smooth(), with method = "lm" and se = FALSE
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  geom_smooth(
    ## Add linear trend line
    method = "lm",
    ## Remove confidence band
    se = FALSE
  ) +
  labs(
    title = "Palmer penguins",
    subtitle = "Correlation between flipper length and body mass",
    y = "Flipper length",
    x = "Body mass"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
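`geom_smooth(method = "lm")` fits an ordinary linear regression of y on x. This sketch fits the same model with `lm()` and compares its predictions with the line ggplot2 draws (layer 2 of the plot); it assumes palmerpenguins is installed, and the object names are illustrative.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; object names are illustrative
penguins <- palmerpenguins::penguins

p <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE)

# The same model, fitted directly (rows with missing values are dropped)
fit <- lm(flipper_length_mm ~ body_mass_g, data = penguins)

# The trend line as computed by ggplot2: the second layer of the plot
trend <- layer_data(p, 2)
pred <- predict(fit, newdata = data.frame(body_mass_g = trend$x))

all.equal(unname(pred), trend$y)  # TRUE if both describe the same line
```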
geom_point() and geom_smooth(), with shape and color
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species), size = 3, alpha = 0.8) +
  geom_smooth(aes(color = species), se = FALSE, method = "lm") +
  theme_minimal() +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Palmer penguins",
    subtitle = "Correlation between flipper length and body mass",
    x = "Flipper length",
    y = "Body mass",
    color = "Species",
    shape = "Species"
  ) +
  theme(
    ## Legend inside the plot area (ggplot2 >= 3.5.0 syntax)
    legend.position = "inside",
    legend.position.inside = c(0.2, 0.7),
    legend.background = element_rect(fill = "white", color = NA),
    plot.title.position = "plot",
    plot.title = element_text(hjust = 0, face = "bold"),
    plot.subtitle = element_text(hjust = 0, face = "italic")
  )
geom_point() and geom_smooth(), with facet_wrap()
ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~species) +
  labs(
    title = "Palmer penguins",
    subtitle = "Correlation between flipper length and body mass",
    y = "Flipper length",
    x = "Body mass"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0, face = "bold"),
    plot.subtitle = element_text(hjust = 0, face = "italic")
  )
Grouped barplot of counts using position_dodge()
Stacked barplot of counts using position_stack()
penguins %>% ggplot( aes( x = island, fill = species ) ) + geom_bar( position = position_dodge() ) + theme_minimal()
penguins %>% ggplot( aes( x = island, fill = species) ) + geom_bar( position = position_stack() ) + theme_minimal()
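On Torgersen only Adelie penguins were measured, so with the default `position_dodge()` its single bar takes the full width while the two bars on Biscoe and on Dream share theirs. `position_dodge(preserve = "single")` gives every bar the same width. A sketch, assuming palmerpenguins is installed (`p`, `d` and `widths` are illustrative names):

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; `p`, `d` and `widths` are illustrative names
penguins <- palmerpenguins::penguins

p <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  theme_minimal()

# Width of each drawn bar (one row per island/species combination);
# unclass() turns the positions into plain numbers
d <- layer_data(p)
widths <- unclass(d$xmax) - unclass(d$xmin)
widths
```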
position_fill()
penguins %>% ggplot( aes( x = island, fill = species ) ) + geom_bar( position = position_fill() ) + theme_minimal()
geom_text() + scale_fill_brewer() + coord_cartesian(expand = FALSE)
penguins %>%
  ggplot(aes(x = island, fill = species)) +
  geom_bar(position = position_fill(), alpha = .9) +
  geom_text(
    ## after_stat(count) replaces the deprecated ..count.. notation
    aes(label = after_stat(count)),
    stat = "count",
    colour = "white",
    position = position_fill(vjust = 0.5)
  ) +
  scale_fill_brewer(type = "qual", palette = 6) +
  scale_x_discrete(position = "top") +
  scale_y_continuous(breaks = c(0.5, 1)) +
  labs(
    title = "Palmer penguins",
    subtitle = "Distribution of penguin species by island",
    fill = "Species"
  ) +
  coord_cartesian(expand = FALSE) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 20, face = "bold"),
    plot.subtitle = element_text(size = 16, face = "italic"),
    axis.title = element_blank(),
    axis.text.x = element_text(size = 14, face = "bold"),
    legend.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank()
  )
Colors are not meaningless and ... they grab attention
R has several built-in color palettes. The functions scale_fill_brewer() and scale_color_brewer() use the palettes of RColorBrewer. These palettes come in three 'flavours': sequential, diverging and qualitative.
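RColorBrewer (usually installed as a dependency of ggplot2, via the scales package) lists its palettes in `brewer.pal.info`. This sketch shows the names of the qualitative palettes and selects one by name, which is easier to read than a number such as `palette = 6`; the choice of "Set2" is arbitrary, and the object names are illustrative.

```r
library(ggplot2)
library(RColorBrewer)

# Assumption: palmerpenguins is installed; object names are illustrative
penguins <- palmerpenguins::penguins

# brewer.pal.info has one row per palette; category is "seq", "div" or "qual"
qual_palettes <- rownames(subset(brewer.pal.info, category == "qual"))
qual_palettes

# Select a qualitative palette by name
p <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = position_fill()) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()
```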
[ggplot2]: part 3
You can find the qmd-file Exercises_ggplot2.qmd at the course website.
Download this file to your laptop
Open the file in RStudio
The file contains a set of coding assignments with empty code blocks
Now, we focus on part 3 of the exercises
Write the code (and test it by running it)
Stuck? No worries! See Exercises_ggplot2_solutions.qmd
Don't forget RStudio's help function!
Google is your best fRiend...
Just google your question and you'll find code, examples, ...
Generative AI can help as well!
library(ggdist)
P1 <- ggplot(penguins, aes(x = species, y = flipper_length_mm)) +
  stat_halfeye()
P1
P1 <- ggplot(penguins, aes(x = species, y = flipper_length_mm)) +
  stat_halfeye(adjust = .5, width = .6, .width = 0, justification = -.2, point_colour = NA)
P1
P2 <- P1 + geom_boxplot(width = .15, outlier.shape = NA)
P2
P3 <- P2 + geom_point(size = 1.3, alpha = .3, position = position_jitter(seed = 1, width = .1))
P3
P4 <- P3 + labs(title = "Palmer penguins", subtitle = "Distribution of flipper length by species", x = "", y = "Flipper length")
P4
P5 <- P4 +
  ## One coord_*() call only: a second one would replace the first
  coord_flip(xlim = c(1.2, NA), clip = "off") +
  theme_minimal() +
  theme(
    plot.title.position = "plot",
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(face = "italic")
  )
P5