Visualize Scatterplots with Marginal Distribution using ggExtra
Marginal Distribution with ggplot2 and ggExtra
The data in this example comes from the UN Statistics Division's Sustainable Development Goals (SDG) data, Indicator 4.4.1 (proportion of youth and adults with ICT skills, by type of skill).
Also check out r-graph-gallery.com for inspiration.
Here’s the breakdown:
- Load Packages and Libraries
The key here is the ggExtra package.
install.packages('ggExtra')
library(ggExtra)
library(tidyverse)
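The install line above only covers ggExtra, while the last line also loads tidyverse. If either might be missing, here is a minimal sketch that installs only what isn't there yet and then loads both (the `pkgs` and `to_install` names are just illustrative):

```r
# Packages this walkthrough uses
pkgs <- c("ggExtra", "tidyverse")

# requireNamespace() quietly returns FALSE for packages that aren't installed
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Attach each package by name
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```

This avoids reinstalling packages on every run, which plain `install.packages()` would do.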
- Create a basic scatter plot
The key here is using pivot_wider to give each Type of skill its own column. We then pick out three of those columns (COPA, EMAIL and PCPR), average them per country and sex, and plot two of the averages on the x- and y-axes.
# Basic scatter plot (points colored by Sex)
# `data` is the SDG 4.4.1 extract in long format (loading step not shown)
p <- data %>%
  select(GeoAreaName, TimePeriod, Sex, `Type of skill`, Value) %>%
  # Give each Type of skill its own column
  pivot_wider(names_from = `Type of skill`, values_from = Value) %>%
  # Convert the skill columns to numbers, in case Value was read in as text
  mutate(
    COPA = as.numeric(COPA),
    EMAIL = as.numeric(EMAIL),
    PCPR = as.numeric(PCPR)
  ) %>%
  # Average each skill per country and sex, across all time periods
  group_by(GeoAreaName, Sex) %>%
  summarize(
    avg_COPA = mean(COPA, na.rm = TRUE),
    avg_EMAIL = mean(EMAIL, na.rm = TRUE),
    avg_PCPR = mean(PCPR, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  ggplot(aes(x = avg_PCPR, y = avg_EMAIL, color = Sex)) +
  geom_point()
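To see what the pivot_wider and summarize steps do to the shape of the data, here is a minimal sketch on a tiny, made-up table. The `toy`, `wide` and `avg` names and all the values are hypothetical, and only two skill types are included:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical long-format rows shaped like the SDG extract (values made up)
toy <- tibble(
  GeoAreaName     = rep(c("Country A", "Country B"), each = 4),
  TimePeriod      = rep(c(2018, 2019), times = 4),
  Sex             = "FEMALE",
  `Type of skill` = rep(rep(c("EMAIL", "PCPR"), each = 2), times = 2),
  Value           = c("40", "50", "5", "7", "60", "70", "10", "12")
)

# Long -> wide: one row per country/year/sex, one column per skill type
wide <- toy %>%
  pivot_wider(names_from = `Type of skill`, values_from = Value) %>%
  mutate(EMAIL = as.numeric(EMAIL), PCPR = as.numeric(PCPR))

# Average across years for each country and sex
avg <- wide %>%
  group_by(GeoAreaName, Sex) %>%
  summarize(
    avg_EMAIL = mean(EMAIL, na.rm = TRUE),
    avg_PCPR  = mean(PCPR, na.rm = TRUE),
    .groups = "drop"
  )

print(avg)
```

Each country and sex ends up as a single row with one column per averaged skill, which is the shape ggplot() needs to map one skill to x and another to y.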
- Use ggMarginal() to draw the marginal distributions along the sides of the scatter plot. ggMarginal() is a function from the ggExtra package.
# Scatter Plot with Marginal Distribution
ggMarginal(p, type = 'histogram')
This chart is especially useful for showing how each variable is distributed on its own, alongside the relationship between the two.
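ggMarginal() takes a few other arguments that may help here. The sketch below uses simulated data: the `toy_avg` table and its numbers are hypothetical stand-ins for the summarized SDG data. `type` switches the marginal geometry, `margins` limits it to one axis, `size` sets how many times larger the main panel is than each margin, and `groupColour`/`groupFill` draw a separate marginal distribution for each color group:

```r
library(ggplot2)
library(ggExtra)

# Simulated (hypothetical) country-level averages, two groups by Sex
set.seed(42)
toy_avg <- data.frame(
  Sex       = rep(c("FEMALE", "MALE"), each = 30),
  avg_PCPR  = c(rnorm(30, mean = 4, sd = 1.5), rnorm(30, mean = 7, sd = 1.5)),
  avg_EMAIL = c(rnorm(30, mean = 45, sd = 10), rnorm(30, mean = 55, sd = 10))
)

p <- ggplot(toy_avg, aes(x = avg_PCPR, y = avg_EMAIL, color = Sex)) +
  geom_point()

# Density curves on both margins instead of histogram bars
p_density <- ggMarginal(p, type = "density")

# Boxplot on the x margin only; main panel 4x the size of the margin
p_box <- ggMarginal(p, type = "boxplot", margins = "x", size = 4)

# One colored, filled density per Sex, so the groups can be compared directly
p_by_sex <- ggMarginal(p, type = "density", groupColour = TRUE, groupFill = TRUE)

print(p_by_sex)
```

groupColour and groupFill rely on the scatter plot already mapping a character or factor variable to color, as the plot above does with Sex.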