How to plot a specification curve

programming
Author

Patrick Baylis

Published

February 28, 2020

Like many researchers, I often want to plot a range of coefficient estimates to figure out if the results I’m finding are robust to other sensible specification and functional form choices. This kind of estimate is called a sensitivity curve (Simonsohn, Simmons, and Nelson 2015), and I am far from the first to do it. In fact, there are even a couple packages available: Joachim Gassen’s rdfanalysis and Philipp Masur’s specr (I haven’t used either, yet).

I wanted to roll my own, though. Since my code is fairly simple (less than 60 lines including comments and generous spacing) and uses the tidyverse, it may be helpful to other people too. First, here’s the output. It’s certainly not the prettiest version of a specification curve, but it gets the job done.

This is my specification curve. There are many like it, but this one is mine.

Second, here’s the code (also available as a gist) that creates it.

# Create sensitivity curve of coefficient estimates
library(tidyverse)
library(cowplot)
library(fastDummies)

# Setup ----
rm(list = ls())
theme_set(theme_cowplot())
set.seed(42)

# Create some fake estimates to plot ----
# Required components: est, se, and any variables that describe specifications

# These are the possible specifications we "estimated"
fe <- c("Unit", "Time", "Unit + Time")
controls <- c("Basic", "Some", "Full")

estimates <- as_tibble(expand.grid(`Fixed Effects` = fe, `Controls` = controls))
estimates <- estimates %>% mutate(est = rnorm(n()), se = runif(n(), 0, 0.1))

# Create a plot of the estimates ----
spec_cols <- c("Fixed Effects", "Controls") # Could be set by user or figured out
# Note: This assumes the preferred ordering of the specification categories is the order in which they are given

estimates <- estimates %>% mutate(ci_l = est - 1.96 * se, ci_h = est + 1.96 * se)

estimates <- estimates %>% 
    arrange(est) %>% mutate(h_order = 1:n()) # Sort on point estimates for horizontal ordering

coef_plot <- ggplot(estimates, aes(x = h_order, y = est)) + 
    geom_linerange(aes(ymin = ci_l, ymax = ci_h), size = 1, alpha = 0.5) + 
    geom_point(fill = "white", shape = 21) + 
    labs(y = "Coefficient") + 
    theme(axis.title.x = element_blank(), axis.ticks.x = element_blank(), axis.line.x = element_blank(), axis.text.x = element_blank())
coef_plot

# Create a plot of the specifications ----

# Function to create a specification plot for a single category. 
make_spec_plot <- function(category) {
    # category = spec_cols[1] # DEBUG
    specs <- dummy_cols(estimates, select_columns = category, remove_selected_columns = T) %>%
        select(h_order, starts_with(category)) %>% 
        pivot_longer(starts_with(category), names_prefix = paste0(category, "_")) %>%
        mutate(name = factor(name, levels = rev(unique(name)))) 
    
    spec_plot <- ggplot(specs, aes(x = h_order, y = name, alpha = value)) +
        geom_point() + 
        scale_alpha_continuous(guide = FALSE) +
        theme(axis.title.x = element_blank(), axis.ticks.x = element_blank(), axis.line.x = element_blank(), axis.text.x = element_blank()) + 
        theme(axis.text.y = element_text(size = 6), axis.title.y = element_blank(), axis.ticks.y = element_blank(), axis.line.y = element_blank())
    spec_plot    
}
spec_plots <- lapply(spec_cols, make_spec_plot)

combined_plot <- plot_grid(plotlist = c(list(coef_plot), spec_plots), 
          labels = c("", spec_cols), label_size = 8, label_fontface = "italic", vjust = 0.5, hjust = -0.1,
          rel_heights = c(1, 0.2, 0.2), ncol = 1, align = "v")
save_plot("sensitivity-curve.png", combined_plot)