Making regressions purrr

Patrick Baylis 2019-06-11 4 minute read

I often need to run multiple sets of regressions on the same or similar datasets. This is usually for some set of robustness checks, either to help me better understand the stability of some result or to respond to referee requests. For years, this has been a mostly ad-hoc process for me. First, I would just copy-paste my regressions, modifying one variable or filter on the dataset with each paste. When this got to be too manual, I turned to nested loops and/or apply functions. This was an improvement in terms of running the regressions in a more systematic way, but extracting results I wanted to look at or highlight easily wasn’t straightforward. However, the purrr package (part of the tidyverse) provides tools that can make all of this easier.

The following code, along with a couple functions I’ve added to baylisR (call it a vanity package), allows me to facilitate a few common tasks:

  1. I can easily build a set of regressions I want to run by combining different possible variables and datasets.
  2. The output can be saved compactly as a tibble with a list-column containing either a stripped-down version of the fitted felm object, or a tidied data.frame of the same.
  3. I can easily select specific statistical specifications for display in tables or figures.

There’s a bit more beneath the hood of this code. You’ll want to refer to the code of reg_felm to see how to call it. If you’re going to implement this yourself, I recommend you don’t rely on baylisR but instead extract the code for reg_felm (and strip_felm, which it calls) and modify it for your own purposes.

library(tidyverse)
library(huxtable)
library(stargazer)

# Load data
data("Wages", package = "plm")

# Convenience function, fits model use lfe::felm following inputs
reg_felm <- function(lhs, rhs, fe = "0", iv = "0", clus = "0", data_str) {
    data <- eval(parse(text = data_str))
    fmla <- sprintf("%s ~ %s | %s | %s | %s", lhs, rhs, fe, iv, clus)
    fit <- lfe::felm(as.formula(fmla), data = data)
}


reg_tibble <- as_tibble(
    expand.grid(lhs = "lwage", rhs = c("exp", "exp + wks"), 
                fe = "married", clus = "0", data_str = c("Wages", "Wages %>% filter(bluecol == 'no')"),
                stringsAsFactors = F))

reg_tibble$fit <- purrr::pmap(reg_tibble, reg_felm)


# These can be used directly in stargazer or huxtable...
stargazer::stargazer(reg_tibble$fit)
## 
## % Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
## % Date and time: Mon, Jul 06, 2020 - 11:16:22
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}}lcccc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
##  & \multicolumn{4}{c}{\textit{Dependent variable:}} \\ 
## \cline{2-5} 
## \\[-1.8ex] & \multicolumn{4}{c}{fmla} \\ 
## \\[-1.8ex] & (1) & (2) & (3) & (4)\\ 
## \hline \\[-1.8ex] 
##  exp & 0.007$^{***}$ & 0.007$^{***}$ & 0.012$^{***}$ & 0.012$^{***}$ \\ 
##   & (0.001) & (0.001) & (0.001) & (0.001) \\ 
##   & & & & \\ 
##  wks &  & 0.004$^{***}$ &  & 0.008$^{***}$ \\ 
##   &  & (0.001) &  & (0.002) \\ 
##   & & & & \\ 
## \hline \\[-1.8ex] 
## Observations & 4,165 & 4,165 & 2,036 & 2,036 \\ 
## R$^{2}$ & 0.110 & 0.112 & 0.160 & 0.166 \\ 
## Adjusted R$^{2}$ & 0.110 & 0.112 & 0.159 & 0.165 \\ 
## Residual Std. Error & 0.435 (df = 4162) & 0.435 (df = 4161) & 0.415 (df = 2033) & 0.414 (df = 2032) \\ 
## \hline 
## \hline \\[-1.8ex] 
## \textit{Note:}  & \multicolumn{4}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
## \end{tabular} 
## \end{table}
huxtable::huxreg(reg_tibble$fit)
(1) (2) (3) (4)
exp 0.007 *** 0.007 *** 0.012 *** 0.012 ***
(0.001)    (0.001)    (0.001)    (0.001)   
wks          0.004 **           0.008 ***
         (0.001)             (0.002)   
N 4165         4165         2036         2036        
R2 0.110     0.112     0.160     0.166    
logLik -2446.104     -2440.700     -1098.465     -1090.272    
AIC 4900.208     4891.400     2204.930     2190.543    
*** p < 0.001; ** p < 0.01; * p < 0.05.
# ...Or shown all together in a single table, after tidying.
bind_rows(lapply(reg_tibble$fit, broom::tidy), .id = "reg_id")
reg_id term estimate std.error statistic p.value
1 exp 0.00705 0.000624 11.3  3.32e-29
2 exp 0.00714 0.000623 11.5  6.39e-30
2 wks 0.00433 0.00132  3.29 0.00102 
3 exp 0.0116  0.000889 13.1  1.26e-37
4 exp 0.0117  0.000886 13.2  3.71e-38
4 wks 0.0079  0.00195  4.05 5.26e-05
# It's also straightforward to display only a subset of the regressions.
huxtable::huxreg(reg_tibble %>% filter(rhs == "exp + wks") %>% pull(fit))
(1) (2)
exp 0.007 *** 0.012 ***
(0.001)    (0.001)   
wks 0.004 **  0.008 ***
(0.001)    (0.002)   
N 4165         2036        
R2 0.112     0.166    
logLik -2440.700     -1090.272    
AIC 4891.400     2190.543    
*** p < 0.001; ** p < 0.01; * p < 0.05.