Making regressions purrr

2019/06/11

I often need to run multiple sets of regressions on the same or similar datasets. This is usually for some set of robustness checks, either to help me better understand the stability of some result or to respond to referee requests. For years, this has been a mostly ad-hoc process for me. First, I would just copy-paste my regressions, modifying one variable or filter on the dataset with each paste. When this got to be too manual, I turned to nested loops and/or apply functions. This was an improvement in terms of running the regressions in a more systematic way, but extracting results I wanted to look at or highlight easily wasn’t straightforward. However, the purrr package (part of the tidyverse) provides tools that can make all of this easier.

The following code, along with a couple functions I’ve added to baylisR (call it a vanity package), allows me to facilitate a few common tasks:

  1. I can easily build a set of regressions I want to run by combining different possible variables and datasets.
  2. The output can be saved compactly as a tibble with a list-column containing either a stripped-down version of the fitted felm object, or a tidied data.frame of the same.
  3. I can easily select specific statistical specifications for display in tables or figures.

There’s a bit more beneath the hood of this code. You’ll want to refer to the code of reg_felm to see how to call it. If you’re going to implement this yourself, I recommend you don’t rely on baylisR but instead extract the code for reg_felm (and strip_felm, which it calls) and modify it for your own purposes.

library(baylisR) # reg_felm function
library(data.table)
library(tidyverse)

# Load data
data("Wages", package="plm")
setDT(Wages)

reg_tibble <- as_tibble(
  expand.grid(lhs = "lwage", rhs = c("exp", "exp + wks"), 
              fe = "married", clus = "0", data_str = c("Wages", "Wages[bluecol == 'no']"),
              stringsAsFactors = F))

reg_output <- cbind(reg_tibble, purrr::pmap_dfr(reg_tibble, reg_felm))

# Can also save regression output as a tidied data.frame
reg_output_tidy <- cbind(reg_tibble, purrr::pmap_dfr(reg_tibble, reg_felm, tidy = T))

# These can be used directly in stargazer or huxtable...
stargazer::stargazer(reg_output$fit)
## 
## % Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
## % Date and time: Tue, Jun 11, 2019 - 10:54:27
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\hspace{5pt}}l@{\hspace{5pt}}cccc} 
## \toprule 
##  & \multicolumn{4}{c}{\textit{Dependent variable:}} \\ 
## \cmidrule(rr){2-5} 
##  & \multicolumn{4}{c}{fmla} \\ 
##  \cmidrule(rr){2-5}
##  & (1) & (2) & (3) & (4)\\ 
## \midrule  
## \\[-2.1ex] exp & 0.007$^{***}$ & 0.007$^{***}$ & 0.012$^{***}$ & 0.012$^{***}$ \\ 
##   & (0.001) & (0.001) & (0.001) & (0.001) \\ 
##  \addlinespace 
##  wks &  & 0.004$^{***}$ &  & 0.008$^{***}$ \\ 
##   &  & (0.001) &  & (0.002) \\ 
##  \addlinespace 
## \midrule  
## Observations & 4,165 & 4,165 & 2,036 & 2,036 \\ 
## R$^{2}$ & 1.000 & 1.000 & 1.000 & 1.000 \\ 
## Adjusted R$^{2}$ & 1.000 & 1.000 & 1.000 & 1.000 \\ 
## Residual Std. Error & 0.000 (df = 4162) & 0.000 (df = 4161) & 0.000 (df = 2033) & 0.000 (df = 2032) \\ 
## \bottomrule 
## \textit{Note:}  & \multicolumn{4}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
## \end{tabular} 
## \end{table}
huxtable::huxreg(reg_output$fit)
(1) (2) (3) (4)
exp 0.007 *** 0.007 *** 0.012 *** 0.012 ***
(0.001)    (0.001)    (0.001)    (0.001)   
wks          0.004 **           0.008 ***
         (0.001)             (0.002)   
N 0         0         0         0        
R2 1.000     1.000     1.000     1.000    
*** p < 0.001; ** p < 0.01; * p < 0.05.
# ...Or shown all together in a single table.
bind_rows(reg_output_tidy$fit_tidy, .id = "reg_id")
reg_id term estimate std.error statistic p.value
1 exp 0.00705 0.000624 11.3  3.32e-29
2 exp 0.00714 0.000623 11.5  6.39e-30
2 wks 0.00433 0.00132  3.29 0.00102 
3 exp 0.0116  0.000889 13.1  1.26e-37
4 exp 0.0117  0.000886 13.2  3.71e-38
4 wks 0.0079  0.00195  4.05 5.26e-05
# It's also straightforward to display only a subset of the regressions.
huxtable::huxreg(reg_output %>% filter(rhs == "exp + wks") %>% pull(fit))
(1) (2)
exp 0.007 *** 0.012 ***
(0.001)    (0.001)   
wks 0.004 **  0.008 ***
(0.001)    (0.002)   
N 0         0        
R2 1.000     1.000    
*** p < 0.001; ** p < 0.01; * p < 0.05.