Conley standard errors and high dimensional fixed effects

Patrick Baylis 2015-10-12 3 minute read

Update (April 2022): I’ve left this post up for posterity, but, full disclosure, the “solution” I offer below the break isn’t much of a solution at all. These days, when I have a project with this kind of potential issue, I usually cluster the standard errors by a large administrative unit (e.g., state) as a way of ensuring that inference isn’t driven by spatial correlation. This is very likely less efficient than a spatial clustering approach, however.

So how would I do this now, in 2022? It turns out that the endlessly impressive R package fixest (developed by Laurent Berge) now has a routine (search for “conley”) for it. So that would be my first stop. If I were still a Stata user, I would look into the acreg command.

If anyone has an alternative, better method for computing Conley SEs, please do let me know. And if you still want to check out the old post, it’s below the break.


Original blog post (2015): For my JMP, I cluster my standard errors two ways, across both geographies and time. During a recent seminar, one of the audience members asked me why I wasn’t using spatial standard errors instead, for example those described in Conley (2008).

A case where this might matter is as follows: suppose I’m worried about correlated variation in both my left- and right-hand side variables between observations that are near each other (putting aside correlation across time for now since the concept is equivalent). One typical solution, equivalent to what I was using, is to cluster at some geographic level, say by county. If the correlations only occur within each county, then this is sufficient. If, however, observations across county lines are correlated (e.g. Kansas City, then the standard errors I estimate may be too small. Conley standard errors solve this problem. In fact, one of my advisers, Sol Hsiang, implements these errors for both Matlab and Stata. I also found a version for R, though I haven’t tested it.

However, I have a lot of data and multiple dimensions of fixed effects, so I am using Sergio Correia’s fantastic reghdfe Stata package. He is planning to implement Conley standard errors, but hasn’t gotten around to it yet. Thiemo Fetzer implements a solution using both Sol’s code, but it uses reg2hdfe (which is similar, but generally slower than reghdfe) and looks complicated.

Instead, I use hdfe, which does the demeaning for reghdfe, to partial out my fixed effects. Then I run Sol’s code on the demeaned data. I’ve posted the code without context below, to give an example:

hdfe afinn_score_std tmean_b*, clear absorb(gridNum statemonth) \\\
    tol(0.001) keepvars(gridNum date lat lng)

ols_spatial_HAC afinn_score_std tmean_b*, lat(lat) lon(lng) \\\
    time(date) panel(gridNum) distcutoff(16) lagcutoff(7) disp

This appears to be much too slow for my dataset (>2 million obs at different locations, more than a year of daily data). I found an updated version in R from Thiemo Fetzer again, but this too isn’t fast enough for my needs, even though the matrix operations are implemented in C++. For now, the best solution is to estimate these errors on a random subset of the data.