A catalog of datasets for environmental economics
data
With the help of Keith Kirchner, one of our all-star graduate students, I’ve started to put together a catalog of datasets relevant to environmental economics research. I teach several classes where research design is a main emphasis, and many students have found it useful to have a list of datasets that are out there as they begin thinking about potential research ideas.
Making better tables
analysis
There’s not much that’s sexy about a table. Everyone loves a good figure, but you won’t find many people singing the praises of a particularly well-constructed table in an academic paper. And yet, tables are the most common medium through which academic authors summarize datasets and relay results.
Partial predictions
coding
In climate economics and in other settings, we often want to estimate a response function: the outcome as a function of some covariate, i.e., \(y = f(T)\). Most of the time, \(T\) stands for temperature. Figure 3 in Carleton and Hsiang (2016) documents a bunch of different response functions from the literature.
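As a minimal sketch of what estimating one of these response functions might look like (simulated data and a quartic polynomial in temperature; everything here is invented for illustration):

```r
library(ggplot2)

# Simulated daily data: temperature and an outcome with a U-shaped response
set.seed(1)
df <- data.frame(temp = runif(5000, -10, 40))
df$y <- 0.02 * (df$temp - 18)^2 + rnorm(nrow(df))

# Estimate the response function f(T) with a fourth-order polynomial
fit <- lm(y ~ poly(temp, 4, raw = TRUE), data = df)

# Predict over a grid of temperatures and plot the estimated f(T)
grid <- data.frame(temp = seq(-10, 40, by = 0.5))
grid$yhat <- predict(fit, newdata = grid)

ggplot(grid, aes(temp, yhat)) +
  geom_line() +
  labs(x = "Temperature", y = "Predicted outcome")
```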
How I’m remote teaching a big class this fall
teaching
I start teaching remotely in two weeks. Helping 140 students spread across the world (so many timezones!) learn about environmental economics is a daunting task. Fortunately, I have great colleagues with good ideas about how to do it. I canvassed my network on Twitter yesterday, starting with what I had planned for the course:
Overlaying a raster and shapefile
R
I’m often overlaying rasters with shapefiles in order to get, for example, the average weather for Indonesia. I’ve found that it’s immensely important that I map my data when I’m doing this sort of thing, to make sure that I’m not making any boneheaded mistakes (e.g., using the wrong projection). Here’s an example of a map like that, where the color of the cells indicates whether or not we have data there, plus the code I used to create it.
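A rough sketch of that kind of overlay and check-map using terra and sf (this is not the original post's code; the file paths are placeholders):

```r
library(terra)
library(sf)

# Read the gridded weather data and the country boundary (placeholder paths)
r   <- rast("weather.tif")
idn <- st_read("indonesia.shp")

# Make sure both layers use the same projection before overlaying
idn <- st_transform(idn, crs(r))

# Quick check map: color cells by whether they contain data, boundary on top
plot(!is.na(r), main = "Cells with data")
plot(vect(idn), add = TRUE, border = "black")

# Average the raster within the boundary (e.g., mean weather for Indonesia)
avg <- extract(r, vect(idn), fun = mean, na.rm = TRUE)
```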
Notes on sourdough
food
And now for something completely different… sourdough! I first began baking as an escape from grad school ennui. Since then, and especially in the last couple of months, it’s been fun to share a few of the tips and tricks I’ve picked up along the way with friends who are just getting into baking.
How to plot a specification curve
programming
Like many researchers, I often want to plot a range of coefficient estimates to figure out whether the results I’m finding are robust to other sensible specification and functional form choices. This kind of plot is called a specification curve (Simonsohn, Simmons, and Nelson 2015), and I am far from the first to make one. In fact, there are even a couple of packages available: Joachim Gassen’s rdfanalysis and Philipp Masur’s specr (I haven’t used either yet).
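For a flavor of what this looks like without either package, here is a minimal hand-rolled sketch (simulated data; every name below is made up):

```r
library(dplyr)
library(purrr)
library(broom)
library(ggplot2)

# Simulated data with a true coefficient on x of 1
set.seed(1)
df <- tibble(x = rnorm(500), z = rnorm(500), y = x + 0.5 * z + rnorm(500))

# A handful of alternative specifications
specs <- c("y ~ x", "y ~ x + z", "y ~ x + z + I(z^2)", "y ~ x * z")

# Run each one, keep the coefficient on x, and record which spec it came from
results <- map_dfr(specs, function(f) {
  tidy(lm(as.formula(f), data = df), conf.int = TRUE) |>
    filter(term == "x") |>
    mutate(spec = f)
})

# The specification curve: point estimates and confidence intervals, ordered by size
ggplot(results, aes(x = reorder(spec, estimate), y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1) +
  coord_flip() +
  labs(x = NULL, y = "Coefficient on x")
```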
Scoring texts for the presence of phrases
programming
In my text analysis work, I frequently score texts for the presence or absence of various “keywords”. Because I work with some large corpora (collections of texts), for example the billions of tweets in my job market paper, this can be a time-consuming task. I have previously done most of this in Python, but right now I’m also interested in doing it quickly in R for ad hoc analyses.
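As a toy sketch of what the R version might look like using stringr (the texts and keywords below are made up):

```r
library(stringr)

# Toy corpus and keyword list (placeholders)
texts    <- c("Carbon taxes are efficient", "I like dogs", "A carbon tax beats a subsidy")
keywords <- c("carbon tax", "subsidy")

# Score each text 1/0 for whether it contains any of the keywords (case-insensitive)
pattern <- regex(str_c(keywords, collapse = "|"), ignore_case = TRUE)
scores  <- as.integer(str_detect(texts, pattern))
scores
#> [1] 1 0 1
```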
Making regressions purrr
programming
I often need to run multiple sets of regressions on the same or similar datasets. This is usually for some set of robustness checks, either to help me better understand the stability of some result or to respond to referee requests. For years, this has been a mostly ad hoc process for me. First, I would just copy-paste my regressions, modifying one variable or filter on the dataset with each paste. When this got to be too manual, I turned to nested loops and/or apply functions. This was an improvement in terms of running the regressions more systematically, but it wasn’t straightforward to extract the results I wanted to look at or highlight. However, the purrr package (part of the tidyverse) provides tools that can make all of this easier.
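A stripped-down sketch of the pattern (the dataset, formulas, and variable names are all made up):

```r
library(dplyr)
library(purrr)
library(broom)
library(tidyr)

# Made-up dataset with two subgroups
set.seed(1)
df <- tibble(y = rnorm(200), x = rnorm(200), w = rnorm(200),
             group = rep(c("a", "b"), 100))

# A grid of specifications: each formula crossed with each subsample
grid <- crossing(
  formula   = c("y ~ x", "y ~ x + w"),
  subsample = c("a", "b")
)

# Map over the grid: run each regression, then pull tidy coefficients
results <- grid |>
  mutate(
    fit    = map2(formula, subsample,
                  ~ lm(as.formula(.x), data = filter(df, group == .y))),
    tidied = map(fit, tidy)
  ) |>
  unnest(tidied)

# One row per coefficient per specification, easy to filter or plot
filter(results, term == "x")
```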
Things I forget: install git lfs and initialize in repo
programming
git lfs is great for including (fairly) large files in git repositories. Because git saves the entire history of every file, large files can quickly blow up the size of a repo; git lfs prevents this. I’m not sure why it isn’t installed by default with git. Anyway, I always forget how to use it.
Things I forget: readr shortcuts
programming
readr is the Swiss Army knife of data ingestion: it’s my tool of choice for reading text data into R, not least because I’m spending more time using the tidyverse these days. The readr documentation is a little lacking in that it’s actually kind of hard to track down the single-character shortcuts for the various column types. Without further ado:
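A quick illustration of the compact col_types string (the file path below is just a placeholder):

```r
library(readr)

# Compact col_types string: one character per column
#   c = character, i = integer, d = double, n = number, l = logical,
#   f = factor, D = date, T = datetime, t = time, ? = guess, _ or - = skip
dat <- read_csv(
  "my_data.csv",        # placeholder path
  col_types = "ciddD_"  # character, integer, double, double, date, skip last column
)
```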
Debugging in R, RStudio
programming
Debugging can be a challenge in RStudio. One of my main frustrations is that once you execute the Run command on a selection of code (i.e., running it in interactive mode), it will send all of the commands you selected, even if one or more of them raises an error. This often results in me running a full script even after an error occurs on one of the first few lines of code. In most cases, nothing remaining in the script can run successfully without whatever errored out earlier. For example, suppose I’m executing something like the short script sketched below.
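Here is a hypothetical little script of that kind (invented for illustration): the second line errors, but the later lines still get sent to the console and fail in turn.

```r
x <- c(1, 2, 3)
y <- x + "a"   # error: non-numeric argument to binary operator
z <- y * 2     # still gets sent to the console, but fails because y was never created
print(z)
```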
Climate Projection Sandbox
econometrics
Many climate-society papers project the impacts of predicted climate change on the outcome of interest (guilty!). This post includes code to conduct this kind of “climate projection exercise”. The idea is to combine a dose-response function \(f(T)\) (i.e., a damage function) with an estimate of the projected shift in the distribution of climate, \(\Delta g(T)\). Specifically, damages are: \[ \int f(t) \, \Delta g(t) \, dt \]
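In discrete form, this integral is just a sum over temperature bins. Here is a made-up numerical sketch (the damage function and the climate distributions are entirely invented):

```r
# Temperature bins (degrees C) and a made-up quadratic damage function f(T)
temps <- seq(-10, 40, by = 1)
f <- function(t) 0.01 * (t - 20)^2

# Made-up current and end-of-century temperature distributions over the bins
g_now    <- dnorm(temps, mean = 22, sd = 6)
g_future <- dnorm(temps, mean = 25, sd = 6)
g_now    <- g_now / sum(g_now)
g_future <- g_future / sum(g_future)

# Projected damages: integrate f(T) against the shift in the climate distribution
delta_g <- g_future - g_now
damages <- sum(f(temps) * delta_g)
damages
```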
Estimating high dimensional fixed effects models in Julia
programming
Update (December 2022): Julia has changed quite a bit since I originally wrote this. FixedEffectsModels.jl still existed when I last checked in 2018, but it often gave me errors and I used it less and less. As of now, I run regressions (and do the vast, vast majority of my work) using fixest in R.
Conley standard errors and high dimensional fixed effects
econometrics
Update (April 2022): I’ve left this post up for posterity, but, full disclosure, the “solution” I offer below the break isn’t much of a solution at all. These days, when I have a project with this kind of potential issue, I usually cluster the standard errors by a large administrative unit (e.g., state) as a way of ensuring that inference isn’t driven by spatial correlation. This is very likely less efficient than a spatial clustering approach, however.
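For concreteness, a minimal sketch of that cluster-by-a-larger-unit workaround using fixest (all data and variable names below are made up):

```r
library(fixest)

# Made-up county-by-year panel with a state identifier (counties nest within states)
set.seed(1)
df <- data.frame(
  state  = rep(1:10, each = 50),
  county = rep(1:100, each = 5),
  year   = rep(2001:2005, times = 100)
)
df$treatment <- rnorm(nrow(df))
df$y <- 0.5 * df$treatment + rnorm(nrow(df))

# County and year fixed effects; standard errors clustered by state, the larger unit
mod <- feols(y ~ treatment | county + year, data = df, cluster = ~state)
summary(mod)
```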
Productivity and organization notes
productivity
December 2022: I’ve left this post up for posterity, but I habitually reinvent my work process every six months or so, so at this point these notes are well out of date. My philosophy around “productivity” continues to evolve, as does the set of tools I use.