Fast Zonal Statistics – Patrick Baylis

Warning

Update (July 2020): There are now other R options that are faster. For example exactextractr, velox, and fasterize, among many others. I no longer resort to Python for these tasks.

Taking the average value of rasters by polygon is slow in R. Really slow. Using extract with a raster of more than 800k cells (PRISM weather data) and a shapefile of more than 3000 polygons (counties) takes a little over 500 seconds.

The rasterstats package in Python offers a faster solution, but can be tricky to install, particularly in Windows:

Install Anaconda Python (trying to upgrade pip broke it, so I just reinstalled Ananconda)
Install gdal 1.11.4 (not 2.0), fiona, rasterio, and Shapely
- May need to install some from binaries if using Windows
Install rasterstats

Sample code to run zonal statistics and save the result to CSV.

import csv
from rasterstats import zonal_stats

# GLOBALS
RAST = 'E:/prism/data/zipped/PRISM_tmin_stable_4kmD1_20150531_bil.bil'
POLY = 'E:/prism/data/census/gz_2010_us_050_00_20m/gz_2010_us_050_00_20m.shp'

# Run zonal_stats
stats = zonal_stats(POLY, RAST, stats=['count', 'mean', 'min', 'max'], all_touched=True, geojson_out=True)

# Save to CSV
keys = stats[0]['properties'].keys()
with open('E:/prism/out.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    for row in stats:
        dict_writer.writerow(dict((k, v.encode('utf-8')) if isinstance(v, basestring) else (k,v) for k, v in row['properties'].iteritems()))