Fast Zonal Statistics


Update (April 2018): R now has the xelox package which is much faster for this. Also, fasterize is a much faster rasterization method.

Taking the average value of rasters by polygon is slow in R. Really slow. Using extract with a raster of more than 800k cells (PRISM weather data) and a shapefile of more than 3000 polygons (counties) takes a little over 500 seconds.

The rasterstats package in Python offers a faster solution, but can be tricky to install, particularly in Windows:

Sample code to run zonal statistics and save the result to CSV.

import csv
from rasterstats import zonal_stats

RAST = 'E:/prism/data/zipped/PRISM_tmin_stable_4kmD1_20150531_bil.bil'
POLY = 'E:/prism/data/census/gz_2010_us_050_00_20m/gz_2010_us_050_00_20m.shp'

# Run zonal_stats
stats = zonal_stats(POLY, RAST, stats=['count', 'mean', 'min', 'max'], all_touched=True, geojson_out=True)

# Save to CSV
keys = stats[0]['properties'].keys()
with open('E:/prism/out.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    for row in stats:
        dict_writer.writerow(dict((k, v.encode('utf-8')) if isinstance(v, basestring) else (k,v) for k, v in row['properties'].iteritems()))