Fast Zonal Statistics
Update (July 2020: There are now other R options that are faster. For example
fasterize, among many others. I no longer resort to Python for these tasks.
Taking the average value of rasters by polygon is slow in R. Really slow. Using
extract with a raster of more than 800k cells (PRISM weather data) and a shapefile of more than 3000 polygons (counties) takes a little over 500 seconds.
rasterstats package in Python offers a faster solution, but can be tricky to install, particularly in Windows:
Install Anaconda Python (trying to upgrade
pipbroke it, so I just reinstalled Ananconda)
gdal 1.11.4(not 2.0),
- May need to install some from binaries if using Windows
Sample code to run zonal statistics and save the result to CSV.
import csv from rasterstats import zonal_stats # GLOBALS RAST = 'E:/prism/data/zipped/PRISM_tmin_stable_4kmD1_20150531_bil.bil' POLY = 'E:/prism/data/census/gz_2010_us_050_00_20m/gz_2010_us_050_00_20m.shp' # Run zonal_stats stats = zonal_stats(POLY, RAST, stats=['count', 'mean', 'min', 'max'], all_touched=True, geojson_out=True) # Save to CSV keys = stats['properties'].keys() with open('E:/prism/out.csv', 'wb') as output_file: dict_writer = csv.DictWriter(output_file, keys) dict_writer.writeheader() for row in stats: dict_writer.writerow(dict((k, v.encode('utf-8')) if isinstance(v, basestring) else (k,v) for k, v in row['properties'].iteritems()))