I have a netcdf file of the EDGAR emission inventory and a shapefile of the US Census data. I want to extract the data from the netcdf that only overlaps/intersects with the NYC region in the entire shapefile so then I can calculate the total emissions in NYC.
- Shapefile is 2018 US Census cartographic Boundary Files for Urban Areas. https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_ua10_500k.zip
- Netcdf file is EDGAR Emission inventories for N2O in 2015. https://edgar.jrc.ec.europa.eu/gallery.php?release=v50&substance=N2O§or=TOTALS
I have never worked with shapefiles/GeoPandas so bear with me. I am able to read in the shapefile, filter for specific region, then convert netcdf to GeoDataFrame. I want to only keep the data from the netcdf that falls within the filtered region from the shapefile in order to do analysis.
Update: I tried using sjoin
and clip
but when I execute that command, my dataframes have no data and when I plot with sjoin
, I get an error "The GeoDataFrame you are attempting to plot is empty. Nothing has been displayed."
import netCDF4
import numpy as np
from osgeo import gdal,osr,ogr
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
import xarray as xr
# read in file path for shapefile
fp_shp = "C:/Users/cb_2018_us_ua10_500k/cb_2018_us_ua10_500k.shp"
# read in netcdf file path
ncs = "C:/Users/v50_N2O_2015.0.1x0.1.nc"
# Read in NETCDF as a pandas dataframe
# Xarray provides a simple method of opening netCDF files, and converting them to pandas dataframes
ds = xr.open_dataset(ncs)
edgar = ds.to_dataframe()
# the index in the df is a Pandas.MultiIndex. To reset it, use df.reset_index()
edgar = edgar.reset_index()
# Read shapefile using gpd.read_file()
shp = gpd.read_file(fp_shp)
# read the netcdf data file
#nc = netCDF4.Dataset(ncs,'r')
# quick check for shpfile plotting
shp.plot(figsize=(12, 8));
# filter out shapefile for SPECIFIC city/region
# how to filter rows in DataFrame that contains string
# extract NYC from shapefile dataframe
nyc_shp = shp[shp['NAME10'].str.contains("New York")]
# export shapefile
#nyc_shp.to_file('NYC.shp', driver ='ESRI Shapefile')
# use geopandas points_from_xy() to transform Longitude and Latitude into a list of shapely.Point objects and set it as a geometry while creating the GeoDataFrame
edgar_gdf = gpd.GeoDataFrame(edgar, geometry=gpd.points_from_xy(edgar.lon, edgar.lat))
print(edgar_gdf.head())
# check CRS coordinates
nyc_shp.crs #shapefile
edgar_gdf.crs #geodataframe netcdf
# set coordinates equal to each other
# PointsGeodataframe.crs = PolygonsGeodataframe.crs
edgar_gdf.crs = nyc_shp.crs
# check coordinates after setting coordinates equal to each other
edgar_gdf.crs #geodataframe netcdf
# Clip points, lines, or polygon geometries to the mask extent.
mask = gpd.clip(edgar_gdf, nyc_shp)