2
votes

I'm using griddata() to interpolate my (irregular) 2-dimensional depth-measurements; x,y,depth. The method does a great job - but it interpolates over the entire grid where it can find to opposing points. I don't want that behaviour. I'd like to have an interpolation around the existing measurements, say with up to an extent of a certain radius.

Is it possible to tell numpy/scipy: don't interpolate if you're too far from an existing measurement? Resulting in a NODATA-value? ideal = griddata(.., .., .., radius=5.0)

edit example: In the image below; black dots are the measurements. Shades of blue are the interpolated cells by numpy. The area marked in green is in fact part of the picture but is considered as NODATA by numpy (because there's no points in between). Now, the red areas, are interpolated, but I want to get rid of them. any ideas? enter image description here

1
If I understand correctly, you don't actually have a grid in {x,y,depth}, but some sparsely or randomly located points in that space? - v.chaplin
@v.chaplin yes, that is correct. It is a bathymetric dataset. It has 'ramdomly' distributed measurements, including canals, islands etc. I used griddata for interpolation to an image. But e.g. islands are interpolated as being water. Thats nt true, and I want to get rid of that. - Willem van Opstal
You want the interpolation to only be based on points within the threshold distance, or is that just for assigning nan? - Daniel F
hm, sort of.. First: basically every cell without a point within a certain radius from that cell, can be assigned NODATA. But also, in case of the shores around the island, I only want to interpolate using the points on that side of the island. - Willem van Opstal
How were the red lines generated? - v.chaplin

1 Answers

2
votes

Ok cool. I don't think there is a built-in option for griddata() that does what you want, so you will need to write it yourself.

This comes down to calculating the distances between N input data points and M interpolation points. This is simple enough to do but if you have a lot of points it can be slow at ~O(M*N). But here's an example that calculates the distances to allN data points, for each interpolation point. If the number of data points withing the radius is at least neighbors, it keeps the value. Otherwise is writes the value of NODATA.

neighbors is 4 because griddata() will use biilinear interpolation which needs points bounding the interpolants in each dimension (2*2 = 4).

#invec - input points Nx2 numpy array
#mvec - interpolation points Mx2 numpy array

#just some random points for example
N=100
invec = 10*np.random.random([N,2])

M=50
mvec = 10*np.random.random([M,2])

# --- here you would put your griddata() call, returning interpolated_values
interpolated_values = np.zeros(M)
NODATA=np.nan

radius = 5.0
neighbors = 4

for m in range(M):
    data_in_radius = np.sqrt(np.sum( (invec - mvec[m])**2, axis=1)) <= radius

    if np.sum(data_in_radius) < neighbors :
        interpolated_values[m] = NODATA

Edit: Ok re-read and noticed the input is really 2D. Example modified.

Just as an additional comment, this could be greatly accelerated if you first build a coarse mapping from each point mvec[m] to a subset of the relevant data points. The costliest step in the loop would change from

np.sqrt(np.sum( (invec - mvec[m])**2, axis=1))

to something like

np.sqrt(np.sum( (invec[subset[m]] - mvec[m])**2, axis=1))

There are plenty of ways to do this, for example using a Quadtree, hashing function, or 2D index. But whether this gives performance advantage depends on the application, how your data is structured, etc.