2
votes

I'm certain there's a good way to do this but I'm blanking on the right search terms to google, so I'll ask here instead. My problem is this:

I have 2 2-dimensional array, both with the same dimensions. One array (array 1) is the accumulated precipitation at (x,y) points. The other (array 2) is the topographic height of the same (x,y) grid. I want to sum up array 1 between specific heights of array 2, and create a bar graph with topographic height bins a the x-axis and total accumulated precipitation on the y axis.

So I want to be able to declare a list of heights (say [0, 100, 200, ..., 1000]) and for each bin, sum up all precipitation that occurred within that bin.

I can think of a few complicated ways to do this, but I'm guessing there's probably an easier way that I'm not thinking of. My gut instinct is to loop through my list of heights, mask anything outside of that range, sum up remaining values, add those to a new array, and repeat.

I'm wondering is if there's a built-in numpy or similar library that can do this more efficiently.

3

3 Answers

2
votes

This code shows what you're asking for, some explanation in comments:

import numpy as np


def in_range(x, lower_bound, upper_bound):
    # returns wether x is between lower_bound (inclusive) and upper_bound (exclusive)
    return x in range(lower_bound, upper_bound)


# vectorize allows you to easily 'map' the function to a numpy array
vin_range = np.vectorize(in_range)

# representing your rainfall
rainfall = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# representing your height map
height = np.array([[1, 2, 1], [2, 4, 2], [3, 6, 3]])
# the bands of height you're looking to sum
bands = [[0, 2], [2, 4], [4, 6], [6, 8]]

# computing the actual results you'd want to chart
result = [(band, sum(rainfall[vin_range(height, *band)])) for band in bands]

print(result)

The next to last line is where the magic happens. vin_range(height, *band) uses the vectorized function to create a numpy array of boolean values, with the same dimensions as height, that has True if a value of height is in the range given, or False otherwise.

By using that array to index the array with the target values (rainfall), you get an array that only has the values for which the height is in the target range. Then it's just a matter of summing those.

In more steps than result = [(band, sum(rainfall[vin_range(height, *band)])) for band in bands] (but with the same result):

result = []
for lower, upper in bands:
    include = vin_range(height, lower, upper)
    values_to_include = rainfall[include]
    sum_of_rainfall = sum(values_to_include)
    result.append(([lower, upper], sum_of_rainfall))
2
votes

You can use np.bincount together with np.digitize. digitize creates an array of bin indices from the height array height and the bin boundaries bins. bincount then uses the bin indices to sum the data in array rain.

# set up
rain  = np.random.randint(0,100,(5,5))/10
height = np.random.randint(0,10000,(5,5))/10
bins = [0,250,500,750,10000]

# compute
sums = np.bincount(np.digitize(height.ravel(),bins),rain.ravel(),len(bins)+1)

# result
sums
# array([ 0. , 37. , 35.6, 14.6, 22.4,  0. ])

# check against direct method
[rain[(height>=bins[i]) & (height<bins[i+1])].sum() for i in range(len(bins)-1)]
# [37.0, 35.6, 14.600000000000001, 22.4]
1
votes

An example using the numpy ma module which allows to make masked arrays. From the docs:

A masked array is the combination of a standard numpy.ndarray and a mask. A mask is either nomask, indicating that no value of the associated array is invalid, or an array of booleans that determines for each element of the associated array whether the value is valid or not.

which seems what you need in this case.

import numpy as np

pr = np.random.randint(0, 1000, size=(100, 100)) #precipitation map
he = np.random.randint(0, 1000, size=(100, 100)) #height map

bins = np.arange(0, 1001, 200)

values = []
for vmin, vmax in zip(bins[:-1], bins[1:]):
    #creating the masked array, here minimum included inside bin, maximum excluded.
    maskedpr = np.ma.masked_where((he < vmin) | (he >= vmax), pr)
    values.append(maskedpr.sum())

values is the list of values for each bin, which you can plot.

The numpy.ma.masked_where function returns an array masked where condition is True. So you need to set the condition to be True outside the bins.
The sum() method performs the sum only where the array is not masked.