Optimizing Complex Numpy Functions

Question

I'm looking to try to speed up a function with numpy vectorization. I've been pretty successful with simple equations, but on my more complex conversions, I'm coming up short.

Below is an example of calculating the wetbulb temperature of air with known drybulb temperatures and relative humidity. (calculations adapted from this repo) I've tried to simply use np.vectorize, but that only sped things up over a simple apply function by about 2x. My other numpy optimizations have had speedups over 300x. That may not be possible here without cython, I'm not sure, as I'm still learning the basics of numpy and vectorization.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Temp_C':[20,0,6,-22,13,37,20,0,-10,8,14,24,19,12,4],
             'relativeHumidty':[0.6,0.2,0.55,0.25,0.1,0.9,1,.67,0.24,0.81,0.46,0.51,0.50,0.65,0.72]})

def sat_press_si(tdb):
    C1 = -5674.5359
    C2 = 6.3925247
    C3 = -0.009677843
    C4 = 0.00000062215701
    C5 = 2.0747825E-09
    C6 = -9.484024E-13
    C7 = 4.1635019
    C8 = -5800.2206
    C9 = 1.3914993
    C10 = -0.048640239
    C11 = 0.000041764768
    C12 = -0.000000014452093
    C13 = 6.5459673

    TK = tdb + 273.15 

    if TK <= 273.15:
        result = math.exp(C1/TK + C2 + C3*TK + C4*TK**2 + C5*TK**3 +
                      C6*TK**4 + C7*math.log(TK)) / 1000
    else:
        result = math.exp(C8/TK + C9 + C10*TK + C11*TK**2 + C12*TK**3 +
                      C13*math.log(TK)) / 1000
    return result

def hum_rat_si(tdb, twb, P=14.257):

    Pws = sat_press_si(twb)
    Ws = 0.62198 * Pws / (P - Pws)  # Equation 23, p6.8
    if tdb >= 0:
        result = (((2501 - 2.326 * twb) * Ws - 1.006 * (tdb - twb)) /
              (2501 + 1.86 * tdb - 4.186 * twb))
    else:  # Equation 37, p6.9
        result = (((2830 - 0.24*twb)*Ws - 1.006*(tdb - twb)) /
              (2830 + 1.86*tdb - 2.1*twb))
    return result

def hum_rat2_si(tdb, rh, P=14.257):
    Pws = sat_press_si(tdb)
    result = 0.62198*rh*Pws/(P - rh*Pws)    # Equation 22, 24, p6.8
    return result

def wet_bulb_si(tdb, rh, P=14.257):
    W_normal = hum_rat2_si(tdb, rh, P)
    result = tdb
    W_new = hum_rat_si(tdb, result, P)
    x = 0
    while abs((W_new - W_normal) / W_normal) > 0.00001:
        W_new2 = hum_rat_si(tdb, result - 0.001, P)
        dw_dtwb = (W_new - W_new2) / 0.001
        result = result - (W_new - W_normal) / dw_dtwb
        W_new = hum_rat_si(tdb, result, P)
        x += 1
        if x > 500:
            break
    return result

wet_bulb_vectorized = np.vectorize(wet_bulb_si)

%timeit -n 300 wet_bulb_vectorized(df['Temp_C'].values, df['relativeHumidty'].values)
%timeit -n 300 df.apply(lambda row: wet_bulb_si(row['Temp_C'], row['relativeHumidty']), axis=1)

For the last two %timeit runs, I'm getting:

2.7 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 300 loops each) 4.17 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 300 loops each)

Any suggestions here would be appreciated!

@user10605163, CR may be great for C++ code, but most numpy performance issues are handled here on SO. CR answers tend to focus on style and organization, not numpy 'vectorization'. — hpaulj
np.vectorize is a convenience function, that lets you easily apply scalar functions to arrays, but it does not offer any speed advantages over plain Python loops. With in your functions, two things jump out as restricting the code to scalar values - the if blocks, and the math functions. numpy has log and exp functions that operate on whole arrays. numpy uses masking and indexing to select blocks of arrays for different operations, instead of if/else blocks. — hpaulj
For a start I'd suggest focusing on one function, such as sat_press_si, and rewrite it to work directly on an array of values. If it's easier write 2 versions, one that works for values below 273 and another above. — hpaulj

hpaulj hpaulj · Accepted Answer · 2019-01-01T05:26:06

Let's focus first on the use of math.log vs np.log:

In [132]: x = np.linspace(1,10,5)

Simple list iteration:

In [133]: [math.log(i) for i in x]
Out[133]: 
[0.0,
 1.1786549963416462,
 1.7047480922384253,
 2.0476928433652555,
 2.302585092994046]

Make it an array:

In [134]: np.array([math.log(i) for i in x])
Out[134]: array([0.        , 1.178655  , 1.70474809, 2.04769284, 2.30258509])

Now with np.vectorize:

In [135]: f = np.vectorize(np.log, otypes=[float])
In [136]: f(x)
Out[136]: array([0.        , 1.178655  , 1.70474809, 2.04769284, 2.30258509])

And np.log:

In [137]: np.log(x)
Out[137]: array([0.        , 1.178655  , 1.70474809, 2.04769284, 2.30258509])

With this small x timings will be similar, but for a much larger array, np.log clearly wins:

In [138]: xb = np.linspace(1,10,5000)
In [139]: timeit np.array([math.log(i) for i in xb])
1.28 ms ± 3.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [140]: timeit f(xb)
6.84 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [141]: timeit np.log(xb)
174 µs ± 674 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Now try a version that can handle negative and 0 values:

def foo(x):
    if x==0:
        return -np.inf
    elif x<0:
        return math.log(-x)
    else:
        return math.log(x)

This could be used in the loop or vectorized as above.

Instead of the plain np.log, we can split the input into different blocks based on value:

def foo1(x):
    mask1 = x<0
    mask2 = x>0
    res = np.full_like(x, -np.inf)
    res[mask1] = np.log(-x[mask1])
    res[mask2] = np.log(x[mask2])
    return res

A slightly fancier version making use of extra parameters of np.log. Don't worry if you don't understand it. It isn't that much faster, and may not be useful in your case(s).

def foo2(x):
    mask1 = x<0
    mask2 = x>0
    res = np.full_like(x, -np.inf)
    np.log(-x, out=res, where=mask1)
    np.log(x, out=res, where=mask2)
    return res

(equality tests and timings omitted for now).

Optimizing Complex Numpy Functions

1 Answers