I'm trying to speed up a function with numpy vectorization. I've had good results with simple equations, but I'm coming up short on my more complex conversions.
Below is an example that calculates the wet-bulb temperature of air from known dry-bulb temperatures and relative humidity (calculations adapted from this repo). I've tried simply using `np.vectorize`, but that only sped things up by about 1.5x over a plain `apply`. My other numpy optimizations have given speedups of over 300x. That may not be possible here without Cython; I'm not sure, as I'm still learning the basics of numpy and vectorization.
```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({'Temp_C': [20, 0, 6, -22, 13, 37, 20, 0, -10, 8, 14, 24, 19, 12, 4],
                   'relativeHumidty': [0.6, 0.2, 0.55, 0.25, 0.1, 0.9, 1, .67, 0.24,
                                       0.81, 0.46, 0.51, 0.50, 0.65, 0.72]})


def sat_press_si(tdb):
    # Saturation pressure at temperature tdb (deg C): one correlation over
    # ice (TK <= 273.15 K) and another over liquid water.
    C1 = -5674.5359
    C2 = 6.3925247
    C3 = -0.009677843
    C4 = 0.00000062215701
    C5 = 2.0747825E-09
    C6 = -9.484024E-13
    C7 = 4.1635019
    C8 = -5800.2206
    C9 = 1.3914993
    C10 = -0.048640239
    C11 = 0.000041764768
    C12 = -0.000000014452093
    C13 = 6.5459673
    TK = tdb + 273.15
    if TK <= 273.15:
        result = math.exp(C1/TK + C2 + C3*TK + C4*TK**2 + C5*TK**3 +
                          C6*TK**4 + C7*math.log(TK)) / 1000
    else:
        result = math.exp(C8/TK + C9 + C10*TK + C11*TK**2 + C12*TK**3 +
                          C13*math.log(TK)) / 1000
    return result


def hum_rat_si(tdb, twb, P=14.257):
    # Humidity ratio from dry-bulb and wet-bulb temperatures.
    Pws = sat_press_si(twb)
    Ws = 0.62198 * Pws / (P - Pws)  # Equation 23, p6.8
    if tdb >= 0:
        result = (((2501 - 2.326 * twb) * Ws - 1.006 * (tdb - twb)) /
                  (2501 + 1.86 * tdb - 4.186 * twb))
    else:  # Equation 37, p6.9
        result = (((2830 - 0.24*twb)*Ws - 1.006*(tdb - twb)) /
                  (2830 + 1.86*tdb - 2.1*twb))
    return result


def hum_rat2_si(tdb, rh, P=14.257):
    # Humidity ratio from dry-bulb temperature and relative humidity.
    Pws = sat_press_si(tdb)
    result = 0.62198*rh*Pws/(P - rh*Pws)  # Equation 22, 24, p6.8
    return result


def wet_bulb_si(tdb, rh, P=14.257):
    # Solve hum_rat_si(tdb, twb) == hum_rat2_si(tdb, rh) for twb with
    # Newton's method (backward finite-difference derivative), starting at tdb.
    W_normal = hum_rat2_si(tdb, rh, P)
    result = tdb
    W_new = hum_rat_si(tdb, result, P)
    x = 0
    while abs((W_new - W_normal) / W_normal) > 0.00001:
        W_new2 = hum_rat_si(tdb, result - 0.001, P)
        dw_dtwb = (W_new - W_new2) / 0.001
        result = result - (W_new - W_normal) / dw_dtwb
        W_new = hum_rat_si(tdb, result, P)
        x += 1
        if x > 500:  # give up if it has not converged
            break
    return result


wet_bulb_vectorized = np.vectorize(wet_bulb_si)

# IPython/Jupyter %timeit magics
%timeit -n 300 wet_bulb_vectorized(df['Temp_C'].values, df['relativeHumidty'].values)
%timeit -n 300 df.apply(lambda row: wet_bulb_si(row['Temp_C'], row['relativeHumidty']), axis=1)
```
For the last two `%timeit` runs, I'm getting:

- `np.vectorize`: 2.7 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 300 loops each)
- `df.apply`: 4.17 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 300 loops each)
Any suggestions here would be appreciated!
`numpy` performance issues are handled here on SO. CR (Code Review) answers tend to focus on style and organization, not `numpy` 'vectorization'. – hpaulj

`np.vectorize` is a convenience function that lets you easily apply scalar functions to arrays, but it does not offer any speed advantage over plain Python loops. Within your functions, two things jump out as restricting the code to scalar values: the `if` blocks and the `math` functions. `numpy` has `log` and `exp` functions that operate on whole arrays. `numpy` uses masking and indexing to select blocks of arrays for different operations, instead of `if/else` blocks. – hpaulj

… `sat_press_si`, and rewrite it to work directly on an array of values. If it's easier, write 2 versions, one that works for values below 273 and another above. – hpaulj
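Following hpaulj's suggestions (`np.exp`/`np.log` instead of `math`, and a mask via `np.where` instead of `if/else`), here is a minimal sketch of array versions of the four functions. The constants and equations are copied from the question; the `_vec` names and the `tol`, `max_iter` and `h` parameters are hypothetical additions of mine. Each function evaluates both branches for every element and lets `np.where` pick one, and the Newton iteration runs on all rows at once, updating only the rows that have not converged yet.

```python
import numpy as np

# Constants copied from the question's sat_press_si (ice branch C1-C7,
# liquid-water branch C8-C13).
C1, C2, C3, C4, C5, C6, C7 = (-5674.5359, 6.3925247, -0.009677843,
                              0.00000062215701, 2.0747825E-09,
                              -9.484024E-13, 4.1635019)
C8, C9, C10, C11, C12, C13 = (-5800.2206, 1.3914993, -0.048640239,
                              0.000041764768, -0.000000014452093,
                              6.5459673)


def sat_press_si_vec(tdb):
    """Array version of sat_press_si: both correlations, selected by a mask."""
    TK = np.asarray(tdb, dtype=float) + 273.15
    # np.exp/np.log work on whole arrays, unlike math.exp/math.log.
    over_ice = np.exp(C1 / TK + C2 + C3 * TK + C4 * TK**2 + C5 * TK**3
                      + C6 * TK**4 + C7 * np.log(TK)) / 1000
    over_water = np.exp(C8 / TK + C9 + C10 * TK + C11 * TK**2 + C12 * TK**3
                        + C13 * np.log(TK)) / 1000
    # The mask replaces the scalar if/else on TK.
    return np.where(TK <= 273.15, over_ice, over_water)


def hum_rat_si_vec(tdb, twb, P=14.257):
    """Array version of hum_rat_si (humidity ratio from dry- and wet-bulb)."""
    tdb = np.asarray(tdb, dtype=float)
    twb = np.asarray(twb, dtype=float)
    Pws = sat_press_si_vec(twb)
    Ws = 0.62198 * Pws / (P - Pws)
    above = (((2501 - 2.326 * twb) * Ws - 1.006 * (tdb - twb))
             / (2501 + 1.86 * tdb - 4.186 * twb))
    below = (((2830 - 0.24 * twb) * Ws - 1.006 * (tdb - twb))
             / (2830 + 1.86 * tdb - 2.1 * twb))
    return np.where(tdb >= 0, above, below)


def hum_rat2_si_vec(tdb, rh, P=14.257):
    """Array version of hum_rat2_si (humidity ratio from relative humidity)."""
    Pws = sat_press_si_vec(tdb)
    return 0.62198 * rh * Pws / (P - rh * Pws)


def wet_bulb_si_vec(tdb, rh, P=14.257, tol=1e-5, max_iter=500, h=0.001):
    """Newton iteration from wet_bulb_si, run on every row at once."""
    tdb = np.array(tdb, dtype=float, ndmin=1)
    rh = np.array(rh, dtype=float, ndmin=1)
    W_normal = hum_rat2_si_vec(tdb, rh, P)
    result = tdb.copy()                      # start from twb = tdb, as before
    W_new = hum_rat_si_vec(tdb, result, P)
    for _ in range(max_iter + 1):
        # Rows whose relative error is still above tol keep iterating.
        active = np.abs((W_new - W_normal) / W_normal) > tol
        if not active.any():
            break
        t, r, W_a = tdb[active], result[active], W_new[active]
        # Backward finite-difference derivative, same step h as the original.
        dw_dtwb = (W_a - hum_rat_si_vec(t, r - h, P)) / h
        r = r - (W_a - W_normal[active]) / dw_dtwb
        result[active] = r
        W_new[active] = hum_rat_si_vec(t, r, P)
    return result
```

With the question's DataFrame it would be called as `wet_bulb_si_vec(df['Temp_C'].values, df['relativeHumidty'].values)`. Computing both branches does some wasted arithmetic, but the Python-level loop now runs once per Newton iteration rather than once per row, so the benefit should grow with the number of rows.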