I've started using memoryviews in Cython to access NumPy arrays. According to the documentation, one of their advantages is that they are considerably faster than the old NumPy buffer support: http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support
However, I have an example where the old NumPy buffer support is faster than memoryviews. How can this be? Am I using memoryviews correctly?
This is my test:
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[np.uint8_t, ndim=2] image_box1(np.ndarray[np.uint8_t, ndim=2] im,
                                                np.ndarray[np.float64_t, ndim=1] pd,
                                                int box_half_size):
    cdef unsigned int p0 = <int>(pd[0] + 0.5)
    cdef unsigned int p1 = <int>(pd[1] + 0.5)
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size
    cdef np.ndarray[np.uint8_t, ndim=2] box = im[top:bottom, left:right]
    return box

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.uint8_t[:, ::1] image_box2(np.uint8_t[:, ::1] im,
                                    np.float64_t[:] pd,
                                    int box_half_size):
    cdef unsigned int p0 = <int>(pd[0] + 0.5)
    cdef unsigned int p1 = <int>(pd[1] + 0.5)
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size
    cdef np.uint8_t[:, ::1] box = im[top:bottom, left:right]
    return box
The timing results are:
image_box1: typed numpy: 100000 loops, best of 3: 11.2 us per loop
image_box2: memoryview: 100000 loops, best of 3: 18.1 us per loop
These measurements were made in IPython using %timeit image_box1(im, pd, box_half_size), and likewise for image_box2.
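The inputs used for the timings are not shown above, so the following is only a sketch of how the measurement could be reproduced outside IPython with the standard timeit module. The image size, the point pd, and box_half_size are assumed values, and image_box_ref is a hypothetical pure-NumPy stand-in for the compiled functions, not one of them.

```python
import timeit

import numpy as np


def image_box_ref(im, pd, box_half_size):
    """Pure-NumPy stand-in for image_box1/image_box2 (hypothetical helper)."""
    # Round the floating-point centre to the nearest pixel, as the Cython code does.
    p0 = int(pd[0] + 0.5)
    p1 = int(pd[1] + 0.5)
    top, bottom = p1 - box_half_size, p1 + box_half_size
    left, right = p0 - box_half_size, p0 + box_half_size
    # Basic slicing returns a view; no pixel data is copied.
    return im[top:bottom, left:right]


# Assumed test inputs; the original post does not state them.
im = np.zeros((480, 640), dtype=np.uint8)
pd = np.array([320.4, 240.6])
box_half_size = 16

box = image_box_ref(im, pd, box_half_size)

# Roughly what %timeit reports: the best per-call time over several repeats.
n = 10000
best = min(timeit.repeat(lambda: image_box_ref(im, pd, box_half_size),
                         number=n, repeat=3)) / n
print(f"image_box_ref: {best * 1e6:.2f} us per loop")
```

With the compiled module imported, the same harness could be pointed at image_box1 and image_box2 by swapping the function inside the lambda. Because each call does so little work, per-call overhead is likely to dominate whatever is measured.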
np.ndarray also in the second function (I assume), which may already explain the slowdown, since making the np.ndarray is a bit of extra work and there is not much done here overall. - seberg
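If the comment's assumption holds, the memoryview version pays for wrapping its result in an extra object on the way back to Python. A rough way to see this kind of overhead in plain Python is sketched below. It uses the built-in memoryview as a stand-in for Cython's typed memoryview, which is an assumption: the two are different objects with different costs, so the numbers only illustrate the effect and say nothing definite about the Cython case. The image size and slice bounds are also assumed.

```python
import timeit

import numpy as np

# Assumed image size; the original post does not state it.
im = np.zeros((480, 640), dtype=np.uint8)


def slice_only():
    # Return the ndarray view directly, as image_box1 effectively does.
    return im[225:257, 304:336]


def slice_and_rewrap():
    # Export the view through the buffer protocol, then wrap it in a new
    # ndarray. The built-in memoryview is only a stand-in for Cython's type.
    return np.asarray(memoryview(im[225:257, 304:336]))


rewrapped = slice_and_rewrap()

n = 10000
for fn in (slice_only, slice_and_rewrap):
    best = min(timeit.repeat(fn, number=n, repeat=3)) / n
    print(f"{fn.__name__}: {best * 1e6:.2f} us per loop")
```

Both functions return views onto the same pixel data, so any difference in the timings comes from object creation rather than copying.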