I have two matrices A and B, each with a size of NxM, where N is the number of samples and M is the size of histogram bins. Thus, each row represents a histogram for that particular sample.

What I would like to do is compute the chi-square distance between every pair of samples across the two matrices. Each row of A is compared to every row of B, producing a final matrix C of size NxN, where C[i,j] is the chi-square distance between the histograms A[i] and B[j].

Here is my python code that does the job:

import numpy as np

def chi_square(histA, histB):
    eps = 1.e-10  # guards against division by zero for bins empty in both histograms
    d = np.sum((histA - histB)**2 / (histA + histB + eps))
    return 0.5 * d

def matrix_cost(A, B):
    a, _ = A.shape
    b, _ = B.shape
    C = np.zeros((a, b))
    for i in range(a):
        for j in range(b):
            C[i, j] = chi_square(A[i], B[j])
    return C

Currently, for two 100x70 matrices, this entire process takes about 0.1 seconds.

Is there any way to improve this performance?

I would appreciate any thoughts or recommendations.

Thank you.

1 Answer

Sure! I'm assuming you're using numpy?

If you have the RAM available, you could broadcast the arrays and let NumPy's efficient vectorized operations do the work on them.

Here's how:

eps = 1.e-10  # same guard as in your chi_square, for bins empty in both histograms
Abroad = A[:, np.newaxis, :]  # shape (N, 1, M), prepared for broadcasting against B's (N, M)
C = np.sum((Abroad - B)**2 / (Abroad + B + eps), axis=-1) / 2.
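To convince yourself the broadcast version matches the double loop, a small self-contained check (the helper name `chi_square_loop` is just illustrative) can compare the two on random data:

```python
import numpy as np

def chi_square_loop(A, B, eps=1.e-10):
    # reference double-loop implementation, as in the question
    C = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            C[i, j] = 0.5 * np.sum((A[i] - B[j])**2 / (A[i] + B[j] + eps))
    return C

rng = np.random.RandomState(0)
A = rng.rand(100, 70)
B = rng.rand(100, 70)

eps = 1.e-10
Abroad = A[:, np.newaxis, :]                 # (100, 1, 70)
C_fast = np.sum((Abroad - B)**2 / (Abroad + B + eps), axis=-1) / 2.

print(np.allclose(C_fast, chi_square_loop(A, B)))  # True
```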

Timing on my machine shows roughly a factor-of-10 speed gain compared to your algorithm.

A slower option (but still faster than your original algorithm) that uses less RAM is to broadcast one row of A at a time against all of B:

def new_way(A, B, eps=1.e-10):
    C = np.empty((A.shape[0], B.shape[0]))
    for rowind, row in enumerate(A):
        # row has shape (M,) and broadcasts against all of B at once, shape (N, M)
        C[rowind, :] = np.sum((row - B)**2 / (row + B + eps), axis=-1) / 2.
    return C

This has the advantage that it can be run for arrays with shape (N,M) much larger than (100,70).
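As a back-of-envelope sketch of why: the fully broadcast version materializes an intermediate array of shape (N, N, M), while the row-wise version only ever holds one (N, M) intermediate at a time. For N=1000, M=70 in float64:

```python
N, M = 1000, 70
bytes_full = N * N * M * 8     # (N, N, M) float64 intermediate for full broadcasting
bytes_rowwise = N * M * 8      # (N, M) float64 intermediate, reused for each row
print(bytes_full / 1e9, "GB vs", bytes_rowwise / 1e6, "MB")  # factor of N apart
```

So the full broadcast needs on the order of half a gigabyte just for the intermediate, while the row-wise loop stays well under a megabyte.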

You could also look to Theano to push the expensive for-loop down to the C level if you don't have the memory available. I get a factor-of-2 speed gain compared to the first option (not taking into account the initial compile time) for both (100,70) and (1000,70) arrays:

import theano
import theano.tensor as T

X = T.matrix("X")
Y = T.matrix("Y")
# scan iterates over the rows of X, broadcasting each row against all of Y,
# much like new_way above (add an eps guard if your histograms can have empty bins)
results, updates = theano.scan(
    lambda x_i: ((x_i - Y)**2 / (x_i + Y)).sum(axis=1) / 2.,
    sequences=X)
chi_square_norm = theano.function(inputs=[X, Y], outputs=[results])
chi_square_norm(A, B)  # same result as the NumPy versions