2
votes

Background: I'm converting specific MATLAB functions to C via a MEX file to reduce overall memory overhead and increase performance for a particular algorithm. Among the functions involved are decomposition/linear-system-solver routines, implemented in a hierarchy similar to that of MATLAB's left-divide operator.

I realize that MATLAB uses a JIT compiler and that many people report it's difficult to beat MATLAB's performance; however, the goal here is to actually test that theory and find out whether any increase in performance or efficiency can be achieved.

I'm currently using the GNU Scientific Library (GSL).

Onto the problem...

I'm not getting the correct results for a basic diagonal matrix inversion problem.

In MATLAB, the code:

x = A\B

where A is a sparse diagonal matrix in which every element on the diagonal is non-zero, and B is just a regular dense matrix whose dimensions are compatible with A for matrix multiplication.

To solve this, shouldn't it just be a basic for loop that replaces every element on the main diagonal with its inverse (1/element), and then multiplies the result by matrix B? It seems like a simple problem, but it's not producing the same results as MATLAB, even though the input matrices are the same.
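That reasoning can be sketched in plain C (no GSL) for a row-major layout: since A is diagonal, row i of the solution is just row i of B divided by the i-th diagonal entry, so no general inverse is needed. The function name and sizes below are illustrative, not from the actual code.

```c
#include <stddef.h>

/* Solve A*X = B where A is diagonal and stored as just its diagonal
 * entries.  Row i of X is row i of B scaled by 1/diag[i].  B and X are
 * row-major n-by-m arrays.  Sketch only; no check for zero diagonals. */
static void diag_solve(const double *diag, const double *B, double *X,
                       size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < m; k++)
            X[i * m + k] = B[i * m + k] / diag[i];
}
```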

C Code:

    gsl_matrix* matrix1;
    gsl_matrix* matrix1_copy;
    size_t index;

    ...  //  matrix1 and matrix1_copy are allocated, and matrix1 is
         //  filled as a sparse diagonal matrix

    gsl_matrix_memcpy(matrix1_copy, matrix1);

    //  View of the main diagonal of the copy; no data is duplicated
    gsl_vector_view diag = gsl_matrix_diagonal(matrix1_copy);
    for (index = 0; index < diag.vector.size; index++) {
        gsl_matrix_set(matrix1_copy, index, index,
                       1.0 / gsl_vector_get(&diag.vector, index));
    }

This is confirmed to replace all of the diagonal elements with their inverse. This matrix is returned and then used as the first matrix in this function:

    gsl_matrix* multiply(gsl_matrix* matrix1, gsl_matrix* matrix2) {
        gsl_matrix* final_matrix = gsl_matrix_calloc(matrix1->size1, matrix2->size2);
        gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1.0, matrix1, matrix2, 0.0, final_matrix);
        return final_matrix;
    }

The returned matrix is written to a file so I can read it easily. The whole process produces no errors or warnings, and each step seems to complete, but the final matrix does not match its counterpart in MATLAB. It's difficult to tell at first, since this is a very large sparse matrix, and the beginnings of both matrices agree; however, upon inspecting the lower right-hand corner of the matrix in MATLAB, I see numbers that are simply 0's in my C output. My output precision is set correctly.
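One thing worth double-checking on the file-writing side: a format like `%f` (six digits after the decimal point) silently turns very small non-zero entries into zeros in the file, while `%.17g` prints enough digits to round-trip any IEEE-754 double exactly through text. A minimal sketch of that round trip (the helper name is illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

/* Print a double to text and parse it back.  With "%.17g" the parsed
 * value is bit-identical to the original; with "%f", tiny values such
 * as 1e-300 would come back as 0. */
static double roundtrip(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.17g", x);
    return strtod(buf, NULL);
}
```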

Any ideas?

1
A couple of things to check. 1) Use the same algorithm in MATLAB, i.e. do x=AA*B, where AA=1./A. I am pretty certain that x=A\B does not do inv(A)*B but solves the system A*x=B without calculating the inverse matrix. Since the operations performed are not identical, rounding errors might give you somewhat different values. 2) To rule out incorrect usage of the GSL library, code the multiplication yourself using loops (i.e. x(i,k)=sum(1/A(i,j)*B(j,k)), where the sum goes over j). 3) Perhaps the values become zeros when written to the file. Check how you write to/read from the file. — PetrH
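To act on these checks, it helps to diff the C result against the matrix exported from MATLAB element by element and report where they first diverge, rather than eyeballing a huge sparse matrix. A small comparison helper along those lines (names and tolerance are illustrative, not from the question):

```c
#include <math.h>
#include <stddef.h>

/* Compare two row-major n-by-m matrices element-wise.  Returns the flat
 * index of the first entry whose absolute difference exceeds tol, or -1
 * if the matrices agree everywhere within tol. */
static long first_mismatch(const double *a, const double *b,
                           size_t n, size_t m, double tol)
{
    for (size_t i = 0; i < n * m; i++)
        if (fabs(a[i] - b[i]) > tol)
            return (long)i;
    return -1;
}
```

Converting the flat index back to (row, column) via `i / m` and `i % m` points straight at the region (e.g. the lower right-hand corner) where the two results disagree.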

1 Answer

0
votes

You should use back-substitution and treat the diagonal matrix A as a special case. This way you will cover a wider range of problems with the same algorithm (and the same unit tests!).
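A minimal sketch of what this means, in plain C with row-major storage: back-substitution solves U*x = b for an upper-triangular U, and a diagonal matrix is simply the case where the inner sum is empty, so the same routine handles both. The function name is illustrative and no singularity checks are included.

```c
#include <stddef.h>

/* Back-substitution: solve U*x = b for an upper-triangular n-by-n
 * matrix U stored row-major.  When U is diagonal the inner loop never
 * runs, so the diagonal case falls out of the same code path. */
static void back_substitute(const double *U, const double *b,
                            double *x, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        double s = b[i];
        for (size_t j = i + 1; j < n; j++)
            s -= U[i * n + j] * x[j];
        x[i] = s / U[i * n + i];
    }
}
```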

First extra comment: since you are developing an algorithm from scratch, you must use simple test cases; testing with complicated cases will prove difficult. What I mean is that your test cases must be so small that you can verify the results by hand.

Second extra comment: MATLAB uses BLAS/LAPACK libraries optimized by Intel, a.k.a. the Intel Math Kernel Library. If your goal is to beat MATLAB, you should use a BLAS or BLAS-like library: GotoBLAS, OpenBLAS, or FLAME.

Last extra comment: you are not the first person I know who wants to beat MATLAB. Learn from the mistakes of others. Choose a target platform and optimize like hell. Know your processor to the core. Use multi-threading. Use MPI. Take Agner Fog's optimization manuals as your bible.