1 vote

I am reading the same binary file into Python and Matlab and putting it into a matrix. When I take the norm of this matrix, I get different results.

I am using the respective smatload functions below to load the binary file.

Python:

import numpy as np
from scipy.sparse import csr_matrix

def smatload(filename):
    # Header: three int64 values (m, n, nnz), followed by nnz
    # (row, col, value) triplets stored as doubles with 1-based indices.
    with open(filename, 'rb') as f:
        m = int(np.fromfile(f, 'q', 1)[0])
        n = int(np.fromfile(f, 'q', 1)[0])
        nnz = int(np.fromfile(f, 'q', 1)[0])
        print('reading %d x %d with %d non-zeros' % (m, n, nnz))
        S = np.fromfile(f, 'd', 3 * nnz)
    S = S.reshape((nnz, 3))
    rows = S[:, 0].astype(int) - 1  # convert 1-based indices to 0-based
    cols = S[:, 1].astype(int) - 1
    vals = S[:, 2]
    return csr_matrix((vals, (rows, cols)), shape=(m, n))

Matlab:

function [A] = smatload(filename)

fid = fopen(filename,'r');
if( fid == -1 )
    fprintf('Error: Unable to open file [%s], fid=%d\n',filename,fid);
    A = [-1];
    return;   % do not fclose an invalid fid
end

% Header: three uint64 values (m, n, nnz)
m   = fread(fid,[1 1],'uint64');
n   = fread(fid,[1 1],'uint64');
nnz = fread(fid,[1 1],'uint64');

fprintf('Reading %d x %d with %d non-zeros\n',m,n,nnz);

% nnz (row, col, value) triplets stored as doubles, 1-based indices
S = fread(fid,[3 nnz],'double');
fclose(fid);
A = sparse(S(1,:)',S(2,:)',S(3,:)',m,n);

The results that I get for the norms of the returned matrices are

Matlab: norm(A,'fro') = 0.018317077159881

Python: np.linalg.norm(A.toarray()) = 0.018317077159760

I have confirmed that both are reading the correct number of values (a 6590x7126 matrix with 122526 non-zeros), and that I am using the same norm for both (Frobenius).

Any ideas as to what might cause this?

This could just be a case of MATLAB doing various things to increase the accuracy of floating-point mathematics. I know that in MATLAB, if you add 0.01 1,000,000 times, you'll actually get 10,000, rather than the slightly-off value you get in most other languages. – childofsoong

I have used this function before for other projects and haven't seen a difference. I'm curious why it is popping up now; it is causing an issue that my Matlab solution did not run into. – David

So you are concerned about differences in the 12th decimal place when dealing with 100,000 floats? Not that it's relevant, but I'd be curious whether the MATLAB array can be saved to a .mat file and read with scipy.io.loadmat, and whether the results are the same. – hpaulj

Given the experiments that I am doing, it is causing some issues with the convergence criterion. Let me check what that gives, and I'll update my post. – David

@David, did your previous uses of the function involve such large matrices? I could easily see a 6590x7126 matrix getting into numbers too large for typical floating-point representations to maintain high accuracy. – childofsoong

3 Answers

4 votes

A quick look at the Frobenius norm shows it requires squaring all of the values, adding them together, and taking the square root of the sum.

Since you have uint64 in the read command, it looks like you could be filling up the floating-point storage. When you multiply two binary numbers, the exact product needs twice as many bits to store, so squaring these 64-bit values exactly would require 128 bits. If Python and MATLAB round those intermediate results differently, that could explain why your decimal values differ.
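As a quick illustration (with synthetic data standing in for the file from the question, so the exact digits are hypothetical), the order in which double-precision squares are accumulated already perturbs the trailing digits of the result:

import math
import numpy as np

# Synthetic stand-in for the question's data: ~122k small doubles.
rng = np.random.default_rng(0)
v = rng.random(122526) * 1e-4
sq = v ** 2

total = 0.0
for x in sq:                       # strict left-to-right accumulation
    total += x

print(math.sqrt(total))            # sequential sum
print(np.sqrt(np.sum(sq)))         # NumPy sums pairwise internally
print(math.sqrt(math.fsum(sq)))    # fsum is correctly rounded (reference)

The three printed values typically agree to roughly 12-13 significant digits and then drift apart, which is the same order of discrepancy reported in the question.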

See these two links for information about how MATLAB and Python handle floating-point accuracy:

Python: https://docs.python.org/2/tutorial/floatingpoint.html

MATLAB: http://blogs.mathworks.com/cleve/2014/07/07/floating-point-numbers/

2 votes

Well, Matlab definitely seems to have different implementations for sparse and for dense arrays. Using the 4425x7126 sparse matrix A with 54882 non-zero entries that you linked to, and the following commands:

FA  = full(A);    % dense copy of A
av  = A(:);       % all entries of A as a sparse column vector
fav = FA(:);      % all entries of FA as a dense column vector

I would expect the following commands to all yield the same value, because they all compute the square root of the sum of the squares of the (non-zero) elements of A:

norm(A,'fro')
norm(av,2)
norm(FA,'fro')
norm(fav,2)

sqrt( sum(av .* av) )
sqrt( sum(av .^ 2) )

sqrt( sum(fav .* fav) )
sqrt( sum(fav .^ 2) )

In fact, we see three slightly different answers:

 norm(A,'fro')             0.0223294051001499
 norm(av,2)                0.0223294051001499
 norm(FA,'fro')            0.0223294051001499
 norm(fav,2)               0.0223294051001499

 sqrt( sum(av .* av) )     0.0223294051001521
 sqrt( sum(av .^ 2) )      0.0223294051001521

 sqrt( sum(fav .* fav) )   0.0223294051001506
 sqrt( sum(fav .^ 2) )     0.0223294051001506

In fact, even the reported sums of the elements of the sparse and dense representations of A differ (a little bit):

sum(A(:))                 1.00000000000068
sum(FA(:))                1.00000000000035

These differences seem to be of the same order of magnitude as the one you were seeing between the Python and Matlab norms.
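For comparison, here is a rough SciPy-side sketch of the same experiment (using a random matrix with the same shape and density as a stand-in, since I don't have the original file); the sparse and dense code paths can likewise disagree in the trailing digits:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import norm as spnorm

# Random stand-in with the same shape/density as the matrix above.
A = sparse.random(4425, 7126, density=54882.0 / (4425 * 7126),
                  format='csr', random_state=0)
FA = A.toarray()

print(spnorm(A, 'fro'))              # sparse Frobenius norm
print(np.linalg.norm(FA, 'fro'))     # dense Frobenius norm
print(np.sqrt(np.sum(A.data ** 2)))  # explicit sum of squares of stored entries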

1 vote

Not an answer, but I don't have enough rep to comment. Would it be worthwhile to narrow the problem down a bit? If you partitioned your original matrix into four sub-matrices (top-left, top-right, bottom-left, bottom-right) and compared the Frobenius norm reported by Matlab and by Python for each sub-matrix, would you still see a discrepancy between any of the values? If yes, rinse and repeat on that sub-matrix. If no, then don't even waste your time reading this comment. :-)
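A minimal sketch of that bisection idea on the Python side (the helper name quadrant_norms is mine; it assumes the scipy.sparse matrix returned by smatload):

from scipy.sparse.linalg import norm as spnorm

def quadrant_norms(A):
    # Print the Frobenius norm of each quadrant of a sparse matrix so a
    # mismatching block can be identified and recursively subdivided.
    m, n = A.shape
    quadrants = [('top-left',     A[:m // 2, :n // 2]),
                 ('top-right',    A[:m // 2, n // 2:]),
                 ('bottom-left',  A[m // 2:, :n // 2]),
                 ('bottom-right', A[m // 2:, n // 2:])]
    for name, block in quadrants:
        print('%-12s %.15g' % (name, spnorm(block, 'fro')))

Computing the same four block norms in Matlab (e.g. norm(A(1:floor(end/2),1:floor(end/2)),'fro')) and comparing the pairs would localize which block carries the discrepancy.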