1 vote

I am reading the same binary file into Python and Matlab and putting it into a matrix. When I take the norm of this matrix, I get different results.

I am using the respective smatload functions below to load the binary file.

Python:

import numpy as np
from scipy.sparse import csr_matrix

def smatload(filename):
    # Header: three int64 values (m, n, nnz), followed by nnz
    # (row, col, value) triplets stored as doubles with 1-based indices.
    with open(filename, 'rb') as f:
        m = int(np.fromfile(f, 'q', 1)[0])
        n = int(np.fromfile(f, 'q', 1)[0])
        nnz = int(np.fromfile(f, 'q', 1)[0])
        print('reading %d x %d with %d non-zeros' % (m, n, nnz))
        S = np.fromfile(f, 'd', 3 * nnz)
    S = S.reshape((nnz, 3))
    rows = S[:, 0].astype(int) - 1  # convert 1-based indices to 0-based
    cols = S[:, 1].astype(int) - 1
    vals = S[:, 2]
    return csr_matrix((vals, (rows, cols)), shape=(m, n))

Matlab:

function [A] = smatload(filename)

fid = fopen(filename,'r');
if( fid == -1 )
    fprintf('Error: Unable to open file [%s], fid=%d\n',filename,fid);
    A = [-1];
    return;   % do not fclose an invalid fid
end

% Header: three uint64 values (m, n, nnz)
m   = fread(fid,[1 1],'uint64');
n   = fread(fid,[1 1],'uint64');
nnz = fread(fid,[1 1],'uint64');

fprintf('Reading %d x %d with %d non-zeros\n',m,n,nnz);

% nnz (row, col, value) triplets stored as doubles, 1-based indices
S = fread(fid,[3 nnz],'double');
fclose(fid);
A = sparse(S(1,:)',S(2,:)',S(3,:)',m,n);

The results that I get for the norms of the returned matrices are

Matlab: norm(A,'fro') = 0.018317077159881

Python: np.linalg.norm(A.toarray()) = 0.018317077159760

I have confirmed that both are reading the correct number of values (a 6590x7126 matrix with 122526 non-zeros), and that I am using the same norm for both (Frobenius).

Any ideas as to what might cause this?

This could just be a case of MATLAB doing various things to increase the accuracy of floating-point mathematics. I know that in MATLAB, if you add 0.01 1,000,000 times, you'll actually get 10,000, rather than the slightly-off value you get in most other languages. – childofsoong

I have used this function before for other projects and haven't seen a difference. I'm curious why it is popping up now; it is causing an issue that my Matlab solution did not run into. – David

So you are concerned about differences in the 12th decimal place when dealing with 100,000 floats? Not that it's relevant, but I'd be curious whether the MATLAB array can be saved to a .mat file and read with scipy.io.loadmat, and whether the results are the same. – hpaulj

Given the experiments that I am doing, it is causing some issues with the convergence criterion. Let me check what that gives, and I'll update my post. – David

@David, did your previous uses of the function involve such large matrices? I could easily see a 6590x7126 matrix getting into numbers too large for typical floating-point representations to maintain high accuracy. – childofsoong

3 Answers

4 votes

A quick look at the Frobenius norm shows it requires squaring all of the values, adding them together, and taking the square root of the sum.

Since you have uint64 in the read command, it looks like you could be filling up the floating-point storage. When you multiply two binary numbers, the exact product needs twice as many bits to store, so squaring these 64-bit values exactly would require 128 bits. If Python and MATLAB round those intermediate results differently, that could explain why your decimal values differ.
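As a quick illustration (with synthetic data standing in for the file from the question, so the exact digits are hypothetical), the order in which double-precision squares are accumulated already perturbs the trailing digits of the result:

import math
import numpy as np

# Synthetic stand-in for the question's data: ~122k small doubles.
rng = np.random.default_rng(0)
v = rng.random(122526) * 1e-4
sq = v ** 2

total = 0.0
for x in sq:                       # strict left-to-right accumulation
    total += x

print(math.sqrt(total))            # sequential sum
print(np.sqrt(np.sum(sq)))         # NumPy sums pairwise internally
print(math.sqrt(math.fsum(sq)))    # fsum is correctly rounded (reference)

The three printed values typically agree to roughly 12-13 significant digits and then drift apart, which is the same order of discrepancy reported in the question.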

See these two links for information about how MATLAB and Python handle floating-point accuracy:

Python: https://docs.python.org/2/tutorial/floatingpoint.html

MATLAB: http://blogs.mathworks.com/cleve/2014/07/07/floating-point-numbers/

2 votes

Well, Matlab definitely seems to have different implementations for sparse and for dense arrays. Using the 4425x7126 sparse matrix A with 54882 non-zero entries that you linked to, and the following commands:

FA  = full(A);    % dense copy of A
av  = A(:);       % all entries of A as a sparse column vector
fav = FA(:);      % all entries of FA as a dense column vector

I would expect the following commands to all yield the same value, because they all compute the square root of the sum of the squares of the (non-zero) elements of A:

norm(A,'fro')
norm(av,2)
norm(FA,'fro')
norm(fav,2)

sqrt( sum(av .* av) )
sqrt( sum(av .^ 2) )

sqrt( sum(fav .* fav) )
sqrt( sum(fav .^ 2) )

In fact, we see three slightly different answers:

 norm(A,'fro')             0.0223294051001499
 norm(av,2)                0.0223294051001499
 norm(FA,'fro')            0.0223294051001499
 norm(fav,2)               0.0223294051001499

 sqrt( sum(av .* av) )     0.0223294051001521
 sqrt( sum(av .^ 2) )      0.0223294051001521

 sqrt( sum(fav .* fav) )   0.0223294051001506
 sqrt( sum(fav .^ 2) )     0.0223294051001506

In fact, even the reported sums of the elements of the sparse and dense representations of A differ (a little bit):

sum(A(:))                 1.00000000000068
sum(FA(:))                1.00000000000035

These differences seem to be of the same order of magnitude as the one you were seeing between the Python and Matlab norms.
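For comparison, here is a rough SciPy-side sketch of the same experiment (using a random matrix with the same shape and density as a stand-in, since I don't have the original file); the sparse and dense code paths can likewise disagree in the trailing digits:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import norm as spnorm

# Random stand-in with the same shape/density as the matrix above.
A = sparse.random(4425, 7126, density=54882.0 / (4425 * 7126),
                  format='csr', random_state=0)
FA = A.toarray()

print(spnorm(A, 'fro'))              # sparse Frobenius norm
print(np.linalg.norm(FA, 'fro'))     # dense Frobenius norm
print(np.sqrt(np.sum(A.data ** 2)))  # explicit sum of squares of stored entries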

1 vote

Not an answer, but I don't have enough rep to comment. Would it be worthwhile to narrow the problem down a bit? If you partitioned your original matrix into four sub-matrices (top-left, top-right, bottom-left, bottom-right) and compared the Frobenius norm reported by Matlab and by Python for each sub-matrix, would you still see a discrepancy between any of the values? If yes, rinse and repeat on that sub-matrix. If no, then don't even waste your time reading this comment. :-)
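A minimal sketch of that bisection idea on the Python side (the helper name quadrant_norms is mine; it assumes the scipy.sparse matrix returned by smatload):

from scipy.sparse.linalg import norm as spnorm

def quadrant_norms(A):
    # Print the Frobenius norm of each quadrant of a sparse matrix so a
    # mismatching block can be identified and recursively subdivided.
    m, n = A.shape
    quadrants = [('top-left',     A[:m // 2, :n // 2]),
                 ('top-right',    A[:m // 2, n // 2:]),
                 ('bottom-left',  A[m // 2:, :n // 2]),
                 ('bottom-right', A[m // 2:, n // 2:])]
    for name, block in quadrants:
        print('%-12s %.15g' % (name, spnorm(block, 'fro')))

Computing the same four block norms in Matlab (e.g. norm(A(1:floor(end/2),1:floor(end/2)),'fro')) and comparing the pairs would localize which block carries the discrepancy.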