numpy - sparse 3d matrix/array in Python?

Question

In SciPy, we can construct a sparse matrix using scipy.sparse.lil_matrix() and similar functions, but these matrices are always 2D.

I am wondering whether there is an existing data structure for a sparse 3D matrix/array (tensor) in Python.

P.S. I have lots of sparse 3D data and need a tensor to store it and perform multiplications. If no such data structure exists, do you have any suggestions for implementing one?

This post might help: stackoverflow.com/questions/4490961/… — jayunit100
What do you mean by a "matrix in 2D"? If you mean a matrix representing a 2D linear transformation, then you're talking about a 2x2 matrix of real values (approximated by floating-point values) with determinant 1 for a rigid rotation. If you want to represent translation as well, then you embed the 2x2 matrix inside a 3x3 matrix, and if you want to allow shearing or expansion/contraction, then you can relax the determinant requirement, but even so that's a total of 9 floating-point values. Why do you want/need a sparse representation? — Peter
@Peter "a matrix in 2D" means a matrix with 2 dimensions. An element of a 2D matrix can be represented as (x, y, r), where x and y are the coordinates and r is the value stored at (x, y). I need a sparse representation because x and y can be very large (say x < 10^5, y < 10^4) while only a few values are actually stored in the matrix (say 10^4). SciPy provides sparse matrices for the 2D case, but very often we need 3D or even n-D. I guess the n-D case is too general, so any solution for 3D is good enough for me. — zhongqi
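To put the numbers in that comment in perspective, here is the storage arithmetic as a small Python sketch. It assumes 8-byte float64 values and 8-byte integer coordinates (both assumptions, not stated in the comment) and a plain coordinate list of (x, y, r) entries:

```python
# Sizes quoted in the comment
NX, NY = 10**5, 10**4       # x < 10^5, y < 10^4
N_STORED = 10**4            # number of values actually present

BYTES_PER_VALUE = 8         # assumption: float64 values
BYTES_PER_INDEX = 8         # assumption: int64 coordinates

# Dense: every (x, y) cell is allocated, whether or not it holds data
dense_bytes = NX * NY * BYTES_PER_VALUE

# Coordinate list: one (x, y, r) triple per stored value
coo_bytes = N_STORED * (2 * BYTES_PER_INDEX + BYTES_PER_VALUE)

print(f"dense: {dense_bytes / 1e9:.0f} GB")          # dense: 8 GB
print(f"coordinate list: {coo_bytes / 1e3:.0f} kB")  # coordinate list: 240 kB
```

Adding a third axis multiplies the dense figure by the length of that axis, while the coordinate list only grows by one index per stored entry.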
Thanks, I was confused by the P.S. in your question (it sounded to me like you wanted to multiply a bunch of Euclidean tuples by a matrix, linear-algebra style). But if you're talking about m x n x o matrices, then it sounds like your "sparse" implementation will need to provide some sort of iterator interface so that you can implement (element-by-element) multiplication. — Peter
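A sketch of that iterator-based approach, assuming each sparse tensor is stored as a plain dict mapping (i, j, k) index tuples to non-zero values (an assumption; any structure that can enumerate its stored entries would do). An element-by-element product only needs to visit entries stored in both operands. The function name `elementwise_product` is hypothetical:

```python
def elementwise_product(a, b):
    """Element-by-element product of two sparse tensors stored as dicts.

    Missing keys are implicit zeros, so only indices stored in both
    a and b can produce a non-zero result.
    """
    # Iterate over the smaller dict and probe the larger one.
    if len(a) > len(b):
        a, b = b, a
    out = {}
    for index, value in a.items():
        other = b.get(index)
        if other is not None:
            out[index] = value * other
    return out


a = {(0, 0, 0): 2.0, (1, 2, 3): 4.0, (5, 5, 5): 1.0}
b = {(1, 2, 3): 0.5, (9, 9, 9): 7.0}
print(elementwise_product(a, b))  # {(1, 2, 3): 2.0}
```

The cost is proportional to the number of stored entries in the smaller operand, not to m x n x o.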

tehwalrus · Accepted Answer · 2011-10-11T15:47:18

Happy to suggest a (possibly obvious) implementation of this, which could be written in pure Python, or in C/Cython if you have the time and room for new dependencies and need it to be faster.

A sparse matrix in N dimensions can assume that most elements are zero, so we only store the non-zero ones, in a dictionary keyed on index tuples:

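class NDSparseMatrix:
    def __init__(self):
        # maps index tuples such as (i, j, k) to the stored (non-zero) values
        self.elements = {}

    def addValue(self, index, value):
        # 'index' rather than 'tuple', to avoid shadowing the built-in
        self.elements[index] = value

    def readValue(self, index):
        # entries that were never added are implicitly zero
        # (could also be 0.0 if using floats...)
        return self.elements.get(index, 0)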

and you would use it like so:

sparse = NDSparseMatrix()
sparse.addValue((1,2,3), 15.7)
should_be_zero = sparse.readValue((1,5,13))
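If you prefer NumPy-style indexing such as `sparse[1, 2, 3]`, the same dictionary can sit behind Python's `__getitem__` and `__setitem__` hooks. This is a variation on the class above, not part of the original answer; the class name `SparseTensor` and the choice to drop entries set to the default are assumptions:

```python
class SparseTensor:
    """Dict-of-keys sparse tensor with sparse[i, j, k] indexing."""

    def __init__(self, default=0.0):
        self.elements = {}        # index tuple -> stored value
        self.default = default    # value reported for missing entries

    def __setitem__(self, index, value):
        if value == self.default:
            # storing the default would waste memory, so drop the entry instead
            self.elements.pop(index, None)
        else:
            self.elements[index] = value

    def __getitem__(self, index):
        return self.elements.get(index, self.default)

    def __len__(self):
        # number of explicitly stored (non-default) entries
        return len(self.elements)


sparse = SparseTensor()
sparse[1, 2, 3] = 15.7            # the key is the tuple (1, 2, 3)
should_be_zero = sparse[1, 5, 13]
print(should_be_zero, len(sparse))  # 0.0 1
```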

You could make this implementation more robust by verifying that the input is in fact a tuple and that it contains only integers, but that would just slow things down, so I wouldn't worry about it unless you're releasing your code to the world later.
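If you do decide to add those checks, a minimal sketch might look like the following. The helper name `check_index`, the extra length check via `ndim`, and raising `TypeError` are all assumptions; you would call it at the top of `addValue` and `readValue`. `numbers.Integral` is used so that NumPy integer scalars are accepted as well as plain `int`:

```python
import numbers


def check_index(index, ndim=3):
    """Raise TypeError unless index is a tuple of ndim integers."""
    if not isinstance(index, tuple):
        raise TypeError(f"index must be a tuple, got {type(index).__name__}")
    if len(index) != ndim:
        raise TypeError(f"expected {ndim} indices, got {len(index)}")
    for i in index:
        # bool counts as Integral, so reject it explicitly
        if isinstance(i, bool) or not isinstance(i, numbers.Integral):
            raise TypeError(f"indices must be integers, got {i!r}")
    return index


check_index((1, 2, 3))          # passes
try:
    check_index([1, 2, 3])      # a list would not work as a dict key anyway
except TypeError as exc:
    print(exc)                  # index must be a tuple, got list
```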

EDIT - a Cython implementation of the multiplication, assuming the other tensor is a 3-dimensional NumPy array (numpy.ndarray), might look like this:

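#cython: boundscheck=False
#cython: wraparound=False

import numpy as np          # Python-level NumPy (np.zeros, np.float64)
cimport numpy as cnp        # C-level NumPy types for typed buffers

cnp.import_array()

def sparse_mult(object sparse, cnp.ndarray[double, ndim=3] u):
    cdef Py_ssize_t i, j, k
    # np.zeros rather than an uninitialised array, so unwritten entries are 0
    cdef cnp.ndarray[double, ndim=3] out = np.zeros(
        (u.shape[0], u.shape[1], u.shape[2]), dtype=np.float64)

    # interior indices only, so the boundary entries of out stay zero
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            for k in range(1, u.shape[2] - 1):
                # note, here you must define your own rank-3 multiplication rule, which
                # is, in general, nontrivial, especially if LxMxN tensor...

                # loop over a dummy variable (or two) and perform some summation:
                out[i, j, k] = u[i, j, k] * sparse.readValue((i, j, k))

    return out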

You will always need to hand-roll this for the problem at hand, though, because (as mentioned in the code comment) you need to define which indices you're summing over, and be careful about the array lengths or things won't work!
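As a concrete example of choosing the summed index, here is one possible rule: contracting a sparse rank-3 tensor with a dense vector over its last index, out[i, j] = sum over k of A[i, j, k] * v[k]. The rule, the dict-of-tuples storage, and the function name `contract_last_axis` are illustrative assumptions:

```python
from collections import defaultdict


def contract_last_axis(a, v):
    """out[(i, j)] = sum_k a[(i, j, k)] * v[k] for a dict-based sparse tensor a.

    Only stored entries of a are visited, so the cost grows with the
    number of non-zeros rather than with the full i * j * k volume.
    """
    out = defaultdict(float)
    for (i, j, k), value in a.items():
        if not 0 <= k < len(v):
            # guard against the length mismatches warned about above
            raise IndexError(f"k={k} is out of range for a vector of length {len(v)}")
        out[i, j] += value * v[k]
    return dict(out)


a = {(0, 0, 0): 1.0, (0, 0, 2): 3.0, (4, 1, 1): 2.0}
v = [10.0, 20.0, 30.0]
print(contract_last_axis(a, v))  # {(0, 0): 100.0, (4, 1): 40.0}
```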

EDIT 2 - if the other matrix is also sparse, then you don't need the three-way loop:

def sparse_mult(sparse, other_sparse):
    out = NDSparseMatrix()

    # only the stored (non-zero) entries of sparse are visited
    for key, value in sparse.elements.items():
        i, j, k = key
        # note, here you must define your own rank-3 multiplication rule, which
        # is, in general, nontrivial, especially if LxMxN tensor...

        # loop over a dummy variable (or two) and perform some summation
        # (example indices shown):
        contribution = other_sparse.readValue((i, j, k + 1)) * sparse.readValue((i - 3, j, k))
        out.addValue(key, out.readValue(key) + contribution)

    return out
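To make the "no three-way loop" point concrete, here is one sparse-by-sparse rule worked through: the product of a rank-3 tensor A with a matrix B over the shared index k, out[i, j, m] = sum over k of A[i, j, k] * B[k, m]. Grouping B's stored entries by their first index means each stored entry of A only meets the entries of B it can actually multiply. The rule, the dict storage and the function name `contract_with_matrix` are assumptions for illustration:

```python
from collections import defaultdict


def contract_with_matrix(a, b):
    """out[(i, j, m)] = sum_k a[(i, j, k)] * b[(k, m)]; both operands are sparse dicts."""
    # Index b by its row k so matching entries are found without scanning all of b.
    b_rows = defaultdict(list)
    for (k, m), b_val in b.items():
        b_rows[k].append((m, b_val))

    out = defaultdict(float)
    for (i, j, k), a_val in a.items():
        # .get avoids creating empty rows in the defaultdict
        for m, b_val in b_rows.get(k, ()):
            out[i, j, m] += a_val * b_val
    return dict(out)


a = {(0, 0, 1): 2.0, (3, 2, 1): 1.0, (5, 5, 7): 9.0}
b = {(1, 0): 4.0, (1, 4): 0.5, (2, 2): 8.0}
print(contract_with_matrix(a, b))
# {(0, 0, 0): 8.0, (0, 0, 4): 1.0, (3, 2, 0): 4.0, (3, 2, 4): 0.5}
```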

My suggestion for a C implementation would be to use a simple struct to hold the indices and the values:

typedef struct {
  int index[3];
  float value;
} entry_t;

You'll then need some functions to allocate and maintain a dynamic array of such structs, and to search them as fast as you need; but you should measure the performance of the Python implementation before worrying about that.
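Before writing the C version, the same layout (a growable array of (index, value) entries kept sorted by index so lookups can use binary search) can be prototyped in Python with the standard `bisect` module. Keeping the array sorted is one choice among several (a hash table is another); the class name `SortedEntries` is hypothetical:

```python
import bisect


class SortedEntries:
    """Parallel lists of sorted (i, j, k) keys and their values."""

    def __init__(self):
        self.keys = []     # sorted list of index tuples
        self.values = []   # values[n] belongs to keys[n]

    def add_value(self, index, value):
        pos = bisect.bisect_left(self.keys, index)
        if pos < len(self.keys) and self.keys[pos] == index:
            self.values[pos] = value          # overwrite an existing entry
        else:
            self.keys.insert(pos, index)      # O(n) shift, like a memmove in C
            self.values.insert(pos, value)

    def read_value(self, index, default=0.0):
        pos = bisect.bisect_left(self.keys, index)  # O(log n) search
        if pos < len(self.keys) and self.keys[pos] == index:
            return self.values[pos]
        return default


entries = SortedEntries()
entries.add_value((1, 2, 3), 15.7)
entries.add_value((0, 9, 9), 2.5)
print(entries.keys)                    # [(0, 9, 9), (1, 2, 3)]
print(entries.read_value((1, 5, 13)))  # 0.0
```

Tuples compare lexicographically, so the keys stay in (i, j, k) order. Inserts cost O(n) while lookups cost O(log n), which suits data that is built once and then mostly read.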
