78
votes

In scipy, we can construct a sparse matrix using scipy.sparse.lil_matrix() etc. But the matrix is in 2d.

I am wondering if there is an existing data structure for sparse 3d matrix / array (tensor) in Python?

p.s. I have lots of sparse data in 3d and need a tensor to store / perform multiplication. Any suggestions to implement such a tensor if there's no existing data structure?

6
...but not sparse, unfortunately.Steve Tjoa
What do you mean by a "matrix in 2D"? If you mean a matrix representing a 2D linear transformation, then you're talking about a 2x2 matrix of Real values (approximated by floating point values) with determinant 1 for a rigid rotation. If you want to represent translation as well then you embed the 2x2 matrix inside a 3x3 matrix, and if you want to allow shearing or expansion/contraction then you can relax the determinant requirement -but even so that's a total of 9 floating point values. Why do you want/need a sparse representation?Peter
@Peter "a matrix in 2D" means a matrix in 2 dimension. A unit in a 2d matrix can be represented as (x,y, r), where x & y are the coordinate and r is the value stored at (x, y). I need a sparse representation because when x & y are very very large, say x<10^5, y < 10^4, AND only very few data are stored in the matrix, say 10^4. numpy provides sparse matrix for the 2d matrix. But very often, we need 3d or even n-d. I guess n-d case is too general. So any solutions to 3d are good enough for me.zhongqi
Thanks - I was confused by the P.S. in your question (it sounded to me like you wanted to multiply a bunch of Euclidean tuples by a matrix, linear algebra style). But if you're talking about m x n x o matrices then it sounds like your "sparse" implementation is going to need to provide some sort of iterator interface in order for you to implement (element by element) multiplication.Peter

6 Answers

15
votes

Happy to suggest a (possibly obvious) implementation of this, which could be made in pure Python or C/Cython if you've got time and space for new dependencies, and need it to be faster.

A sparse matrix in N dimensions can assume most elements are empty, so we use a dictionary keyed on tuples:

class NDSparseMatrix:
  def __init__(self):
    self.elements = {}

  def addValue(self, tuple, value):
    self.elements[tuple] = value

  def readValue(self, tuple):
    try:
      value = self.elements[tuple]
    except KeyError:
      # could also be 0.0 if using floats...
      value = 0
    return value

and you would use it like so:

sparse = NDSparseMatrix()
sparse.addValue((1,2,3), 15.7)
should_be_zero = sparse.readValue((1,5,13))

You could make this implementation more robust by verifying that the input is in fact a tuple, and that it contains only integers, but that will just slow things down so I wouldn't worry unless you're releasing your code to the world later.

EDIT - a Cython implementation of the matrix multiplication problem, assuming other tensor is an N Dimensional NumPy array (numpy.ndarray) might look like this:

#cython: boundscheck=False
#cython: wraparound=False

cimport numpy as np

def sparse_mult(object sparse, np.ndarray[double, ndim=3] u):
  cdef unsigned int i, j, k

  out = np.ndarray(shape=(u.shape[0],u.shape[1],u.shape[2]), dtype=double)

  for i in xrange(1,u.shape[0]-1):
    for j in xrange(1, u.shape[1]-1):
      for k in xrange(1, u.shape[2]-1):
        # note, here you must define your own rank-3 multiplication rule, which
        # is, in general, nontrivial, especially if LxMxN tensor...

        # loop over a dummy variable (or two) and perform some summation:
        out[i,j,k] = u[i,j,k] * sparse((i,j,k))

  return out

Although you will always need to hand roll this for the problem at hand, because (as mentioned in code comment) you'll need to define which indices you're summing over, and be careful about the array lengths or things won't work!

EDIT 2 - if the other matrix is also sparse, then you don't need to do the three way looping:

def sparse_mult(sparse, other_sparse):

  out = NDSparseMatrix()

  for key, value in sparse.elements.items():
    i, j, k = key
    # note, here you must define your own rank-3 multiplication rule, which
    # is, in general, nontrivial, especially if LxMxN tensor...

    # loop over a dummy variable (or two) and perform some summation 
    # (example indices shown):
    out.addValue(key) = out.readValue(key) + 
      other_sparse.readValue((i,j,k+1)) * sparse((i-3,j,k))

  return out

My suggestion for a C implementation would be to use a simple struct to hold the indices and the values:

typedef struct {
  int index[3];
  float value;
} entry_t;

you'll then need some functions to allocate and maintain a dynamic array of such structs, and search them as fast as you need; but you should test the Python implementation in place for performance before worrying about that stuff.

10
votes

An alternative answer as of 2017 is the sparse package. According to the package itself it implements sparse multidimensional arrays on top of NumPy and scipy.sparse by generalizing the scipy.sparse.coo_matrix layout.

Here's an example taken from the docs:

import numpy as np
n = 1000
ndims = 4
nnz = 1000000
coords = np.random.randint(0, n - 1, size=(ndims, nnz))
data = np.random.random(nnz)

import sparse
x = sparse.COO(coords, data, shape=((n,) * ndims))
x
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1000000>

x.nbytes
# 16000000

y = sparse.tensordot(x, x, axes=((3, 0), (1, 2)))

y
# <COO: shape=(1000, 1000, 1000, 1000), dtype=float64, nnz=1001588>
7
votes

Have a look at sparray - sparse n-dimensional arrays in Python (by Jan Erik Solem). Also available on github.

3
votes

Nicer than writing everything new from scratch may be to use scipy's sparse module as far as possible. This may lead to (much) better performance. I had a somewhat similar problem, but I only had to access the data efficiently, not perform any operations on them. Furthermore, my data were only sparse in two out of three dimensions.

I have written a class that solves my problem and could (as far as I think) easily be extended to satisfiy the OP's needs. It may still hold some potential for improvement, though.

import scipy.sparse as sp
import numpy as np

class Sparse3D():
    """
    Class to store and access 3 dimensional sparse matrices efficiently
    """
    def __init__(self, *sparseMatrices):
        """
        Constructor
        Takes a stack of sparse 2D matrices with the same dimensions
        """
        self.data = sp.vstack(sparseMatrices, "dok")
        self.shape = (len(sparseMatrices), *sparseMatrices[0].shape)
        self._dim1_jump = np.arange(0, self.shape[1]*self.shape[0], self.shape[1])
        self._dim1 = np.arange(self.shape[0])
        self._dim2 = np.arange(self.shape[1])

    def __getitem__(self, pos):
        if not type(pos) == tuple:
            if not hasattr(pos, "__iter__") and not type(pos) == slice: 
                return self.data[self._dim1_jump[pos] + self._dim2]
            else:
                return Sparse3D(*(self[self._dim1[i]] for i in self._dim1[pos]))
        elif len(pos) > 3:
            raise IndexError("too many indices for array")
        else:
            if (not hasattr(pos[0], "__iter__") and not type(pos[0]) == slice or
                not hasattr(pos[1], "__iter__") and not type(pos[1]) == slice):
                if len(pos) == 2:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]]]
                else:
                    result = self.data[self._dim1_jump[pos[0]] + self._dim2[pos[1]], pos[2]].T
                    if hasattr(pos[2], "__iter__") or type(pos[2]) == slice:
                        result = result.T
                return result
            else:
                if len(pos) == 2:
                    return Sparse3D(*(self[i, self._dim2[pos[1]]] for i in self._dim1[pos[0]]))
                else:
                    if not hasattr(pos[2], "__iter__") and not type(pos[2]) == slice:
                        return sp.vstack([self[self._dim1[pos[0]], i, pos[2]]
                                          for i in self._dim2[pos[1]]]).T
                    else:
                        return Sparse3D(*(self[i, self._dim2[pos[1]], pos[2]] 
                                          for i in self._dim1[pos[0]]))

    def toarray(self):
        return np.array([self[i].toarray() for i in range(self.shape[0])])
0
votes

I also need 3D sparse matrix for solving the 2D heat equations (2 spatial dimensions are dense, but the time dimension is diagonal plus and minus one offdiagonal.) I found this link to guide me. The trick is to create an array Number that maps the 2D sparse matrix to a 1D linear vector. Then build the 2D matrix by building a list of data and indices. Later the Number matrix is used to arrange the answer back to a 2D array.

[edit] It occurred to me after my initial post, this could be handled better by using the .reshape(-1) method. After research, the reshape method is better than flatten because it returns a new view into the original array, but flatten copies the array. The code uses the original Number array. I will try to update later.[end edit]

I test it by creating a 1D random vector and solving for a second vector. Then multiply it by the sparse 2D matrix and I get the same result.

Note: I repeat this many times in a loop with exactly the same matrix M, so you might think it would be more efficient to solve for inverse(M). But the inverse of M is not sparse, so I think (but have not tested) using spsolve is a better solution. "Best" probably depends on how large the matrix is you are using.

#!/usr/bin/env python3
# testSparse.py
# profhuster

import numpy as np
import scipy.sparse as sM
import scipy.sparse.linalg as spLA
from array import array
from numpy.random import rand, seed
seed(101520)

nX = 4
nY = 3
r = 0.1

def loadSpNodes(nX, nY, r):
    # Matrix to map 2D array of nodes to 1D array
    Number = np.zeros((nY, nX), dtype=int)

    # Map each element of the 2D array to a 1D array
    iM = 0
    for i in range(nX):
        for j in range(nY):
            Number[j, i] = iM
            iM += 1
    print(f"Number = \n{Number}")

    # Now create a sparse matrix of the "stencil"
    diagVal = 1 + 4 * r
    offVal = -r
    d_list = array('f')
    i_list = array('i')
    j_list = array('i')
    # Loop over the 2D nodes matrix
    for i in range(nX):
        for j in range(nY):
            # Recall the 1D number
            iSparse = Number[j, i]
            # populate the diagonal
            d_list.append(diagVal)
            i_list.append(iSparse)
            j_list.append(iSparse)
            # Now, for each rectangular neighbor, add the 
            # off-diagonal entries
            # Use a try-except, so boundry nodes work
            for (jj,ii) in ((j+1,i),(j-1,i),(j,i+1),(j,i-1)):
                try:
                    iNeigh = Number[jj, ii]
                    if jj >= 0 and ii >=0:
                        d_list.append(offVal)
                        i_list.append(iSparse)
                        j_list.append(iNeigh)
                except IndexError:
                    pass
    spNodes = sM.coo_matrix((d_list, (i_list, j_list)), shape=(nX*nY,nX*nY))
    return spNodes


MySpNodes = loadSpNodes(nX, nY, r)
print(f"Sparse Nodes = \n{MySpNodes.toarray()}")
b = rand(nX*nY)
print(f"b=\n{b}")
x = spLA.spsolve(MySpNodes.tocsr(), b)
print(f"x=\n{x}")
print(f"Multiply back together=\n{x * MySpNodes}")
0
votes

I needed a 3d look up table for x,y,z and came up with this solution..
Why not use one of the dimensions to be a divisor of the third dimension? ie. use x and 'yz' as the matrix dimensions







from scipy import sparse
m = sparse.lil_matrix((100,2000), dtype=float)

def add_element((x,y,z), element):
    element=float(element)
    m[x,y+z*100]=element

def get_element(x,y,z):
    return m[x,y+z*100]

add_element([3,2,4],2.2)
add_element([20,15,7], 1.2)
print get_element(0,0,0)
print get_element(3,2,4)
print get_element(20,15,7)
print "  This is m sparse:";print m

====================
OUTPUT:
0.0
2.2
1.2
  This is m sparse:
  (3, 402L) 2.2
  (20, 715L)    1.2
====================