Best way to store a sparse matrix in .NET

Question

We have an application that stores a sparse matrix. This matrix has entries that mostly exist around the main diagonal of the matrix. I was wondering if there were any efficient algorithms (or existing libraries) that can efficiently handle sparse matrices of this kind? Preferably, this would be a generic implementation where each matrix entry can be a user-defined type.

Edit in response to a question/response:

When I say mostly around the main diagonal I mean that the characteristics of most of the matrices will be that most entries are clustered off of the main diagonal but there could be zeroes close to the diagonal and there could be non-zero values far out from the diagonal. I want something efficient for 'most' cases here.

What will I be using this for? I need to be able to have efficient access to all values in a row or all values in a column. The values stored would be Boolean values. An example would be:

For all true values in a row, foreach column a true appears in set all the entries of the column to something
For all false values in a row, set the entry to something

This was all done with linked lists previously but was very confusing to implement. I was hoping that with a sparse matrix I could improve the algorithm but finding the 'right' type of sparse matrix algorithm has proved difficult.

p.s. Thanks for the responses thus far

I updated my answer. So is performance efficiency more important than space efficiency? You say "efficient way to handle sparse matrices" and then in your use cases talk about multiple ways of accessing the data. — Erich Mirabal
I would say performance is more important than space efficiency. We will be handling very large amounts of data anyways so I don't mind using lots of space for the matrix so long as it goes faster — Jeffrey Cameron

Erich Mirabal Erich Mirabal · Accepted Answer · 2009-04-16T15:04:19

You could use an index based on the [row,col] of the cell. Since the data is on a diagonal, the typical approach of storing the row index and the associated column indeces with data is not optimal. Here is some code you could use to do it:

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long Size { get; private set; }

        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.Size = w * h;
        }

        public bool IsCellEmpty(int row, int col)
        {
            long index = row * Width + col;
            return _cells.ContainsKey(index);
        }

        public T this[int row, int col]
        {
            get
            {
                long index = row * Width + col;
                T result;
                _cells.TryGetValue(index, out result);
                return result;
            }
            set
            {
                long index = row * Width + col;
                _cells[index] = value;
            }
        }
    }

    static void Main()
    {
        var sm = new SparseMatrix<int>(512, 512);
        sm[42, 42] = 42;
        int val1 = sm[13, 13];
        int val2 = sm[42, 42];

        Console.WriteLine("VAL1 = " + val1); // prints out 0
        Console.WriteLine("VAL2 = " + val2); // prints out 42

        Console.ReadLine();
    }

Note that when T is a struct, you might have to call the IsCellEmpty since getting the contents of a cell will not be null and will have the default value for that type. You can also expand the code to give you a quick "SparseRatio" based on the Size property and _cells.Count.

EDIT:

Well, if you are interesting is speed, you can do the trade-off of space vs speed. Instead of having only one dictionary, have three! It triples your space, but it makes enumerating in any way you want real easy. Here is some new code that shows that:

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long MaxSize { get; private set; }
        public long Count { get { return _cells.Count; } }

        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        private Dictionary<int, Dictionary<int, T>> _rows = 
            new Dictionary<int, Dictionary<int, T>>();

        private Dictionary<int, Dictionary<int, T>> _columns = 
            new Dictionary<int, Dictionary<int, T>>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.MaxSize = w * h;
        }

        public bool IsCellEmpty(int row, int col)
        {
            long index = row * Width + col;
            return _cells.ContainsKey(index);
        }

        public T this[int row, int col]
        {
            get
            {
                long index = row * Width + col;
                T result;
                _cells.TryGetValue(index, out result);
                return result;
            }
            set
            {
                long index = row * Width + col;
                _cells[index] = value;

                UpdateValue(col, row, _columns, value);
                UpdateValue(row, col, _rows, value);
            }
        }

        private void UpdateValue(int index1, int index2, 
            Dictionary<int, Dictionary<int, T>> parent, T value)
        {
            Dictionary<int, T> dict;
            if (!parent.TryGetValue(index1, out dict))
            {
                parent[index2] = dict = new Dictionary<int, T>();
            }
            dict[index2] = value;
        }
    }

If you want to iterate over all the entries, use _cells. If you want all the rows for a given column use _columns. If you want all the columns in a given row use _rows.

If you want to iterate in sorted order, you can start to add LINQ into the mix and/or use a sorted list with an inner class that encapsulates an entry (which would have to store the row or column and implement IComparable<T> for sorting to work).

Best way to store a sparse matrix in .NET

6 Answers