I am afraid there is no short answer. The best storage scheme depends on the problem you are trying to solve. The things to consider are not only the storage size, but also how efficient access to and operations on this storage format are, from a computational and hardware perspective.
For sparse matrix-vector multiplication (SpMV), CSR is a good format because it allows linear access to the elements of a matrix row, which is good for memory and cache performance. However, CSR induces a more irregular access pattern on the multiplicand vector: its elements are fetched at positions determined by the column indices retrieved from each row, which is bad for cache performance. A CSC matrix-vector multiplication removes the irregular access on the multiplicand, at the cost of more irregular access into the result vector. Depending on your matrix structure you may choose one or the other. For example, a matrix with a few long rows sharing a similar nonzero distribution may be more efficient to handle in CSC format.
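To make the two access patterns concrete, here is a minimal sketch of both kernels in plain Python; the toy matrix, array names and loop structure are mine, purely for illustration:

```python
import numpy as np

# Toy 3x3 matrix:
# [10  0 20]
# [ 0 30  0]
# [40  0 50]

# CSR: values and column indices stored row by row
csr_val = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
csr_col = np.array([0, 2, 1, 0, 2])
csr_rowptr = np.array([0, 2, 3, 5])   # row i occupies [rowptr[i], rowptr[i+1])

# CSC: values and row indices stored column by column
csc_val = np.array([10.0, 40.0, 30.0, 20.0, 50.0])
csc_row = np.array([0, 2, 1, 0, 2])
csc_colptr = np.array([0, 2, 3, 5])   # column j occupies [colptr[j], colptr[j+1])

x = np.array([1.0, 2.0, 3.0])

def spmv_csr(val, col, rowptr, x):
    # Streams through the matrix row by row (good locality on the matrix data),
    # but gathers x[col[k]] from scattered positions (irregular reads of x).
    y = np.zeros(len(rowptr) - 1)
    for i in range(len(rowptr) - 1):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += val[k] * x[col[k]]
    return y

def spmv_csc(val, row, colptr, x, nrows):
    # Reads x[j] once per column (regular reads of x),
    # but scatters updates into y[row[k]] (irregular writes into the result).
    y = np.zeros(nrows)
    for j in range(len(colptr) - 1):
        for k in range(colptr[j], colptr[j + 1]):
            y[row[k]] += val[k] * x[j]
    return y

print(spmv_csr(csr_val, csr_col, csr_rowptr, x))     # [ 70.  60. 190.]
print(spmv_csc(csc_val, csc_row, csc_colptr, x, 3))  # [ 70.  60. 190.]
```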
Some examples from well-known software packages/tools:
- To the best of my knowledge, Matlab uses column-oriented storage by default.
- Scientific codes (and BLAS) based on Fortran also use column-oriented storage by default. This is mostly for historical reasons: Fortran arrays are column-major to begin with, and a large number of dense/sparse BLAS codes were originally written in Fortran.
- Eigen also uses column-oriented storage by default, but this can be customised.
- Intel MKL requires you to choose, IIRC.
- Boost uBLAS uses a row-based storage format by default.
- PETSc, which is a widely used tool in larger-scale scientific computing, uses a row-based format by default (its sequential AIJ type, SeqAIJ, is CSR), but also lets you choose from a wide variety of storage formats (see the MatCreate* functions in its documentation).
And the list could go on. As you can see there is some spread between the various tools, and I doubt the main criterion was the performance of the SpMV operation :) More likely, aspects such as the common storage formats in the target problem domains, the typical expectations of programmers in those domains, and integration with other parts of the library and with already existing codes were the prime reasons behind choosing CSR or CSC. These differ on a per-tool basis, obviously.
Anyhow, a short overview of sparse storage formats can be found here, but many more storage formats have been and are still being proposed in sparse matrix research:
- There are also block storage formats, which attempt to exploit locally dense substructures of the matrix. See for example "Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure" by Richard W. Vuduc and Hyun-Jin Moon.
- A very brief but useful overview of some common storage formats can be found in the Python scipy documentation on sparse formats: http://docs.scipy.org/doc/scipy/reference/sparse.html (a small usage sketch follows this list).
- Further information on the advantages of the various formats can be found in the following texts (and many others):
- Yousef Saad, Iterative Methods for Sparse Linear Systems.
- Yousef Saad, SPARSKIT: A Basic Tool Kit for Sparse Matrix Computation, Tech. Rep. CSRD TR 1029, CSRD, University of Illinois, Urbana, IL, 1990.
- LAPACK Working Note 50: Distributed Sparse Data Structures for Linear Algebra Operations, Tech. Rep. CS 92-169, Computer Science Department, University of Tennessee, Knoxville, TN, 1992.
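As a concrete illustration of some of the formats mentioned above, here is a small scipy sketch (the matrix is a made-up example; `tobsr` gives scipy's block variant of CSR):

```python
import numpy as np
from scipy import sparse

# Build a small matrix in COO (triplet) form, which is convenient for assembly...
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
vals = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
A_coo = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))

# ...then convert to the format best suited to the operation at hand.
A_csr = A_coo.tocsr()                   # row-oriented: fast row slicing, y = A @ x
A_csc = A_coo.tocsc()                   # column-oriented: fast column slicing
A_bsr = A_coo.tobsr(blocksize=(1, 1))   # block CSR, for block-structured matrices

x = np.array([1.0, 2.0, 3.0])
print(A_csr @ x)  # the result is the same regardless of storage format
print(A_csc @ x)
```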
I have been doing research in the sparse matrix area, creating custom hardware architectures for sparse matrix algorithms (such as SpMV). From experience, some sparse matrix benchmarks tend to ignore the overhead of converting between the various formats. This is because, in principle, you could adapt the storage format used throughout your entire algorithm. SpMV itself is hardly used in isolation; it is generally part of some larger iterative algorithm (e.g. a linear or nonlinear solver), and in that case the cost of converting between formats can be amortised across the many iterations and the total runtime of the entire algorithm. Of course, you would have to justify that this assumption holds in your situation.
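As a rough sketch of that amortisation argument, assume the matrix arrives in COO (triplet) form and the kernel wants CSR; the power iteration below is just a stand-in for whatever iterative solver wraps the SpMV:

```python
import numpy as np
from scipy import sparse

def power_iteration(A_coo, num_iters=1000):
    # The format conversion is paid once, up front...
    A_csr = A_coo.tocsr()
    x = np.ones(A_csr.shape[1])
    # ...and its cost is amortised over the many SpMVs in the iterative loop.
    for _ in range(num_iters):
        y = A_csr @ x
        x = y / np.linalg.norm(y)
    return x

# Made-up symmetric example, just to exercise the loop.
A = sparse.coo_matrix(np.array([[4.0, 1.0, 0.0],
                                [1.0, 3.0, 1.0],
                                [0.0, 1.0, 2.0]]))
print(power_iteration(A))  # approximates the dominant eigenvector
```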
As a disclaimer, in my area we are particularly inclined to make as many assumptions as possible, since the cost and time to implement a hardware architecture for a linear solver to benchmark a new SpMV storage format is usually substantial (on the order of months). When working in software, it is a lot easier to test, qualify and quantify your assumptions by running as many benchmarks as possible, which would probably take less than a week to set up :D