1
votes

I am dealing with a matrix in MATLAB that is sparse and has many rows and columns. The rows and columns of the matrix correspond to the ids of particular items. Let's call them id1 and id2.

It would be nice if the row and column ids could be embedded in the matrix, so that I can access them easily without creating extra variables to hold the two sets of ids.

The answer would probably be to use the table data type. Tables are an ideal fit for my need, but I was wondering whether I could create a table for a sparse matrix.

A  [m*n] sparse matrix    %% m & n are huge 
id1 [1*m] , id2 [1*n]     %% two vectors containing numeric ids for rows and columns

Could we obtain?

T  [m*n] sparse table matrix

Thanks for sharing your view with me.

1
Check the documentation for the MATLAB sparse function: mathworks.com/help/matlab/ref/sparse.html?refresh=true - DMR
There is no sparse table class in Matlab. The first reason is that table() can have variables with more than one column; how would you define sparsity in that case (a rhetorical question)? What table-specific functionality do you want to retain, as opposed to a sparse() matrix? - Oleg
Thanks, I did that before posting. As an example, table(sparse(rand(10,10))) makes the table non-sparse, which is not what I am looking for. - Yas
Oleg: The requirements, as mentioned in my question, are: 1. A is a huge matrix with many zero entries, so it is preferably kept in sparse form. 2. The rows and columns of A have unique ids, and it would be nice if they carried identifiers the way the table data type does in Matlab. - Yas
It seems to me, correct me if I am wrong, that what you want is a nice display in the Variables editor. Consider, however, that tables store variables in a very different way, and the added value of an Excel-like display with row and column labels quickly loses its benefits, especially with your dimensions. Moreover, tables do not allow you to reshape their dimensions (try a transpose) or to query data with linear indices. - Oleg

1 Answer

2
votes

I will address the question and the comments in order to clear some confusion.

The short answer

There is no sparse table class in Matlab. Cannot do. Use sparse() matrices.
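As a minimal sketch of what "use sparse() matrices" can look like while still keeping the ids close to the data: store A together with id1 and id2 in one struct and translate ids to positions with ismember. The id values, the triplets and the struct name S below are made up for illustration; only sparse, ismember and struct are actual MATLAB functions.

```matlab
% Hypothetical ids: 4 rows and 5 columns (in practice m and n are huge)
id1 = [101 205 317 402];          % row ids, 1-by-m
id2 = [11 22 33 44 55];           % column ids, 1-by-n

% Hypothetical data as (row id, column id, value) triplets
rowIds = [205 402 101];
colIds = [33 11 55];
vals   = [1.5 -2 7];

% Translate ids to row/column positions, then build the sparse matrix
[~, r] = ismember(rowIds, id1);
[~, c] = ismember(colIds, id2);
A = sparse(r, c, vals, numel(id1), numel(id2));

% Keep matrix and ids together in one variable
S = struct('A', A, 'id1', id1, 'id2', id2);

% Query a single element by id (logical indexing on the id vectors)
x = S.A(S.id1 == 205, S.id2 == 33);

% Query a block by several ids
[~, rr] = ismember([101 402], S.id1);
[~, cc] = ismember([11 55], S.id2);
B = S.A(rr, cc);                  % still sparse
```

The matrix stays sparse throughout, and the ids travel with it in a single variable, which covers the lookup part of what a table would offer.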

The long answer

There are reasons why sparse tables make little sense:

  1. Philosophically speaking, the advantage of having nice row and column labels is completely lost if you are working with a big panel of data and/or if the data is sparse.

    Scrolling through 246829 rows and 33336 columns? That can only be useful at very isolated times, when you are debugging your code and a specific outlier is causing your results to go off. Also, all you might see is a sea of zeros.

  2. Technically, a table can have more than one column for the same variable, i.e. table(rand(10,2), rand(10,1)) is a valid table. How would you define sparsity on such a table?

    Fine, suppose you are working with a matrix-like table, i.e. one element per table cell and the same numeric class. Still, none of the algebraic operators are defined on a table(). So you need to extract the content first in order to perform any operation that spans more than a single column of data. Just to be clear, once the data is extracted, you have e.g. your double (full) matrix or, in the ideal case, a double sparse matrix.
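To make point 2 concrete, here is a small sketch using the table from the text. The variable names Var1 and Var2 are the defaults MATLAB assigns when the inputs are expressions; M, Ms and y are names I picked for illustration.

```matlab
% A valid table whose first variable has two columns
T = table(rand(10,2), rand(10,1));

size(T)         % the table has 10 rows and 2 variables
size(T.Var1)    % but Var1 itself is 10-by-2

% To do linear algebra, extract the content into a plain matrix first
M  = T{:,:};            % horizontal concatenation of all variables
Ms = sparse(M);         % or a sparse matrix, if the data warrants it
y  = Ms * ones(size(Ms,2), 1);   % e.g. row sums via a matrix product
```

Once the content is extracted, the labels are gone anyway, so the table only served as a container.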

Now, a few misconceptions to clear:

  • Fewer variables imply clearer/cleaner code. Not true. You are probably thinking about the extreme case (in bad practices) of "how do I make a series of variables a1, a2, a3, etc.".

    There is a sweet spot between verbosity and number of variables, amount of comments, and code clarity/maintainability. Only with time and experience do you find the right balance.

  • Control over data cannot go without visual inspection. This approach does NOT scale with big data, and the sooner you abandon it, the sooner your code will become more reliable. You need to verify your results systematically rather than rely on visual inspection. The chance of failing to (visually) spot a problem in the data grows exponentially with its dimension, much faster than it does with systematic tests.
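A sketch of what systematic checks might look like for a sparse matrix with separate id vectors. The data, the threshold of 5 and the variable names are assumptions for illustration; adapt the checks to whatever invariants your data should satisfy.

```matlab
% Hypothetical data: sparse A with its row and column ids
A   = sparse([1 2 4], [5 3 1], [7 1.5 -2], 4, 5);
id1 = [101 205 317 402];
id2 = [11 22 33 44 55];

% Structural checks: ids must match the matrix and be unique
assert(isequal(size(A), [numel(id1), numel(id2)]), 'ids do not match size of A');
assert(numel(unique(id1)) == numel(id1), 'duplicate row ids');
assert(numel(unique(id2)) == numel(id2), 'duplicate column ids');

% Value checks on the stored entries only (no need to look at the zeros)
assert(all(isfinite(nonzeros(A))), 'NaN or Inf among stored entries');

% Summary statistics instead of scrolling
density = nnz(A) / numel(A);

% Locate suspicious entries and report them by id (threshold is arbitrary)
[r, c] = find(abs(A) > 5);
outlierIds = [reshape(id1(r), [], 1), reshape(id2(c), [], 1)];
```

Checks like these run in time proportional to the number of stored entries, so they keep working when the matrix is far too large to inspect by eye.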

Some background info on my work:

I work with high-frequency prices, that's terabytes of data. I also extended the table() class with additional methods and fixes to help me with my work (see https://github.com/okomarov/tableutils), but I do not see how sparsity is a useful feature to add to table().