I have a Table in MySQL with three columns: row-index, column-index and value, which I want to read in into a scipy csr_matrix. I use the Python-MySQL connector. There are 112,500 non-zero elements.
Try 1:
A = csr_matrix((N_rows, N_cols), dtype=float)
show = 'SELECT * FROM my_table'
cursor.execute(show)
for (row, col, value) in cursor:
A[row, col] = value
This is too slow, I had to stop it after 60 seconds. It mentioned an efficiency warning, and suggested to use lil matrices.
Try 2:
A = lil_matrix((N_rows, N_cols), dtype=float)
show = 'SELECT * FROM my_table'
cursor.execute(show)
for (row, col, value) in cursor:
A[row, col] = value
A = csr_matrix(A)
This takes 6.4 seconds (average of three). Is this as good as it gets, or is there a quicker way in which I can create the csr_matrix without going through a loop? If I perform cursor.fetchall(), the data looks like:
[(row_0, col_0, value_0), (row_1, col_1, value_1), ...]
This cannot be used for the csr_matrix constructor.