3
votes

How can I convert an edge list (data) to a Python scipy sparse matrix to get this result?

[image: sparse matrix in R]

Dataset (where 'agn' is node category one and 'fct' is node category two):

data['agn'].tolist()
['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2', 'p3', 'p3', 'p3', 'p4', 'p4', 'p5']

data['fct'].tolist()
['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6', 'f5', 'f6', 'f7', 'f7', 'f8', 'f9']

(not working) python code:

from scipy.sparse import csr_matrix, coo_matrix

csr_matrix((data_sub['agn'].values, data['fct'].values), 
                    shape=(len(set(data['agn'].values)), len(set(data_sub['fct'].values))))

This raises "TypeError: invalid input format". Do I really need three arrays to construct the matrix, as the examples in the scipy csr_matrix documentation suggest? (I can only use two links, sorry!)

(working) R code used to construct the matrix with only two vectors:

library(Matrix)

grph_tim <- sparseMatrix(i = as.numeric(data$agn), 
                     j = as.numeric(data$fct),  
                     dims = c(length(levels(data$agn)),
                              length(levels(data$fct))),
                     dimnames = list(levels(data$agn),
                                     levels(data$fct)))

1
It's a bit unclear what you want: what are the coefficients and what are the indices? From the picture of the data (you shouldn't post a picture, by the way; just copy and paste the data as text so that we can use it) we see indices such as p1, f1, etc. These are not integers (as far as I can tell), so they can't be used directly as indices. - jadsq
Even if I use numeric values instead, I still get the same error! data_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]; data_b = [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8]; mtx = csr_matrix((data_a, data_b), shape=(len(set(data_a)), len(set(data_b)))) - gaussit
Yes, scipy sparse needs the data as well as the rows and cols arrays. It does not assume that the data values are all 1. The original sparse matrix code was written for linear algebra problems, where the data are floats. - hpaulj
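
The point made in the last comment can be sketched as follows. This is a minimal example, assuming the numeric data_a and data_b lists from the comment above and an unweighted edge list, so the value array is all ones (the name "values" is illustrative). It uses the (data, (row_ind, col_ind)) form of the csr_matrix constructor.

```python
import numpy as np
from scipy.sparse import csr_matrix

data_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
data_b = [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8]

# One stored value per edge; all ones for an unweighted edge list.
values = np.ones(len(data_a), dtype=np.uint32)

# Constructor form: csr_matrix((data, (row_ind, col_ind)), shape=(M, N))
mtx = csr_matrix((values, (data_a, data_b)),
                 shape=(len(set(data_a)), len(set(data_b))))

print(mtx.shape)  # (5, 9)
print(mtx.nnz)    # 15
```

If the edges carried weights, those weights would go in place of the ones.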

1 Answer

0
votes

It finally worked after I modified the code from here and added the needed array:

import numpy as np
import pandas as pd
import scipy.sparse as ss


def read_data_file_as_coo_matrix(filename='edges.txt'):
    """Read data file and return sparse matrix in coordinate format."""

    # Tab-separated edge list with an 'agn' and a 'fct' column.
    # If the nodes are integers, you can pass 'dtype=np.uint32' here.
    data = pd.read_csv(filename, sep='\t', encoding='utf-8')

    # 'rows' is node category one and 'cols' is node category two.
    # Both columns must hold integer indices, not string labels.
    rows = data['agn']  # Not a copy, just a reference.
    cols = data['fct']

    # Crucial third array in Python, which can be left out in R:
    # one value per (row, col) pair.
    ones = np.ones(len(rows), dtype=np.uint32)
    matrix = ss.coo_matrix((ones, (rows, cols)))
    return matrix

Additionally, I converted the string names of the nodes to integers. Thus data['agn'] becomes [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4] and data['fct'] becomes [0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8].
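
One possible way to do that conversion, not necessarily the one used here, is pandas.factorize. The sketch below rebuilds the DataFrame from the lists in the question; the names agn_codes, fct_codes, agn_labels and fct_labels are illustrative.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

data = pd.DataFrame({
    'agn': ['p1', 'p1', 'p1', 'p1', 'p1', 'p2', 'p2', 'p2', 'p2',
            'p3', 'p3', 'p3', 'p4', 'p4', 'p5'],
    'fct': ['f1', 'f2', 'f3', 'f4', 'f5', 'f3', 'f4', 'f5', 'f6',
            'f5', 'f6', 'f7', 'f7', 'f8', 'f9'],
})

# factorize assigns codes 0, 1, 2, ... in order of first appearance
# and also returns the unique labels, which can serve as row/column names.
agn_codes, agn_labels = pd.factorize(data['agn'])
fct_codes, fct_labels = pd.factorize(data['fct'])

# One value per edge, then the (data, (row, col)) constructor form.
ones = np.ones(len(data), dtype=np.uint32)
matrix = ss.coo_matrix((ones, (agn_codes, fct_codes)),
                       shape=(len(agn_labels), len(fct_labels)))
```

Note that R's factor levels are sorted, while factorize uses order of first appearance by default. For this dataset both orders coincide; passing sort=True to factorize should give sorted labels if the input is not already ordered.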

I get this sparse matrix:

(0, 0) 1
(0, 1) 1
(0, 2) 1
(0, 3) 1
(0, 4) 1
(1, 2) 1
(1, 3) 1
(1, 4) 1
(1, 5) 1
(2, 4) 1
(2, 5) 1
(2, 6) 1
(3, 6) 1
(3, 7) 1
(4, 8) 1
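
To compare this with the labelled matrix that R prints, the result can be turned into a small dense table. This is only a sketch: the index arrays are the integer codes quoted above, and row_names and col_names are assumed to map back to the original p1..p5 and f1..f9 labels. Converting to dense is only sensible for small matrices like this one.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

rows = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
cols = np.array([0, 1, 2, 3, 4, 2, 3, 4, 5, 4, 5, 6, 6, 7, 8])
ones = np.ones(len(rows), dtype=np.uint32)
matrix = ss.coo_matrix((ones, (rows, cols)), shape=(5, 9))

# Assumed mapping from integer codes back to the original node names.
row_names = ['p1', 'p2', 'p3', 'p4', 'p5']
col_names = ['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9']

# toarray() materialises every cell, including the zeros.
dense = pd.DataFrame(matrix.toarray(), index=row_names, columns=col_names)
print(dense)
```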