So, is parsing the file line by line an option, like:
from scipy.sparse import coo_matrix

rows, cols, values = [], [], []
with open('sparse.txt') as f:
    for i, line in enumerate(f):
        # Each line holds whitespace-separated "col:value" cells
        for cell in line.split():
            col, value = cell.split(':')
            rows.append(i)
            cols.append(int(col))
            values.append(int(value))
# The shape is inferred from the largest row and column index
matrix = coo_matrix((values, (rows, cols)))
print(matrix.todense())
Or do you need a faster one-step implementation? Not sure if this is possible.
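To make the assumed input format concrete, here is a small sketch of the line-by-line approach wrapped in a function and run on made-up data. The helper name parse_sparse_lines and the sample content are hypothetical. An in-memory io.StringIO stands in for sparse.txt.

```python
import io
from scipy.sparse import coo_matrix

def parse_sparse_lines(lines):
    """Build a COO matrix from an iterable of 'col:value col:value ...' lines."""
    rows, cols, values = [], [], []
    for i, line in enumerate(lines):
        for cell in line.split():
            col, value = cell.split(':')
            rows.append(i)
            cols.append(int(col))
            values.append(int(value))
    return coo_matrix((values, (rows, cols)))

# Hypothetical sample content standing in for sparse.txt
sample = io.StringIO("0:1 2:3\n1:5\n3:7\n")
matrix = parse_sparse_lines(sample)
print(matrix.toarray())
```

With this sample the result is a 3 x 4 matrix, because the largest column index is 3 and there are three lines.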
Edit #1:
You can avoid the inner loop by splitting each line in one step with a regular expression, which leads to the following alternative implementation:
import re
from scipy.sparse import coo_matrix

rows, cols, values = [], [], []
with open('sparse.txt') as f:
    for i, line in enumerate(f):
        # Split on ':' and ' ' at once: col, value, col, value, ...
        numbers = [int(n) for n in re.split(':| ', line.strip()) if n]
        # extend (not append) keeps the lists flat even when rows differ in length
        rows.extend([i] * (len(numbers) // 2))
        cols.extend(numbers[::2])
        values.extend(numbers[1::2])
matrix = coo_matrix((values, (rows, cols)))
print(matrix.todense())
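To see what the regular-expression split produces, here is a minimal sketch on a single made-up line. The sample string is an assumption about the file format (space-separated col:value pairs).

```python
import re

# Hypothetical sample line: space-separated col:value pairs
line = "0:1 3:4 7:2"

# Splitting on ':' or ' ' yields one flat list of alternating columns and values
tokens = re.split(':| ', line)
numbers = [int(t) for t in tokens]

cols = numbers[::2]     # every even position is a column index
values = numbers[1::2]  # every odd position is the matching value
print(tokens)
print(cols, values)
```

Slicing with a step of 2 is what replaces the inner loop over cells.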
Edit #2:
I found an even shorter solution without an explicit loop:
from scipy.sparse import coo_matrix, vstack

def parseLine(line, ncols):
    # One 1 x ncols sparse row from "col value col value ..."
    nums = list(map(int, line.split()))
    return coo_matrix((nums[1::2], ([0] * len(nums[0::2]), nums[0::2])), shape=(1, ncols))

with open('sparse.txt') as f:
    lines = f.read().replace(':', ' ').splitlines()
# Matrix width is the largest column index plus one
ncols = max(map(int, ' '.join(lines).split()[::2])) + 1
M = vstack(list(map(lambda line: parseLine(line, ncols), lines)))
print(M.todense())
The loop is hidden within the map calls that act on lines. I think there is no solution without loops at all, since most built-in functions use them internally and many string-parsing functions, such as re.finditer, only yield iterators.
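As a rough illustration of that point, the sketch below moves most of the per-element work into NumPy, yet a Python-level loop (the comprehension that counts entries per line) still remains. The helper name parse_sparse_text and the sample text are hypothetical, and the sketch assumes non-empty input in which every cell is a well-formed col:value pair.

```python
import numpy as np
from scipy.sparse import coo_matrix

def parse_sparse_text(text):
    """Parse 'col:value col:value ...' lines into a COO matrix."""
    lines = text.splitlines()
    # Number of col:value pairs on each line (this is still a loop)
    counts = np.array([line.count(':') for line in lines], dtype=np.int64)
    # All numbers of the file in one flat array: col, value, col, value, ...
    flat = np.array(text.replace(':', ' ').split(), dtype=np.int64)
    cols, values = flat[0::2], flat[1::2]
    # Repeat each row index once per entry on that line
    rows = np.repeat(np.arange(len(lines)), counts)
    return coo_matrix((values, (rows, cols)))

# Hypothetical sample content
M = parse_sparse_text("0:1 2:3\n1:5\n")
print(M.toarray())
```

Whether this is actually faster than the earlier versions would need to be measured on the real file.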