I have a sparse.txt that looks like this:
# first column is label 0 or 1
# rest of the data is sparse data
# maximum value in the data is 4, so the future dense matrix will
# have 1+4 = 5 elements in a row
# file: sparse.txt
1 1:1 2:1 3:1
0 1:1 4:1
1 2:1 3:1 4:1
The required dense.txt is this:
# required file: dense.txt
1 1 1 1 0
0 1 0 0 1
1 0 1 1 1
Without using scipy coo_matrix it did it in a simple way like this:
def create_dense(fsparse, fdense,fvocab):
# number of lines in vocab
lvocab = sum(1 for line in open(fvocab))
# create dense file
with open(fsparse) as fi, open(fdense,'w') as fo:
for i, line in enumerate(fi):
words = line.strip('\n').split(':')
words = " ".join(words).split()
label = int(words[0])
indices = [int(w) for (i,w) in enumerate(words) if int(i)%2]
row = [0]* (lvocab+1)
row[0] = label
# use listcomps
row = [ 1 if i in indices else row[i] for i in range(len(row))]
l = " ".join(map(str,row)) + "\n"
fo.write(l)
print('Writing dense matrix line: ', i+1)
Question How can we directly get label and data from sparse data without first creating dense matrix and using NUMPY /Scipy preferably??
Question: How can we read the sparse data using numpy.fromregex ?
My attempt is:
def read_file(fsparse):
regex = r'([0-1]\s)([0-9]):(1\s)*([0-9]:1)' + r'\s*\n'
data = np.fromregex(fsparse,regex,dtype=str)
print(data,file=open('dense.txt','w'))
It did not work!
Related links:
row
in a list? That would be a list of lists (of numbers), right? Can you make the array directly from that? – hpaulj