I have the following code that overwrites matrix rows, and it takes considerable time for large matrices. Basically, each row whose index appears in values should contain the vector for the corresponding key, while all other rows should remain zeros.

Any suggestions on how to perform the same operation in a more efficient way?

import numpy as np

matr = np.zeros((178858, 400))

# Overwrite one row per key; all other rows stay zero
for key, index in values.items():
    vect = get_vector(key)
    matr[index] = vect

get_vector returns a vector of length 400, given a key.

values is a dictionary containing a key (string) and an index (integer) for that key.

What does values look like (don't post the whole thing, just a few pairs)? - FHTMitchell
@FHTMitchell dictionary like this: {'weekday': 1033, 'weekdays': 123156, 'weekend': 776, 'weekendat': 156361, 'weekender': 49772, 'weekends': 59230, 'weekes': 56379, 'weeki': 92312, 'weekley': 59795, 'weeklong': 18939, 'weekly': 1932, 'weeknd': 13483, 'weeknight': 23431, 'weekquick': 116531, 'weeks': 19966, 'weekslong': 65883, 'weeldreyer': 136120, 'weemee': 62687, 'weemees': 96805, 'weems': 47923, 'ween': 110761, 'weenie': 73326, 'weenies': 112514, 'weensy': 174000, 'weeny': 55138, 'weep': 9058, 'weepiness': 136959}. {key:index-for-matrix} - Alex Popa
Ah I see. What is get_vector then? - FHTMitchell
get_vector is a function that returns a vector of length 400, given one of the keys above. For example, get_vector('weekday') could return np.array([9, 2, 5, 7, 123, 5...]). How get_vector works is not relevant for the question. If you want to test it, you can put a random number generator there. - Alex Popa
That's ok, I just wanted its return type. My answer should work. Does it run any quicker for you? - FHTMitchell

1 Answer

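A possible approach is to use NumPy advanced (fancy) indexing, which may be faster. It relies on values.values() and values.keys() iterating in the same order, which Python guarantees as long as the dictionary is not modified in between.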

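# Row indices, in the dictionary's iteration order
indices = np.fromiter(values.values(), dtype=int, count=len(values))
# One vector per key, in the same order; shape (len(values), 400)
vectors = np.array([get_vector(key) for key in values.keys()])

# Assign all rows in one advanced-indexing operation
matr[indices] = vectors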
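
To check that the indexed assignment gives the same matrix as the original loop, here is a minimal sketch. The get_vector below is a hypothetical stand-in (a seeded random generator, as suggested in the comments), the three-entry values dictionary is a small sample taken from the question, and N_ROWS is reduced from 178858 so the check runs quickly. The function names fill_loop and fill_indexed are made up for this illustration.

```python
import zlib

import numpy as np

N_ROWS, DIM = 2000, 400  # assumption: far fewer rows than the 178858 in the question


def get_vector(key):
    # Hypothetical stand-in for the real get_vector: a pseudo-random
    # vector of length DIM, seeded from the key so repeated calls agree.
    rng = np.random.default_rng(zlib.crc32(key.encode()))
    return rng.random(DIM)


def fill_loop(values, n_rows, dim):
    # Row-by-row assignment, as in the question.
    matr = np.zeros((n_rows, dim))
    for key, index in values.items():
        matr[index] = get_vector(key)
    return matr


def fill_indexed(values, n_rows, dim):
    # Single advanced-indexing assignment, as in the answer.
    matr = np.zeros((n_rows, dim))
    if not values:  # an empty dict would give arrays that cannot be broadcast
        return matr
    indices = np.fromiter(values.values(), dtype=np.intp, count=len(values))
    vectors = np.array([get_vector(key) for key in values.keys()])
    matr[indices] = vectors
    return matr


# Small sample of the dictionary shown in the question.
values = {"weekday": 1033, "weekend": 776, "weekly": 1932}

loop_result = fill_loop(values, N_ROWS, DIM)
indexed_result = fill_indexed(values, N_ROWS, DIM)
print(np.array_equal(loop_result, indexed_result))
```

Both versions still call get_vector once per key in Python, so if that call dominates the runtime, the gain from the single assignment is likely to be small. Timing both on the real data is the only way to tell.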