
I'm trying to make a 2^n x 2^n numpy array of all possible dot product permutations of a very large set of vectors. My test array, "data", is a (129L, 222L) numpy array. My function seems (in my novice opinion) to be pretty straightforward. It's just the fact that I have too much data to process. How do programmers typically get around this issue? Any suggestions?

My data:

>>> data
array([[  1.36339199e-07,   6.71355407e-09,   2.13336419e-07, ...,
          8.44471296e-10,   6.02566662e-10,   3.38577178e-10],
       [  7.19224620e-08,   5.64739121e-08,   1.49689547e-07, ...,
          3.85361972e-10,   3.17756751e-10,   1.68563023e-10],
       [  1.93443482e-10,   1.11626853e-08,   2.66691759e-09, ...,
          2.20938084e-11,   2.56114420e-11,   1.31865060e-11],
       ..., 
       [  7.12584509e-13,   7.70844451e-13,   1.09718565e-12, ...,
          2.08390730e-13,   3.05264153e-13,   1.62286818e-13],
       [  2.57153616e-13,   6.08747557e-13,   2.00768488e-12, ...,
          6.29901984e-13,   1.19631816e-14,   1.05109078e-13],
       [  1.74618064e-13,   5.03695393e-13,   1.29632351e-14, ...,
          7.60145676e-13,   3.19648911e-14,   8.72102078e-15]])

My function:

import numpy as np
from itertools import product, count

def myFunction(data):
    S = np.array([])
    num = 2**len(data)
    y = product(data, repeat = 2)
    for x in count():
        while x <= num:
            z = y.next()
            i, j = z
            s = np.dot(i, j)
            S = np.insert(S, x, s)
            break #for the 'StopIteration' issue
        return np.reshape(S, (num,num))

My error:

>>> theMatrix = myFunction(data)

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2721, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "", line 1, in <module>
    matrix = myFunction(data)
  File "E:\Folder1\Folder2\src\myFunction.py", line 16, in myFunction
    return np.reshape(S, (num,num))
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 171, in reshape
    return reshape(newshape, order=order)
ValueError: Maximum allowed dimension exceeded


2 Answers


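The Cartesian product is O(n^2), not O(2^n) (lucky for you). That is probably also the cause of your "StopIteration" issue: product(data, repeat=2) yields only len(data)**2 pairs, so a loop that asks for 2**len(data) of them runs out. Two more fixes: the return belongs after the loop (in your version it runs right after the first pair), and the flat result should be reshaped to (n, n), not (num, num).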
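To make the size difference concrete, here is a quick count for the question's 129-row array. It only counts index pairs (range(n) stands in for the rows), so no real data is needed:

```python
from itertools import product

n = 129  # rows in the question's (129, 222) test array

# Count the ordered pairs that product(..., repeat=2) actually yields.
pairs = sum(1 for _ in product(range(n), repeat=2))
wrong_bound = 2 ** n  # the loop bound used in the question's code

print(pairs)                  # 16641, i.e. 129 ** 2
print(len(str(wrong_bound)))  # 39 (2 ** 129 is a 39-digit number)
```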

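import numpy as np
from itertools import product

def myFunction(data):
    n = len(data)
    num = n ** 2  # This is not the same as 2 ** len(data) !!
    S = np.empty(num)
    # product(data, repeat=2) yields each ordered pair of rows (data[a], data[b])
    for x, (i, j) in enumerate(product(data, repeat=2)):
        S[x] = np.dot(i, j)
    # num == n * n values fill an n x n matrix; return only after the loop ends
    return np.reshape(S, (n, n))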
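Since every entry is just the dot product of two rows, the whole loop can be replaced by a single matrix product, data.dot(data.T) (or data @ data.T on Python 3.5+), which NumPy typically hands off to optimized BLAS code. This is a swapped-in vectorized technique, not part of the answer's code; the sketch below uses random numbers as a stand-in for the question's array (an assumption, since the real data isn't available) and checks that both approaches agree:

```python
import numpy as np
from itertools import product

# Random stand-in for the question's (129, 222) array; the real data isn't available.
rng = np.random.default_rng(0)
data = rng.random((129, 222))
n = len(data)

# Pairwise dot products via itertools.product, one pair at a time.
loop_result = np.array([np.dot(i, j) for i, j in product(data, repeat=2)]).reshape(n, n)

# The same matrix as one matrix product: entry [a, b] is data[a] . data[b].
gram = data.dot(data.T)

print(gram.shape)                      # (129, 129)
print(np.allclose(loop_result, gram))  # True
```

The vectorized version avoids 16,641 Python-level calls to np.dot, and the result is symmetric because data[a] . data[b] equals data[b] . data[a].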

Why are you passing num,num to reshape, but not the actual thing you're reshaping?

Perhaps you want something like return np.reshape(S, (num, num)) instead?
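For reference, np.reshape takes the array to reshape as its first argument and the new shape second, and the new shape must hold exactly as many elements as the array. A minimal sketch with a small stand-in array (not the question's data):

```python
import numpy as np

# Flat array of six values, standing in for the flat array S of dot products.
S = np.arange(6.0)

# np.reshape takes the array first, then the new shape.
M = np.reshape(S, (2, 3))

# The array method does the same thing.
M2 = S.reshape((2, 3))

# The new shape must hold exactly as many elements as the original array.
try:
    np.reshape(S, (4, 4))
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(M)
print(reshape_failed)  # True: 6 values cannot fill a 4 x 4 array
```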


As for the actual error, 2^129 is a pretty darn large number: even an unsigned 64-bit integer can only hold values below 2^64, so a dimension of length 2^129 cannot even be indexed. A 2^129 x 2^129 matrix would also have 2^258 entries, far more than any machine's memory can hold.
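The arithmetic is easy to check. This sketch assumes float64 entries (8 bytes each, NumPy's default float type) and compares the 2^n x 2^n matrix with an n x n matrix of pairwise dot products:

```python
import numpy as np

n = 129  # rows in the question's test array
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per float64 entry

# Largest value an unsigned 64-bit integer can hold: 2 ** 64 - 1.
uint64_max = int(np.iinfo(np.uint64).max)
side_too_big = 2 ** n > uint64_max  # True: 2 ** 129 does not fit

# Memory for the 2^n x 2^n matrix versus an n x n matrix of pairwise dot products.
huge_bytes = (2 ** n) ** 2 * itemsize  # 2 ** 258 entries * 8 bytes = 2 ** 261 bytes
gram_bytes = n * n * itemsize          # 16641 entries * 8 bytes

print(side_too_big)            # True
print(huge_bytes == 2 ** 261)  # True
print(gram_bytes)              # 133128 bytes, about 130 KiB
```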

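Are you sure you really want to be processing quite that much? Even with a 1 GHz processor that could handle one item per CPU cycle (which you almost certainly can't, since each entry is a 222-element dot product), just the 2^129 loop iterations would take about 2^99 seconds, and filling all 2^258 entries of the matrix would take about 2^228 seconds.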
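A back-of-the-envelope check of those running times, under the same optimistic assumption of one item per cycle at 1 GHz (ops_per_second below is that assumption, not a measured figure):

```python
import math

n = 129
ops_per_second = 1e9  # assumption: a 1 GHz CPU handling one item per cycle

loop_iterations = 2 ** n        # the loop bound in the question's code
matrix_entries = (2 ** n) ** 2  # entries in a 2^n x 2^n matrix

iter_seconds = loop_iterations / ops_per_second
matrix_seconds = matrix_entries / ops_per_second

print(round(math.log2(iter_seconds)))    # 99
print(round(math.log2(matrix_seconds)))  # 228
```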