I am wondering how much GPU computing would help me speed up my simulations.
The critical part of my code is matrix multiplication. Essentially, it looks like the following Python code, with matrices of order 1000 and long for loops.
import numpy as np

m_size = 1000
sim_length = 50

a = np.random.rand(m_size, m_size)
b = np.random.rand(m_size, m_size)
for j in range(sim_length):
    result = np.dot(a, b)
Note: My matrices are dense and mostly random, and the for loops are compiled with Cython.
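For reference, here is a sketch of how I measure what my current CPU setup achieves (the sizes and repetition count are just illustrative; the 2·n³ flop count is the standard one for a dense n×n matrix product):

```python
import time
import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

np.dot(a, b)  # warm-up run so any BLAS thread pools are already initialized

reps = 5
t0 = time.perf_counter()
for _ in range(reps):
    result = np.dot(a, b)
elapsed = (time.perf_counter() - t0) / reps

flops = 2 * n**3  # multiply-adds in one dense n x n matrix product
print(f"{elapsed * 1e3:.1f} ms per multiply, {flops / elapsed / 1e9:.1f} GFLOP/s")
```

As I understand it, np.dot dispatches to whatever BLAS library NumPy was built against, so this number already reflects that library's optimizations.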
My naive guess is that two factors matter:
- More parallel threads (currently of order 1 thread; GPUs of order 100 threads?) --> speedup of order 100? [My source is quite outdated, from 2011.]
- Lower clock frequency (currently ~3 GHz; GPUs typically ~2 GHz) --> small enough to neglect.
I expect that this viewpoint is too naive, so what am I missing?
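In case it is useful, this is the kind of direct comparison I was planning to run. It is only a sketch and assumes the CuPy library (my choice, not required; it mirrors NumPy's API on NVIDIA GPUs). It falls back to NumPy when CuPy is not installed, so the script runs either way:

```python
import time

# Assumption: CuPy provides a NumPy-compatible array API backed by the GPU.
# If it is not installed, fall back to NumPy so the comparison script still runs.
try:
    import cupy as xp
    on_gpu = True
except ImportError:
    import numpy as xp
    on_gpu = False

n = 1000
a = xp.random.rand(n, n)
b = xp.random.rand(n, n)

xp.dot(a, b)  # warm-up (first GPU call also pays one-time setup costs)
if on_gpu:
    xp.cuda.Stream.null.synchronize()  # GPU kernel launches are asynchronous

t0 = time.perf_counter()
c = xp.dot(a, b)
if on_gpu:
    xp.cuda.Stream.null.synchronize()  # wait for the kernel to actually finish
elapsed = time.perf_counter() - t0

print(f"{'GPU' if on_gpu else 'CPU'}: {elapsed * 1e3:.2f} ms for one {n}x{n} matmul")
```

The explicit synchronization is there because timing without it would only measure the kernel launch, not the computation; I also deliberately generate the data on the device, since host-to-device transfers would otherwise dominate a benchmark at this size.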