I recently developed a new method. It works perfectly with CUDA (at 20 to 40 FPS) and I have already tested it successfully. The problem comes when I try to compare it against an old method, which was implemented on the CPU. The old method first computes the LU decomposition A = LU, and then solves (LU)x = b with a forward and a backward substitution. The very nice thing about the old method is that A never changes, so the LU decomposition is done only once and the per-step cost is just the forward/backward solves. A is sparse and symmetric positive definite. (I believe this is fairly common practice in many problems, e.g., fluid simulation on a fixed domain.) A sketch of the pattern I mean is below.
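To make the factor-once/solve-many pattern concrete, here is a minimal CPU sketch. I am using Eigen's SparseLU purely for illustration; the actual old implementation may use a different sparse library:

```cpp
// Factor-once / solve-many pattern of the old CPU method.
// Eigen::SparseLU stands in for whatever sparse LU the original code calls.
#include <vector>
#include <Eigen/Sparse>
#include <Eigen/SparseLU>

void solve_many_steps(const Eigen::SparseMatrix<double>& A,
                      const std::vector<Eigen::VectorXd>& rhs,
                      std::vector<Eigen::VectorXd>& solutions)
{
    Eigen::SparseLU<Eigen::SparseMatrix<double>> lu;
    lu.analyzePattern(A);   // symbolic factorization: done once
    lu.factorize(A);        // numeric factorization: done once, since A is fixed

    solutions.clear();
    for (const auto& b : rhs)
        solutions.push_back(lu.solve(b));  // per step: forward + backward solve only
}
```

(Since A is symmetric positive definite, a sparse Cholesky such as Eigen::SimplicialLLT would do the same job with roughly half the work, but the old method as implemented uses LU, so that is what I want to reproduce on the GPU.)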
To make my comparison fair, I want to implement the old method on the GPU. But I could not find a sparse LU factorization routine in cuSOLVER or cuSPARSE. Am I supposed to compute the factorization with some other library? Should I use cusolverRfSolve() for the solve phase? If so, why are L and U not inputs to that function, while the permutations P and Q are? Is there a working example similar to what I am trying to do?
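For reference, here is the call sequence I currently believe is intended, based on my reading of the cusolverRF section of the cuSOLVER docs (this understanding may be exactly what is wrong, hence the question). The host-side LU factorization that produces L, U, P, Q (e.g., via KLU or SuperLU) and all error checking are elided:

```cpp
// Sketch of the cusolverRF workflow as I understand it from the docs.
// Inputs: host CSR arrays for A, L, U and permutation vectors P, Q from a
// host factorization; d_P, d_Q, d_Temp, d_XF are device arrays.
#include <cusolverRf.h>

void rf_factor_once_then_solve(
    int n, int nnzA, int* h_csrRowPtrA, int* h_csrColIndA, double* h_csrValA,
    int nnzL, int* h_csrRowPtrL, int* h_csrColIndL, double* h_csrValL,
    int nnzU, int* h_csrRowPtrU, int* h_csrColIndU, double* h_csrValU,
    int* h_P, int* h_Q,
    int* d_P, int* d_Q, double* d_Temp, double* d_XF)
{
    cusolverRfHandle_t rf;
    cusolverRfCreate(&rf);

    // Hand the host factorization over once; the handle keeps L and U.
    cusolverRfSetupHost(n, nnzA, h_csrRowPtrA, h_csrColIndA, h_csrValA,
                        nnzL, h_csrRowPtrL, h_csrColIndL, h_csrValL,
                        nnzU, h_csrRowPtrU, h_csrColIndU, h_csrValU,
                        h_P, h_Q, rf);

    cusolverRfAnalyze(rf);   // GPU-side symbolic analysis
    cusolverRfRefactor(rf);  // numeric (re)factorization on the GPU

    // Per time step: only the permutations and the right-hand side are
    // passed; L and U live inside the handle, which (I assume) is why
    // cusolverRfSolve() does not take them as arguments.
    // d_XF holds b on input and x on output; d_Temp is n*nrhs workspace.
    cusolverRfSolve(rf, d_P, d_Q, /*nrhs=*/1, d_Temp, n, d_XF, n);

    cusolverRfDestroy(rf);
}
```

If this is the right workflow, my remaining confusion is only about which library should produce L, U, P, Q on the host in the first place.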
Even if the old method turns out to run slower on the GPU than my new one, I would love to see the numbers, since that would show my new method is really useful.