I wrote a Julia code which computes integrals over Gaussian functions and I have a sort-of kernel function which is called over and over again.
According to the Julia built-in Profile Module, this is where I spend most of the time during the actual computation and therefore I would like to see if there is any way in which I can improve it.
It is a recursive function and I implemented it in a kind of straightforward way. As I am not that much used to recursive functions, maybe somebody out there has some ideas/suggestions on how to improve it (both from a purely theoretical algorithmic point of view and/or exploiting special optimizations from the JIT compiler).
Here you have it:
"""Returns the integral of an Hermite Gaussian divided by the Coulomb operator."""
function Rtuv{T<:Real}(t::Int, u::Int, v::Int, n::Int, p::Real, RPC::Vector{T})
if t == u == v == 0
return (-2.0*p)^n * boys(n,p*norm(RPC)^2)
elseif u == v == 0
if t > 1
return (t-1)*Rtuv(t-2, u, v, n+1, p, RPC) +
RPC[1]*Rtuv(t-1, u, v, n+1, p, RPC)
else
return RPC[1]*Rtuv(t-1, u, v, n+1, p, RPC)
end
elseif v == 0
if u > 1
return (u-1)*Rtuv(t, u-2, v, n+1, p, RPC) +
RPC[2]*Rtuv(t, u-1, v, n+1, p, RPC)
else
return RPC[2]*Rtuv(t, u-1, v, n+1, p ,RPC)
end
else
if v > 1
return (v-1)*Rtuv(t, u, v-2, n+1, p, RPC)
RPC[3]*Rtuv(t, u, v-1, n+1, p, RPC)
else
return RPC[3]*Rtuv(t, u, v-1, n+1, p, RPC)
end
end
end
Don't pay that much attention to the function boys, since according to the profiler it is not that heavy.
Just to give an idea of the range of numbers: usually the first call comes from t+u+v ranging from 0 to 3, while n always starts at 0.
Cheers
EDIT -- New information
The generated version is slower for small values of
I was benchmarking badly for this case, without interpolating the argument passed. By doing it properly I am always faster with the approach explained in the accepted answer, so hurray!t,u,v, I believe the reason is because expressions are not optimzied by the compiler.
More generally, does the compiler identify trivial cases such as multiplication by zeros and ones and optimize those away?
Answer to myself: from a quick checking of simple code with @code_llvm it seems not to be the case.
Rtuv, the input variablespandRPCwill be different and every recursive call inside the function body will also be different sincet,u,v,nchange. On the other hand, if I have several initiating calls with the samepandRPC, then I think memoization might be useful, though I don't if it's worth the effort. - stefabatv > 1-- is there actually a nested recursion? And aret, u, vstatically known? In that case, macro expansion could help. - phipsgablerv > 1, I now edited the code. Yes there is a nested recursion, no,t, u, vare not statically known (if I understood what you meant..). In general, this function get called inside a triple-nested loop overt, u, v, where their value (oft, u, v) is always assumed non-negative (I forgot to mention that before) and generally the each loop goes from0to3or4. Regarding my previous comment in reply to @niczky12, actually memoization is quite easy to implement, I'm gonna give it a try and see if it improves. - stefabat