4
votes

I’m working with SymPy on a symbolic Jacobian matrix J of size Q×Q. Each coefficient of this matrix contains Q symbols, f[0] through f[Q-1]. What I’d like to do is substitute every symbol in every coefficient of J with known numerical values g[0] through g[Q-1] (which are no longer symbolic). The fastest way I have found to do it is as follows:

for k in range(Q):
    J = J.subs(f[k], g[k])

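For reference, `subs` also accepts an iterable of (old, new) pairs, which collapses the loop into a single simultaneous substitution (a minimal sketch with made-up symbols and values; whether it is faster will depend on the expression):

```python
import sympy

f = sympy.symbols("f0:3")   # (f0, f1, f2)
g = [1.0, 2.0, 3.0]

expr = f[0]**2 + f[1] * f[2]

# One call with (old, new) pairs substitutes all symbols at once
# instead of rebuilding the expression tree Q times.
result = expr.subs(zip(f, g))  # -> 7.0
```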
However, I find this "basic" operation very slow! For example, with this MCVE:

import sympy
import numpy as np
import time

Q = 17
f = list(sympy.symbols(f"f0:{Q}"))  # [f0, f1, ..., f16]
e = np.array([0., 1., 0., -1., 0., 1., -1., -1., 1.,
              2., -2., -2., 2., 3., 0., -3., 0.])
u = np.sum(f * e) / np.sum(f)
function = e * np.sum(f) * (1. + u * e + (u * e)**2 - u * u)
F = sympy.Matrix(function)
g = e * (1. + 0.2 * e + (0.2 * e)**2)

start_time = time.time()
J = F.jacobian(f)
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()
for k in range(Q):
    J = J.subs(f[k], g[k])
print("--- %s seconds ---" % (time.time() - start_time))

the substitution takes about 5 s on my computer, while the computation of the Jacobian matrix takes only 0.6 s. In another (longer) code, the substitution takes 360 s with Q = 37 (versus 20 s for the Jacobian computation)!

Moreover, when I look at my running processes, I can see that the Python process sometimes stalls during the matrix substitution.

  1. Does anyone know where this might come from?
  2. Is there a way to make this operation faster?
Welcome to SO, don't forget to take the tour :) Regarding the question, I suspect that the Jacobian might not be simplified, leading to a lot of repeated variables/computations. - tupui
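Tupui's point can be checked directly: `sympy.cse` factors repeated subexpressions out of an expression, which is one way to cut down the work `subs` (or any later evaluation) has to do. A minimal sketch on a toy expression, not from the original thread:

```python
import sympy

x, y = sympy.symbols("x y")
expr = (x + y)**2 + sympy.sin((x + y)**2)

# cse extracts the repeated subexpression (x + y)**2 into a fresh
# symbol, so a later substitution or evaluation computes it once.
replacements, reduced = sympy.cse(expr)
# replacements is a list of (symbol, subexpression) pairs;
# reduced holds the rewritten expression(s) in terms of those symbols.
```

Substituting the pairs back in reverse order recovers the original expression, so the rewrite is lossless.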

1 Answer

4
votes

You might want to try Theano for that. It implements a jacobian function that is parallel and faster than SymPy's.

The following version achieves a speedup of 3.88×; the substitution step now takes less than a second.

import numpy as np
import sympy as sp
import theano as th
import time


def main_sympy():
    start_time = time.time()

    Q = 17
    f = sp.symbols(('f{} ' * Q).format(*range(Q)))

    e = np.array([0., 1., 0., -1., 0., 1., -1., -1., 1.,
                  2., -2., -2., 2., 3., 0., -3., 0.])
    u = np.sum(f * e) / np.sum(f)
    ue = u * e
    phi = e * np.sum(f) * (1. + ue + ue*ue - u*u)
    F = sp.Matrix(phi)
    J = F.jacobian(f)

    g = e * (1. + 0.2*e + (0.2*e)**2)

    for k in range(Q):
        J = J.subs(f[k], g[k])

    print("--- %s seconds ---" % (time.time() - start_time))
    return J


def main_theano():
    start_time = time.time()

    Q = 17
    f = th.tensor.dvector('f')

    e = np.array([0., 1., 0., -1., 0., 1., -1., -1., 1., 2.,
                  -2., -2., 2., 3., 0., -3., 0.])
    u = (f * e).sum() / f.sum()
    ue = u * e
    phi = e * f.sum() * (1. + ue + ue*ue - u*u)
    jacobi = th.gradient.jacobian(phi, f)
    J = th.function([f], jacobi)

    g = e * (1. + 0.2*e + (0.2*e)**2)
    Jg = J(g)  # evaluate expression

    print("--- %s seconds ---" % (time.time() - start_time))
    return Jg


J0 = np.array(main_sympy().tolist(), dtype='float64')
J1 = main_theano()

print(np.allclose(J0, J1))  # compare results
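As an aside (not part of the original answer), SymPy itself can sidestep `subs` entirely: `lambdify` compiles the symbolic Jacobian into a NumPy function once, after which evaluating it at g is plain floating-point arithmetic. A sketch reusing the question's setup:

```python
import numpy as np
import sympy as sp

Q = 17
f = sp.symbols(f"f0:{Q}")
e = np.array([0., 1., 0., -1., 0., 1., -1., -1., 1.,
              2., -2., -2., 2., 3., 0., -3., 0.])
u = np.sum(np.array(f) * e) / np.sum(f)
ue = u * e
phi = e * np.sum(f) * (1. + ue + ue*ue - u*u)
J = sp.Matrix(phi).jacobian(f)

# Compile the symbolic Jacobian to a NumPy-backed function once;
# each subsequent evaluation is purely numeric.
J_func = sp.lambdify(f, J, modules="numpy")

g = e * (1. + 0.2*e + (0.2*e)**2)
Jg = np.array(J_func(*g), dtype=float)
```

The compilation cost is paid once, so this pays off whenever the Jacobian must be evaluated at many different points g.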