5
votes

I need to execute a Theano function a number of times via scan in order to sum up a cost function and use it in a gradient computation. I'm familiar with the deep-learning tutorials that do this, but my data slicing and some other complications mean I need to do this a little differently. Below is a much simplified version of what I'm trying to do:

import numpy
import theano
import theano.tensor as T

tn = testnet()  # my network class (definition omitted)
cost = tn.single_cost()
x = theano.shared(numpy.asarray([7.1, 2.2, 3.4], dtype='float32'))
index = T.lscalar('index')
test_fn = theano.function(inputs=[index], outputs=cost,
    givens={tn.x: x[index:index + 1]})

def step(curr):
    return T.constant( test_fn( curr ) )
outs,_ = theano.scan(step, T.arange(2))

out_fn = theano.function(inputs=[], outputs=outs)
print out_fn()

In the scan function, the call to test_fn(curr) gives the error:

Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?

Even if I pass in an array of values instead of using T.arange(2), I still get the same error. Is there a reason you can't call a function from inside scan?

In general, I'm wondering whether there is a way to call a function like this with a series of indexes so that the output can feed into a T.grad() computation (not shown).


3 Answers

3
votes

Don't make two different theano.functions.

A theano.function takes a symbolic relationship, optimizes it, and compiles it. What you are doing here is asking theano.scan (and thus out_fn) to treat an already-compiled function as a symbolic relationship. Whether you could technically get that to work I'm not sure, but it goes against the idea of Theano.

Since I don't know what your cost function does, I can't give an exact example, but here's a quick one that does work and should be similar enough to what I think you're trying to do.

import numpy as np
import theano
import theano.tensor as T

x = theano.shared(np.asarray([7.1, 2.2, 3.4], dtype=np.float32))

def fv(v):
    # symbolic sum of squares over the elements of v
    res, _ = theano.scan(lambda e: e ** 2, v)
    return T.sum(res)

def f(i):
    # a Python helper that returns a symbolic expression,
    # not a compiled function
    return fv(x[i:i + 2])

outs, _ = theano.scan(f, T.arange(2))

fn = theano.function([], outs)
fn()
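
For the shared x above, fn() returns the two slice costs, roughly [55.25, 16.4] (that is, 7.1**2 + 2.2**2 and 2.2**2 + 3.4**2).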
1
vote

After some investigation I agree that calling a compiled function from inside a function is not correct. The challenge with my code is that, following the basic design of the deep-learning tutorials, the first layer of the net has a symbolic variable defined as its input, and the output is propagated up through the higher layers until a final cost is computed from the top layer. The tutorials use code something like...

class layer1(object):
    def __init__(self):
        # self.W and self.b are assumed to be shared variables
        # initialized elsewhere
        self.x = T.matrix()
        self.output = activation(T.dot(self.x, self.W) + self.b)

For me, the tensor variable (layer1.x) needs to change to a new slice of data every time scan takes a step. The givens argument of theano.function does exactly that, but since calling a compiled Theano function from inside scan doesn't work, there are two other solutions I was able to find...

1 - Rework the network so that its cost function is built from a series of Python function calls instead of a propagated symbolic variable (see the sketch below). This is technically simple but requires a bit of re-coding to get things organized properly in a multi-layer network.
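
A minimal sketch of that approach, assuming a hypothetical single-layer net (the Layer1 class, its tanh activation, and the zero initialization here are illustrative, not from the original code):

import numpy as np
import theano
import theano.tensor as T

class Layer1(object):
    def __init__(self, n_in, n_out):
        # shared parameters; zero initialization is just for illustration
        self.W = theano.shared(np.zeros((n_in, n_out), dtype=np.float32))
        self.b = theano.shared(np.zeros(n_out, dtype=np.float32))

    def output(self, x):
        # the layer takes a symbolic input instead of storing self.x
        return T.tanh(T.dot(x, self.W) + self.b)

layer = Layer1(1, 1)
data = theano.shared(np.asarray([[7.1], [2.2], [3.4]], dtype=np.float32))

def step(i):
    # each scan step builds the cost for one slice symbolically,
    # so no compiled function is ever called inside scan
    return T.sum(layer.output(data[i:i + 1]) ** 2)

costs, _ = theano.scan(step, sequences=[T.arange(2)])
total_cost = T.sum(costs)
grads = T.grad(total_cost, [layer.W, layer.b])  # gradients flow through scan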

2 - Use theano.clone inside of scan. That code looks something like...

def step(curr):
    y_in = y[curr]
    replaces = {tn.layer1.x: x[curr:curr + 1]}
    # clone the cost graph, substituting this step's data slice for layer1.x
    cost_curr = theano.clone(tn.cost(y_in), replace=replaces)
    return cost_curr
outs, _ = theano.scan(step, sequences=[T.arange(batch_start, batch_end)])
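
Here theano.clone rebuilds the cost expression with layer1.x replaced by the data slice for that step, so each call to step() returns a fresh symbolic graph rather than invoking a compiled function. The resulting outs can then be summed and fed into T.grad().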

Both methods return the same results and appear to execute at the same speed.

0
votes

Solution

The standard way is OpFromGraph (available as of Theano 0.8.2).

import theano as th
import theano.tensor as T

x = T.scalar('x')
y = T.scalar('y')
z = x + y
# unlike theano.function, OpFromGraph requires a list for the outputs
op_add = th.OpFromGraph([x, y], [z])

def my_add(x_, y_):
    # a single-output OpFromGraph call returns a variable directly,
    # just like any other single-output Op
    return op_add(x_, y_)

x_list = T.vector('x_li')
# scan accumulates a running sum; note that scan returns (outputs, updates)
x_sum, _ = th.scan(op_add, sequences=[x_list], outputs_info=[T.constant(0.)])
fn_sum = th.function([x_list], x_sum[-1])
fn_sum([1., 2., 3., 4.])  # 10.0

What does it do?

OpFromGraph compiles a function defined by a graph and packs it into a new Op, much like defining a function in an imperative programming language.
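
Since the original question was about feeding the result into T.grad(), here is a minimal sketch (assuming Theano >= 0.8.2, where the default gradient of OpFromGraph works for simple graphs like this) showing that gradients flow through an OpFromGraph op:

import theano as th
import theano.tensor as T

a = T.scalar('a')
op_sq = th.OpFromGraph([a], [a ** 2])  # wrap a**2 as a new Op

w = T.scalar('w')
g = T.grad(op_sq(w), w)    # differentiates through the wrapped graph: 2*w
fn_g = th.function([w], g)
fn_g(3.0)                  # 6.0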

Pros/Cons

  • [+] It can be handy in tricky models.
  • [+] It saves compilation time. You can compile a commonly used part of a big model into an OpFromGraph and then use it directly in a bigger model; the final graph will have fewer nodes than a direct implementation.
  • [-] It can hurt runtime performance. Calling a function has overhead, and the optimizer cannot perform global optimizations across the boundary of the pre-compiled Op.
  • [-] It's immature and still under development. Its documentation is incomplete, and it currently does not support the updates and givens arguments that theano.function accepts.

Notes

In most cases you should define Python functions/classes to build your model. Only use OpFromGraph if no workaround is possible or you want to save compilation time.