7
votes

In the TensorFlow Python API, the default value for the activation kwarg of tf.layers.dense is None, and the documentation says:

activation: Activation function to use. If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).

Why not just use the identity function as the default value when defining the function? Like this:

def dense(..., activation=lambda x: x, ...):
    pass

This way you wouldn't have to worry about an inconsistency between the documentation and the code.

Is this (using None to represent a default function) just a coding style, or is there some caveat to using a function as the default value of a keyword argument?

It's not there to avoid unnecessary function calls, since an identity function is still created and called even when None is passed to activation. Besides, since this happens at graph construction time, there is no point in an optimization like this - assuming this is indeed an optimization.

Correction:

As pointed out by @y-luo, the tf implementation doesn't actually create an identity function, but the tf.keras implementation does.

2
It is a common practice to use None as the default argument and to write something like if activation is None: activation = ... in the function body. Default arguments are evaluated when the function is defined (not each time it is called), so all calls share the same instance of the default argument. This leads to unexpected behavior with mutable objects, for example lists. It may be unnecessary in the case of a lambda argument, but it is still good practice. - pschill
@pschill I think even if the default is a complicated function rather than one defined by a lambda, it can still be used as the default value - unless you want to assign another function to that name later. Since no additional arguments are passed to the function identified by activation (it accepts only one argument), we don't have to worry about mutable arguments inside it. So whether you "call it when None is encountered" or "use it as the default value, then call it", both refer to the same function, ... - Incömplete
... and even if that function has late-binding variables in it, it would still give the same result. But perhaps this is too much to think about; better to just use None and forget about it. Thanks :) - Incömplete

2 Answers

3
votes

I don't think an identity function (or any function at all) is actually created. For example:

class Dense(base.Layer):

  ...

  def call(self, inputs):
    ...
    if self.activation is not None:
      return self.activation(outputs)  # pylint: disable=not-callable
    return outputs

As you can see, a None activation is actually correct because it serves as a condition rather than a real function. It is simply equivalent to "linear" activation: a(x) = x.
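The pattern in the snippet above can be reduced to a framework-free sketch (the dense helper here is hypothetical, not the TensorFlow API):

```python
def dense(outputs, activation=None):
    # None acts as a flag: skip the call entirely rather than
    # invoking an identity function on the outputs.
    if activation is not None:
        return activation(outputs)
    return outputs

print(dense(3.0))                   # 3.0  (linear: a(x) = x)
print(dense(-3.0, activation=abs))  # 3.0  (activation applied)
```

So the documented behavior ("no activation is applied") matches the code exactly: no function object is ever constructed or called in the None case.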

0
votes

From the perspective of neural networks, "None" is a good name, since this case corresponds to applying no further operation to the weighted inputs of a neuron.