I'd like to train a neural network in Python and Keras using a custom metric-learning loss function. The loss minimizes the distances between the outputs for similar inputs and maximizes the distances between the outputs for dissimilar ones. The part that handles similar inputs is:
```python
from keras import backend as K

# function to create a pairwise similarity matrix, i.e.
# L[i,j] == 1 for similar samples i, j and 0 otherwise
def build_indicator_matrix(y_, thr=0.1):
    # y_: contains the labels of the samples,
    # samples are similar in case of same label
    # avoid checking floats for equality --> check whether absolute
    # differences are below a threshold
    lbls_diff = K.expand_dims(y_, axis=0) - K.expand_dims(y_, axis=1)
    lbls_thr = K.less(K.abs(lbls_diff), thr)
    # cast bool tensor back to float32
    L = K.cast(lbls_thr, 'float32')
    # POSSIBLE WORKAROUND
    #L = K.sum(L, axis=2)
    return L

# function to compute the (squared) Euclidean distances between all pairs
# of samples, storing in DIST[i,j] the distance between the outputs
# y_pred[i,:] and y_pred[j,:]
def compute_pairwise_distances(y_pred):
    DIFF = K.expand_dims(y_pred, axis=0) - K.expand_dims(y_pred, axis=1)
    DIST = K.sum(K.square(DIFF), axis=-1)
    return DIST

# loss: the average distance between all pairs of similar samples
def my_loss(y_true, y_pred):
    # y_true: contains the true labels of the samples
    # y_pred: contains the network outputs
    L = build_indicator_matrix(y_true)
    DIST = compute_pairwise_distances(y_pred)
    return K.mean(DIST * L, axis=1)
```
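For reference, the shape blow-up I observe can be reproduced with plain NumPy broadcasting. The reshape to `(n, 1)` below mimics what the backend apparently does to my targets, and the `indicator` helper is a NumPy stand-in for `build_indicator_matrix`:

```python
import numpy as np

def indicator(y, thr=0.1):
    # NumPy stand-in for build_indicator_matrix
    diff = np.expand_dims(y, axis=0) - np.expand_dims(y, axis=1)
    return (np.abs(diff) < thr).astype('float32')

y_1d = np.array([0.0, 0.0, 1.0])   # shape (3,)  -- what I pass to fit()
y_2d = y_1d.reshape(-1, 1)         # shape (3, 1) -- what the backend seems to see

print(indicator(y_1d).shape)  # (3, 3)    -- the intended pairwise matrix
print(indicator(y_2d).shape)  # (3, 3, 1) -- an unexpected trailing dimension
```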
For training, I pass a numpy array `y` of shape `(n,)` as the target variable to `my_loss`. However, I found (using the computational graph in TensorBoard) that the tensorflow backend creates a 2D variable out of `y` (displayed shape `? x ?`), and hence `L` in `build_indicator_matrix` is not 2- but 3-dimensional (shape `? x ? x ?` in TensorBoard). This causes `net.evaluate()` and `net.fit()` to compute wrong results.

Why does tensorflow create a 2D rather than a 1D array? And how does this affect `net.evaluate()` and `net.fit()`?
As quick workarounds I found that either replacing `build_indicator_matrix()` with static numpy code for computing `L`, or collapsing the "fake" dimension with the line `L = K.sum(L, axis=2)`, solves the problem. In the latter case, however, the output of `K.eval(build_indicator_matrix(y))` is only of shape `(n,)` and not `(n,n)`, so I do not understand why this workaround still yields correct results. Why does tensorflow introduce an additional dimension?
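As a sanity check on the second workaround, summing over a trailing axis of size 1 behaves like a squeeze, which would explain why the collapsed tensor still matches the intended `(n, n)` matrix. A NumPy sketch with a hypothetical 3-D `L`:

```python
import numpy as np

# hypothetical 3-D indicator tensor as it would appear during training: (n, n, 1)
L3 = np.array([[[1.], [0.]],
               [[0.], [1.]]], dtype='float32')   # shape (2, 2, 1)

L2 = L3.sum(axis=2)   # collapses the size-1 axis -> shape (2, 2)

print(L2.shape)                                  # (2, 2)
print(np.array_equal(L2, L3.squeeze(axis=2)))    # True: identical values
```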
My library versions are:
- keras: 2.2.4
- tensorflow: 1.8.0
- numpy: 1.15.0