I have written a Variational Autoencoder (VAE) in Keras with TensorFlow as the backend. As the optimizer I use Adam, with a learning rate of 1e-4 and a batch size of 16. When I train the net on my MacBook's CPU (Intel Core i7), the loss after one epoch (~5000 minibatches) is a factor of 2 smaller than the loss after the first epoch on a different machine running Ubuntu. On the other machine I get the same result on both its CPU and its GPU (an Intel Xeon E5-1630 and an Nvidia GeForce GTX 1080).

Python and the libraries I'm using have the same versions on both machines, and both machines use 32-bit floating point. If I switch to a different optimizer (e.g., RMSprop), the large difference between the machines is still there. I set np.random.seed to eliminate randomness.
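For concreteness, the run is configured roughly like this (a minimal sketch; the seed value is a placeholder, and the model itself is sketched further below):

```python
import numpy as np
np.random.seed(0)  # placeholder seed value; this is the only RNG I seed explicitly

from keras.optimizers import Adam

optimizer = Adam(lr=1e-4)   # same optimizer settings on both machines
batch_size = 16             # ~5000 minibatches per epoch with my data
```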
My net outputs logits (the output layer has a linear activation), and the loss function is tf.nn.sigmoid_cross_entropy_with_logits. On top of that, one layer has a regularizer: the KL divergence between its activations, which are the parameters of a Gaussian distribution, and a zero-mean Gaussian.
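To make the setup concrete, here is a simplified sketch of the model and loss. The layer sizes and variable names are placeholders rather than my actual code, and in the real model the KL term is attached to the latent layer as a regularizer instead of being written into the loss function, but it computes the same quantity:

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras.optimizers import Adam

original_dim, intermediate_dim, latent_dim = 784, 256, 2  # placeholder sizes

# Encoder: maps the input to the parameters of a Gaussian q(z|x)
x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    z_mean, z_log_var = args
    eps = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])

# Decoder: outputs raw logits (linear activation in the output layer)
h_dec = Dense(intermediate_dim, activation='relu')(z)
logits = Dense(original_dim, activation='linear')(h_dec)

def vae_loss(x_true, x_logits):
    # Reconstruction term computed on the logits, summed over dimensions
    recon = K.sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=x_true, logits=x_logits),
        axis=-1)
    # KL divergence between q(z|x) and a zero-mean, unit-variance Gaussian
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var),
                      axis=-1)
    return recon + kl

vae = Model(x, logits)
vae.compile(optimizer=Adam(lr=1e-4), loss=vae_loss)
```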
What could be causing such a large difference in the loss value between the two machines?