
I am following the tutorial https://blog.keras.io/building-autoencoders-in-keras.html to build my autoencoder. I have tried two strategies:

A) Step 1: build autoencoder; Step 2: build encoder; Step 3: build decoder; Step 4: compile autoencoder; Step 5: train autoencoder.

B) Step 1: build autoencoder; Step 2: compile autoencoder; Step 3: train autoencoder; Step 4: build encoder; Step 5: build decoder.

In both cases the model converges to a loss of 0.100. However, with strategy A, which is the one described in the tutorial, the reconstruction is very poor; with strategy B the reconstruction is much better.

In my opinion this makes sense, because in strategy A the encoder and decoder models are built on top of untrained layers, so their output is random. In strategy B, on the other hand, the weights are already trained when I build those models, hence the better reconstruction.

My questions are: is strategy B valid, or am I cheating on the reconstruction? And in the case of strategy A, is Keras supposed to update the weights of the encoder and decoder models automatically, since those models were built from the autoencoder's layers?

###### Code for Strategy A

# Step 1
# Imports used throughout (standalone Keras); x_train, x_test and
# encoding_dim are assumed to be defined beforehand.
from keras.layers import Input, Dense
from keras.models import Model

features = Input(shape=(x_train.shape[1],))

encoded = Dense(1426, activation='relu')(features)
encoded = Dense(732, activation='relu')(encoded)
encoded = Dense(328, activation='relu')(encoded)

encoded = Dense(encoding_dim, activation='relu')(encoded)

decoded = Dense(328, activation='relu')(encoded)
decoded = Dense(732, activation='relu')(decoded)
decoded = Dense(1426, activation='relu')(decoded)
decoded = Dense(x_train.shape[1], activation='relu')(decoded)

autoencoder = Model(inputs=features, outputs=decoded)

# Step 2
encoder = Model(features, encoded)

# Step 3
encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-4](encoded_input)
decoder_layer = autoencoder.layers[-3](decoder_layer)
decoder_layer = autoencoder.layers[-2](decoder_layer)
decoder_layer = autoencoder.layers[-1](decoder_layer)

decoder = Model(encoded_input, decoder_layer)

# Step 4
autoencoder.compile(optimizer='adam', loss='mse')

# Step 5
history = autoencoder.fit(x_train, 
                         x_train,
                         epochs=150,
                         batch_size=256,
                         shuffle=True,
                         verbose=1,
                         validation_split=0.2)

# Testing encoding
encoded_fts = encoder.predict(x_test)
decoded_fts = decoder.predict(encoded_fts)

###### Code for Strategy B

# Step 1 (same imports and data as in Strategy A)
features = Input(shape=(x_train.shape[1],))

encoded = Dense(1426, activation='relu')(features)
encoded = Dense(732, activation='relu')(encoded)
encoded = Dense(328, activation='relu')(encoded)

encoded = Dense(encoding_dim, activation='relu')(encoded)

decoded = Dense(328, activation='relu')(encoded)
decoded = Dense(732, activation='relu')(decoded)
decoded = Dense(1426, activation='relu')(decoded)
decoded = Dense(x_train.shape[1], activation='relu')(decoded)

autoencoder = Model(inputs=features, outputs=decoded)

# Step 2
autoencoder.compile(optimizer='adam', loss='mse')

# Step 3
history = autoencoder.fit(x_train, 
                         x_train,
                         epochs=150,
                         batch_size=256,
                         shuffle=True,
                         verbose=1,
                         validation_split=0.2)
# Step 4
encoder = Model(features, encoded)

# Step 5
encoded_input = Input(shape=(encoding_dim,))
decoder_layer = autoencoder.layers[-4](encoded_input)
decoder_layer = autoencoder.layers[-3](decoder_layer)
decoder_layer = autoencoder.layers[-2](decoder_layer)
decoder_layer = autoencoder.layers[-1](decoder_layer)

decoder = Model(encoded_input, decoder_layer)

# Testing encoding
encoded_fts = encoder.predict(x_test)
decoded_fts = decoder.predict(encoded_fts)
Not a programming question, hence arguably off-topic here; questions on ML theory & methodology should be posted at Cross Validated. – desertnaut
Thanks! I posted my question there as well. It is related to both theory and coding, since it depends on how Keras constructs its models and on how I code my layers, so I am keeping it in both communities ;) – Damares Oliveira
I am afraid you can't do that – see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?". Please pick one and delete the other (personally, I don't see anything Keras-specific here, so I still think you should delete this one). – desertnaut
Hmm, ok. I deleted the question on Cross Validated. Thanks, desertnaut! – Damares Oliveira

1 Answer


My questions are: is strategy B valid, or am I cheating on the reconstruction?

A and B are equivalent; no, you didn't cheat.

In the case of strategy A, is Keras supposed to update the weights of the encoder and decoder models automatically, since those models were built from the autoencoder's layers?

The decoder model simply reuses the autoencoder's layers. In case A:

decoder.layers
Out:
[<keras.engine.input_layer.InputLayer at 0x7f8a44d805c0>,
 <keras.layers.core.Dense at 0x7f8a44e58400>,
 <keras.layers.core.Dense at 0x7f8a44e746d8>,
 <keras.layers.core.Dense at 0x7f8a44e14940>,
 <keras.layers.core.Dense at 0x7f8a44e2dba8>]

autoencoder.layers
Out:
[<keras.engine.input_layer.InputLayer at 0x7f8a44e91c18>,
 <keras.layers.core.Dense at 0x7f8a44e91c50>,
 <keras.layers.core.Dense at 0x7f8a44e91ef0>,
 <keras.layers.core.Dense at 0x7f8a44e89080>,
 <keras.layers.core.Dense at 0x7f8a44e89da0>,
 <keras.layers.core.Dense at 0x7f8a44e58400>,
 <keras.layers.core.Dense at 0x7f8a44e746d8>,
 <keras.layers.core.Dense at 0x7f8a44e14940>,
 <keras.layers.core.Dense at 0x7f8a44e2dba8>]

The hex numbers (object ids) of the last 4 entries in each list are identical, because they are the same objects. Of course, they share their weights, too.
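
As a quick sanity check (a minimal sketch, assuming the Strategy A code above has already been run), you can confirm that the sub-models reference the very same layer objects, and therefore the very same weight arrays:

import numpy as np

# decoder.layers[0] is the new Input layer; the rest are the
# autoencoder's last 4 Dense layers, reused as-is.
print(decoder.layers[1] is autoencoder.layers[-4])   # True
print(decoder.layers[-1] is autoencoder.layers[-1])  # True

# Same objects => same weight arrays.
w_dec = decoder.layers[-1].get_weights()[0]
w_ae  = autoencoder.layers[-1].get_weights()[0]
print(np.array_equal(w_dec, w_ae))                   # True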

In case B:

decoder.layers
Out:
[<keras.engine.input_layer.InputLayer at 0x7f8a41de05f8>,
 <keras.layers.core.Dense at 0x7f8a41ee4828>,
 <keras.layers.core.Dense at 0x7f8a41eaceb8>,
 <keras.layers.core.Dense at 0x7f8a41e50ac8>,
 <keras.layers.core.Dense at 0x7f8a41e5d780>]

autoencoder.layers
Out:
[<keras.engine.input_layer.InputLayer at 0x7f8a41da3940>,
 <keras.layers.core.Dense at 0x7f8a41da3978>,
 <keras.layers.core.Dense at 0x7f8a41da3a90>,
 <keras.layers.core.Dense at 0x7f8a41da3b70>,
 <keras.layers.core.Dense at 0x7f8a44720cf8>,
 <keras.layers.core.Dense at 0x7f8a41ee4828>,
 <keras.layers.core.Dense at 0x7f8a41eaceb8>,
 <keras.layers.core.Dense at 0x7f8a41e50ac8>,
 <keras.layers.core.Dense at 0x7f8a41e5d780>]

The last 4 layers are the same objects here, too.

So, the training orders in A and B are equivalent. More generally, if you share the layers (and hence the weights), the order of building, compiling and training doesn't matter in most cases, because all the models live in the same TensorFlow graph.
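
To see this equivalence end to end, a small check (a sketch, assuming x_test is available as in your code) is to compare the full model against the chained sub-models:

import numpy as np

# Encoding and then decoding with the sub-models performs exactly
# the same computation as running the full autoencoder.
full      = autoencoder.predict(x_test)
piecewise = decoder.predict(encoder.predict(x_test))
print(np.allclose(full, piecewise))  # True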

I ran these examples on the MNIST dataset; they show the same performance and reconstruct the images well. I suspect that if case A is giving you trouble, something else has gone wrong (I don't know what, because I copy-pasted your code and everything was OK).

If you use Jupyter, restarting the kernel and re-running the notebook top to bottom sometimes helps.