
In the Keras docs there is a DAE (Denoising AutoEncoder) example; here is the link: https://keras.io/examples/mnist_denoising_autoencoder/

As we know, an autoencoder consists of an encoder and a decoder network, and the output of the encoder is the input of the decoder. But when I examined the code over and over again, I found that the input of the decoder (latent_inputs) in the example is also created with Input, just like the input of the encoder, rather than being the encoder's output. It puzzles me a lot.

The following is the associated code segment:

# Setup: imports and hyperparameters from the linked example,
# included here so the snippet is self-contained
from tensorflow.keras.layers import Input, Dense, Conv2D, Conv2DTranspose
from tensorflow.keras.layers import Flatten, Reshape, Activation
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

input_shape = (28, 28, 1)  # MNIST images, single channel
kernel_size = 3
latent_dim = 16
layer_filters = [32, 64]

# Build the Autoencoder Model
# First build the Encoder Model
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
# Stack of Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use MaxPooling2D as alternative to strides>1
# - faster but not as good as strides>1
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)

# Shape info needed to build Decoder Model
shape = K.int_shape(x)

# Generate the latent vector
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# Instantiate Encoder Model
encoder = Model(inputs, latent, name='encoder')
encoder.summary()

# Build the Decoder Model
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)
# Stack of Transposed Conv2D blocks
# Notes:
# 1) Use Batch Normalization before ReLU on deep networks
# 2) Use UpSampling2D as alternative to strides>1
# - faster but not as good as strides>1
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

x = Conv2DTranspose(filters=1,
                    kernel_size=kernel_size,
                    padding='same')(x)

outputs = Activation('sigmoid', name='decoder_output')(x)

# Instantiate Decoder Model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

Please note that the decoder uses latent_inputs as its input, but latent_inputs comes from Input, not from the output of the encoder, which is latent.

Could anyone tell me why it is done this way? Or is it a mistake in the doc? Thanks a lot.


1 Answer


You are confusing two things: the Input layer used to define a Model(...), and the tensor that is actually fed into the decoder.

In this code, two separate Model(...) instances are created, one for the encoder and one for the decoder. When you create the final autoencoder model, you feed the output of the encoder into the input of the decoder.

As you described, the decoder uses latent_inputs as its input, and latent_inputs comes from Input; however, this Input is the input of the decoder Model only, not of the autoencoder model.

encoder = Model(inputs, latent, name='encoder') creates the encoder model, and decoder = Model(latent_inputs, outputs, name='decoder') creates the decoder model. The decoder's latent_inputs is just a placeholder; it is fed with the output of the encoder when the two models are composed.

The final autoencoder model is then built with:

autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')

Here, the input to the encoder model is inputs, and the output of the decoder model is the final output of the autoencoder. To produce that output, Keras first feeds inputs to encoder(...), and the output of the encoder is then fed to the decoder via decoder(encoder(inputs)).
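The key point is that a Keras Model is itself callable, just like a layer. Here is a minimal toy sketch (all names hypothetical, not from the example) showing how calling a decoder model on an encoder model's output replaces the decoder's placeholder Input:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# A tiny encoder model: 8 features -> 2 features
enc_in = Input(shape=(8,))
toy_encoder = Model(enc_in, Dense(2)(enc_in), name='toy_encoder')

# A tiny decoder model: 2 features -> 8 features
# dec_in is only a placeholder defining the decoder's expected input shape
dec_in = Input(shape=(2,))
toy_decoder = Model(dec_in, Dense(8)(dec_in), name='toy_decoder')

# Calling a Model on a tensor wires it in like a layer, so here the
# decoder's placeholder is fed with the encoder's output
x = Input(shape=(8,))
toy_autoencoder = Model(x, toy_decoder(toy_encoder(x)), name='toy_autoencoder')
toy_autoencoder.summary()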

For simplicity, you can also build the whole model in one piece, like this:

# Build the Autoencoder Model in one piece
# (reuses the imports and hyperparameters defined above)
# Encoder
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)
shape = K.int_shape(x)
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# Decoder

x = Dense(shape[1] * shape[2] * shape[3])(latent)
x = Reshape((shape[1], shape[2], shape[3]))(x)

for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

x = Conv2DTranspose(filters=1,
                    kernel_size=kernel_size,
                    padding='same')(x)

outputs = Activation('sigmoid', name='decoder_output')(x)


autoencoder = Model(inputs, outputs, name='autoencoder')
autoencoder.summary()
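
For completeness, here is a hedged sketch of how the autoencoder would then be trained on the denoising task, assuming x_train_noisy, x_train, x_test_noisy, and x_test are prepared as in the linked example (the epoch count and batch size below are illustrative):

# Noisy images are the inputs, clean images are the reconstruction targets
autoencoder.compile(loss='mse', optimizer='adam')
autoencoder.fit(x_train_noisy,
                x_train,
                validation_data=(x_test_noisy, x_test),
                epochs=10,
                batch_size=128)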