1 vote

I'm having trouble with an autoencoder I'm building using Keras. The input's shape depends on the screen size, and the output is a prediction of the next screen... However, there's an error I can't figure out. Please excuse my awful formatting on this website...

Code:

import numpy as np
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

def model_build():
    input_img = Input(shape=(1, env_size()[1], env_size()[0]))
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    model = Model(input_img, decoded)
    return model
if __name__ == '__main__':
    model = model_build()
    model.compile('adam', 'mean_squared_error')
    y = np.array([env()])
    print(y.shape)
    print(y.ndim)
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))

Error:

Traceback (most recent call last):
  File "/home/ai/Desktop/algernon-test/rewarders.py", line 46, in <module>
    debug = model.fit(np.array([[env()]]), np.array([[env()]]))
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/home/ai/.local/lib/python3.6/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected conv2d_7 to have shape (4, 268, 1) but got array with shape (1, 270, 480)

EDIT:

Code for get_screen imported as env():

def get_screen():
    img = screen.grab()              # capture the current screen
    img = img.resize(screen_size())  # downscale to the working resolution
    img = img.convert('L')           # convert to greyscale
    img = np.array(img)
    return img
What is the original shape of your data? Add code for env(). Does the error occur in the line decoded = ...? – Sharky

@Sharky There is no supposed "original shape of the data"... Basically I just want the loss from the autoencoder (to serve as a reward for my RL agent...). The shape of the data is the current screen with its resolution divided by 4, I guess, converted to greyscale... – ZeroMaxinumXZ

2 Answers

0 votes

Looks like env_size() and env() mess up the image dimensions somehow. Consider this example:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

image1 = np.random.rand(1, 1, 270, 480)  # first dimension is the batch size, for test purposes
image2 = np.random.rand(1, 4, 268, 1)    # or any other arbitrary dimensions

input_img = layers.Input(shape=image1[0].shape)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
model = tf.keras.Model(input_img, decoded)
model.compile('adam', 'mean_squared_error')
model.summary()

This line will work, because the target's per-sample shape (4, 268, 1) matches the model's output:

model.fit(image1, image2, epochs=1, batch_size=1)

But this doesn't, because the target has the raw screen shape instead, reproducing exactly the error from the question:

model.fit(image1, image1, epochs=1, batch_size=1)
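The fix on the data side is to give the model a layout it expects. As a minimal sketch of the usual channels-last layout (assuming env() returns a 2D greyscale array of shape (height, width), as get_screen() in the question does):

import numpy as np

frame = np.random.rand(270, 480)             # stand-in for one greyscale screen from env()
batch = frame[np.newaxis, :, :, np.newaxis]  # add batch and channel axes
print(batch.shape)                           # (1, 270, 480, 1)

With that layout the model's Input shape would be (270, 480, 1) rather than (1, env_size()[1], env_size()[0]).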

Edit: In order to get output of the same size as the input, you need to calculate the convolution kernel sizes carefully:

image1 = np.random.rand(1, 1920, 1080, 1)

input_img = layers.Input(shape=image1[0].shape)
x = layers.Conv2D(32, 3, activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, 3, activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, 3, activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, 1, activation='relu')(x) # set kernel size to 1 for example
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)
model = tf.keras.Model(input_img, decoded)
model.compile('adam', 'mean_squared_error')
model.summary()

This will output the same dimensions as the input.
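A quick sanity check, reusing model and image1 from the snippet above:

pred = model.predict(image1)
print(pred.shape)  # (1, 1920, 1080, 1), same as image1.shape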

As per this guide http://cs231n.github.io/convolutional-networks/

We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons “fit” is given by (W−F+2P)/S+1. For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output.
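The formula is easy to check in a couple of lines (a small illustrative helper, not a Keras function):

def conv_output_size(W, F, S=1, P=0):
    # Spatial output size of a convolution: (W - F + 2P) / S + 1
    return (W - F + 2 * P) // S + 1

print(conv_output_size(7, 3, S=1))  # 5, the 7x7 input / 3x3 filter example above
print(conv_output_size(7, 3, S=2))  # 3, the stride-2 case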

1 vote

You have three 2x downsampling steps and three 2x upsampling steps. These steps have no knowledge of the original image size, so they round the size up to the nearest multiple of 8 = 2^3 (for example, a height of 270 comes out of the decoder as 272). To recover the original size, compute how much to crop from the decoded output:

cropX = 7 - ((size[0]+7) % 8)
cropY = 7 - ((size[1]+7) % 8)

It ought to work if you add a new final layer:

decoded = layers.Cropping2D(((0,cropY),(0,cropX)))(x)
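For example, for the 480x270 greyscale screen from the question (assuming size = (width, height), the tuple screen_size() would return for PIL's resize):

size = (480, 270)                # (width, height)
cropX = 7 - ((size[0] + 7) % 8)  # 480 is already a multiple of 8 -> crop 0 columns
cropY = 7 - ((size[1] + 7) % 8)  # 270 rounds up to 272 in the decoder -> crop 2 rows
print(cropX, cropY)              # 0 2

Cropping2D(((0, cropY), (0, cropX))) then trims the decoder's 272x480 output back to 270x480.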