
I would like to apply a 1D convolution to fixed-size DNA sequences using Keras.
Each DNA sequence is 45 bases long and has been one-hot encoded. There is one filter with kernel size 3. See the picture below:

1D convolution

I have 1000 sequences for training, so the shape of x_train is (1000, 45, 4).
The target is True/False, with shape (1000,).
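
To make these shapes concrete, here is a minimal NumPy sketch of what a single filter with kernel size 3 computes on such data. The data and the filter weights are random placeholders, not real sequences or trained weights, and "valid" padding with stride 1 is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 random sequences of 45 bases, one-hot encoded over 4 bases.
n_seq, seq_len, n_bases = 1000, 45, 4
base_idx = rng.integers(0, n_bases, size=(n_seq, seq_len))
x_train = np.eye(n_bases, dtype=np.float32)[base_idx]  # shape (1000, 45, 4)

# One filter with kernel size 3 covers 3 positions x 4 channels (random weights here).
kernel_size = 3
kernel = rng.normal(size=(kernel_size, n_bases)).astype(np.float32)

# Slide the filter along the sequence axis (no padding, stride 1):
# each window of shape (3, 4) is reduced to one number per sequence.
feature_map = np.stack(
    [
        np.tensordot(x_train[:, i:i + kernel_size, :], kernel, axes=([1, 2], [0, 1]))
        for i in range(seq_len - kernel_size + 1)
    ],
    axis=1,
)

print(x_train.shape)      # (1000, 45, 4)
print(feature_map.shape)  # (1000, 43)
```

Each sequence on its own has shape (45, 4), and the filter yields 45 - 3 + 1 = 43 output positions per sequence.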

I tried to use Keras like this:

K.clear_session()

model = Sequential()

#add model layers
model.add(Conv1D(1, kernel_size=1, activation="relu", input_shape =(1000,45)))


#model.add(Flatten())
#model.add(Dense(2, activation="softmax"))


model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10)

But I get the following error:

ValueError: Error when checking input: expected conv1d_1_input to have shape (1000, 45) but got array with shape (45, 4)

1 Answer


In the input_shape for the Conv1D layer, you should not include the number of sequences in your dataset. Instead, give the two dimensions of a single one-hot encoded sequence: input_shape=(45, 4) should do it.

As for the number of sequences in the dataset, you can either omit it entirely or split the data into batches (32, 64, 128, ...), which is usually better for GPU performance:

model.fit(x_train, y_train, batch_size=64, epochs=10)
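
For reference, here is a minimal sketch of how the whole model might look with that fix applied. Several parts are assumptions that go beyond the point above: the data is random placeholder data with the shapes from the question, kernel_size is set to 3 to match the description in the question, and the output layer and loss are switched to a single sigmoid unit with binary_crossentropy, because categorical_crossentropy expects one-hot targets while y_train has shape (1000,). The per-sample shape is declared with keras.Input(shape=(45, 4)), which plays the same role as input_shape=(45, 4) on the first layer.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data with the shapes from the question (random, for illustration only).
rng = np.random.default_rng(0)
x_train = np.eye(4, dtype="float32")[rng.integers(0, 4, size=(1000, 45))]  # (1000, 45, 4)
y_train = rng.integers(0, 2, size=(1000,)).astype("float32")               # (1000,)

model = keras.Sequential([
    # Shape of ONE sample: 45 positions x 4 one-hot channels (no dataset size here).
    keras.Input(shape=(45, 4)),
    # One filter, kernel size 3 (assumption: matches the question's description).
    layers.Conv1D(1, kernel_size=3, activation="relu"),
    layers.Flatten(),
    # Assumption: a single sigmoid unit for a True/False target of shape (1000,).
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The number of sequences only shows up implicitly, through the arrays and batch_size.
model.fit(x_train, y_train, batch_size=64, epochs=10, verbose=0)

print(model.output_shape)  # (None, 1)
```

The leading None in the output shape is the batch dimension, which Keras leaves unspecified until data is passed in.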