I have a dataset that includes both images and text features. The labels for the training data are a 2-dimensional array of 1s/0s with the same shape as the input images.
So basically, the training inputs are:
- Input image with shape (X, Y)
- Additional feature set (i.e. text features) with shape (Z,)

And the training labels have shape (X, Y).
I am trying to train a model using TensorFlow/Keras on this data. I know I can train a model where the input size is (X * Y) + Z, but I read that this isn't the best way to handle mixing image and additional-data features.
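For reference, the flattened approach I mean looks roughly like the sketch below. It only shows how the input and label vectors would be built with NumPy, not the model itself. The sizes X, Y, Z and the random data are placeholders I made up for illustration, not my real data.

```python
import numpy as np

# Placeholder sizes, chosen only for illustration
X, Y, Z = 4, 5, 3

image = np.random.rand(X, Y)                   # one input image, shape (X, Y)
text_features = np.random.rand(Z)              # additional features, shape (Z,)
label = np.random.randint(0, 2, size=(X, Y))   # 0/1 label array, shape (X, Y)

# Flatten the image and append the extra features: shape (X * Y + Z,)
flat_input = np.concatenate([image.ravel(), text_features])

# The label would be flattened the same way: shape (X * Y,)
flat_label = label.ravel()

print(flat_input.shape)  # (23,)
print(flat_label.shape)  # (20,)
```

This throws away the 2D layout of the image, which I assume is the reason it is not recommended.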
So my questions are:
1) How would I set up my model to handle the mixed input types?
2) Since my output is the same size as my image, would I need to define an (X * Y)-sized output layer? How would I specify the output layer so that it can take multiple values, that is, so that any number of locations in the output can be 1 or 0?