I'm working on a text classifier in Python using Keras. So far I have built a model using only the words in my dataset, as a bag of words. Now I would like to use other custom features (such as polarity) in my classifier, but I don't know how to add them to my code. My dataset looks like this:

 Text                    | Polarity | Number of words | Classification
-------------------------|----------|-----------------|---------------
 Hello my name is John   |   0.05   |        5        |        0
 How old are you?        |   0.00   |        4        |        1
 I'm very hungry         |  -0.05   |        4        |        0

The two middle columns (Polarity and Number of words) are the custom features I want to add to my classifier, but I don't know how.

import keras

# One row per document, one 0/1 column per vocabulary word (bag of words)
train_x = tokenizer.sequences_to_matrix(allWordIndices, mode='binary')
train_y = keras.utils.to_categorical(train_y, 2)

# Use the first 1000 samples as the test set and the rest for training
test_x, train_x = train_x[:1000], train_x[1000:]
test_y, train_y = train_y[:1000], train_y[1000:]


from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

model = Sequential()
model.add(Dense(30, input_shape=(max_words,), activation='relu'))
model.add(Dropout(0.45)) 
model.add(Dense(100, activation='softplus'))
model.add(Dropout(0.45))
model.add(Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

history = model.fit(train_x, train_y, batch_size=32, epochs=10, verbose=1, validation_split=0.1, shuffle=True)

score = model.evaluate(test_x, test_y, batch_size=128)

In this example I use only the bag-of-words features of the first column (Text), and I want to add the other two columns (polarity and number of words) as features. Does anyone have an idea how to add them? Thanks in advance.

1 Answer

With bag of words, you can simply concatenate your numerical features to your BoW vector. You can do this with NumPy, or even more easily with pandas. Each sample then becomes a vector of dimension max_words + number of custom numerical features, so the input_shape of the first Dense layer has to grow accordingly.
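A minimal NumPy sketch of that concatenation, assuming the BoW matrix comes from `tokenizer.sequences_to_matrix` as in the question and the two custom features are already available as numbers. The helper name `concat_features` and the toy vocabulary size of 4 are hypothetical, for illustration only.

```python
import numpy as np

def concat_features(bow_matrix, numeric_features):
    """Append numeric feature columns to the right of a bag-of-words matrix.

    bow_matrix:       (n_samples, max_words), e.g. from sequences_to_matrix(..., mode='binary')
    numeric_features: (n_samples, n_features) or (n_samples,), e.g. polarity and word count
    """
    bow = np.asarray(bow_matrix, dtype=np.float32)
    numeric = np.asarray(numeric_features, dtype=np.float32)
    if numeric.ndim == 1:
        # A single feature given as a flat array becomes one column
        numeric = numeric.reshape(-1, 1)
    if numeric.shape[0] != bow.shape[0]:
        raise ValueError("BoW matrix and numeric features must have the same number of rows")
    # Result has max_words + n_features columns
    return np.hstack([bow, numeric])

# Toy data: 3 documents, hypothetical vocabulary of max_words = 4
bow = np.array([[1, 0, 1, 0],
                [0, 1, 0, 0],
                [1, 1, 0, 1]])
# Polarity and number of words, taken from the table in the question
numeric = np.array([[0.05, 5],
                    [0.00, 4],
                    [-0.05, 4]])

X = concat_features(bow, numeric)
print(X.shape)  # (3, 6)
```

With the question's code, the concatenation would be applied to the full matrix before the train/test split, and the first layer would become `Dense(30, input_shape=(max_words + 2,), activation='relu')`. If the features sit in a pandas DataFrame (hypothetically named `df`, with the column names from the table), `df[['Polarity', 'Number of words']].to_numpy()` gives the `numeric` array. Because the word count lives on a much larger scale than the 0/1 BoW entries, standardizing it (with the mean and standard deviation of the training rows only) often helps training.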

That said, I have done something similar and worked extensively with several approaches, such as BoW and embeddings.

It is a good idea to separate text features and numerical features in your network. To do so, you can use a multiple-input model (built with the Keras functional API). I recently wrote a blog post about it; you can take a look here. It uses embeddings, but the same approach also works for BoW.
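A minimal sketch of such a two-input model with the Keras functional API, assuming the same binary BoW input as in the question plus a second input holding the numeric features (polarity and number of words). The function name `build_multi_input_model`, the 8-unit numeric branch and the random stand-in data are hypothetical choices for illustration, not taken from the blog post.

```python
import numpy as np
from keras.layers import Input, Dense, Dropout, concatenate
from keras.models import Model

def build_multi_input_model(max_words, n_numeric, n_classes=2):
    # Branch 1: the binary bag-of-words vector (first layers as in the question)
    bow_in = Input(shape=(max_words,), name="bow")
    bow_branch = Dense(30, activation="relu")(bow_in)
    bow_branch = Dropout(0.45)(bow_branch)

    # Branch 2: the hand-crafted numeric features (hypothetical size of 8 units)
    num_in = Input(shape=(n_numeric,), name="numeric")
    num_branch = Dense(8, activation="relu")(num_in)

    # Merge both branches, then classify as in the question
    merged = concatenate([bow_branch, num_branch])
    z = Dense(100, activation="softplus")(merged)
    z = Dropout(0.45)(z)
    out = Dense(n_classes, activation="softmax")(z)

    model = Model(inputs=[bow_in, num_in], outputs=out)
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Random stand-in data; replace with the real BoW matrix, numeric features and labels
    rng = np.random.default_rng(0)
    n, max_words = 64, 50
    x_bow = rng.integers(0, 2, size=(n, max_words)).astype("float32")
    x_num = rng.normal(size=(n, 2)).astype("float32")
    labels = rng.integers(0, 2, size=n)
    y_onehot = np.eye(2, dtype="float32")[labels]

    model = build_multi_input_model(max_words, n_numeric=2)
    # Inputs are passed as a list in the same order as inputs=[bow_in, num_in]
    model.fit([x_bow, x_num], y_onehot, batch_size=32, epochs=2, verbose=0)
```

The idea behind keeping the branches separate is that the few dense numeric features get their own small layer before being merged, instead of being two columns among `max_words` sparse BoW columns in the first layer.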