I'm working on a sentiment analysis project in Python with Keras, using a CNN and word2vec as the embedding method. I want to detect positive, negative and neutral tweets (in my corpus every negative tweet has the label 0, positive = 1 and neutral = 2). Since I'm new to this field I have some questions. Here is a part of my code: *** Assume that X_train and X_test contain the tweets and Y_train and Y_test contain the tweets' labels.
    if i < train_size:
        if labels[index] == 0:
            Y_train[i, :] = [1.0, 0.0]
        elif labels[index] == 1:
            Y_train[i, :] = [0.0, 1.0]
        else:
            Y_train[i, :] = [1.0, 1.0]
    else:
        if labels[index] == 0:
            Y_test[i - train_size, :] = [1.0, 0.0]
        elif labels[index] == 1:
            Y_test[i - train_size, :] = [0.0, 1.0]
        else:
            Y_test[i - train_size, :] = [1.0, 1.0]
In the code above you can see that if the label is 0 (labels[index] == 0), meaning negative, I put [1.0, 0.0] in the target array; if the label is 1 (labels[index] == 1), meaning positive, I put [0.0, 1.0]; and otherwise (labels[index] == 2), meaning neutral, I put [1.0, 1.0]. So please assume that the logic of this part of my code is OK.
Here is my Keras model:
    model = Sequential()
    model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same',
                     input_shape=(max_tweet_length, vector_size)))
    model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
    model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
    model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
    model.add(Dropout(0.25))
    model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
    model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
    model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
    model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
    model.add(Dropout(0.25))
    model.add(Dense(256, activation='tanh'))
    model.add(Dense(256, activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(2, activation='sigmoid'))
So in order to predict a new input, I have this code:
    sentiment = model.predict(np.array(a), batch_size=1, verbose=2)[0]
    if np.argmax(sentiment) == 0:
        print("negative")
        print('the label is')
        print(np.argmax(sentiment))
    elif np.argmax(sentiment) == 1:
        print("positive")
        print('the label is')
        print(np.argmax(sentiment))
    elif np.argmax(sentiment) == 2:
        print("neutral")
        print('the label is')
        print(np.argmax(sentiment))
My question has 2 parts: I want to know whether it is correct to predict this way. As I said, I use label 2 for neutral tweets, and for this reason I check if np.argmax(sentiment) == 2 and then print neutral - is this logical or acceptable for prediction?
I mean, I assigned [1.0, 1.0] to neutral tweets in the train and test sets, so if I treat 2 as neutral in the prediction part, does it make any sense?
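To make my doubt concrete, here is a small check with made-up output values (the `fake_outputs` array is hypothetical, not real predictions from my model). As far as I understand, `np.argmax` on a 2-element vector can only return 0 or 1, so my `== 2` branch would never run:

```python
import numpy as np

# Hypothetical outputs of the final Dense(2, activation='sigmoid') layer,
# one row per tweet. These values are invented for illustration.
fake_outputs = np.array([
    [0.9, 0.1],   # looks negative
    [0.2, 0.8],   # looks positive
    [0.9, 0.9],   # close to my neutral target [1.0, 1.0]
])

# argmax returns an index into each row, so with 2 outputs it is 0 or 1.
predicted = np.argmax(fake_outputs, axis=1)
print(predicted)

# Check that index 2 never shows up among the predictions.
print(set(predicted.tolist()) <= {0, 1})
```

If I read this right, even the row that is close to my neutral target gets index 0 or 1, never 2.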
thanks a lot
**** For regression, is it correct to change my train and test code in this way, considering 0, 1, 2 as the polarities in my corpus?
    if i < train_size:
        if labels[index] == 0:
            Y_train[i, :] = [1.0, 0.0]
        elif labels[index] == 1:
            Y_train[i, :] = [0.0, 1.0]
        elif labels[index] == 2:
            Y_train[i, :] = [0.5, 0.5]
    else:
        if labels[index] == 0:
            Y_test[i - train_size, :] = [1.0, 0.0]
        elif labels[index] == 1:
            Y_test[i - train_size, :] = [0.0, 1.0]
        else:
            Y_test[i - train_size, :] = [0.5, 0.5]
and then setting 'sigmoid' as the activation:
    model.add(Dense(256, activation='tanh'))
    model.add(Dense(256, activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(2, activation='sigmoid'))
And can I predict the label of my input tweet in the way I mentioned above?
    if np.argmax(sentiment) == 2:
        print("neutral")
        print('the label is')
        print(np.argmax(sentiment))
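Since `np.argmax` of a 2-element output can never be 2, I guess I would need some other rule to get "neutral" out of the `[0.5, 0.5]` targets. Here is a rough sketch of what I have in mind. The function name `decode_sentiment` and the `margin` value of 0.2 are my own assumptions, not something I have tuned or tested on my model:

```python
import numpy as np

def decode_sentiment(sentiment, margin=0.2):
    """Map a 2-element output [neg_score, pos_score] to a label 0, 1 or 2.

    margin is a hypothetical threshold: if the two scores are closer
    than this, the tweet is treated as neutral (label 2).
    """
    neg_score, pos_score = float(sentiment[0]), float(sentiment[1])
    if abs(neg_score - pos_score) < margin:
        return 2  # neutral: neither score clearly wins
    return 0 if neg_score > pos_score else 1

# Invented example outputs, not real predictions.
for out in ([0.9, 0.1], [0.1, 0.9], [0.55, 0.45]):
    print(out, "->", decode_sentiment(np.array(out)))
```

I am not sure whether a fixed margin like this is a reasonable approach, or whether the labels themselves should be encoded differently.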
***** If I use word2vec for the embedding and consider 0, 1, 2 as the polarities in my corpus, can I set the labels in this way?
    if i < train_size:
        if labels[index] == 0:
            Y_train[i, :] = [1.0, 0.0, 0.0]
        elif labels[index] == 1:
            Y_train[i, :] = [0.0, 1.0, 0.0]
        else:
            Y_train[i, :] = [0.0, 0.0, 1.0]
    else:
        if labels[index] == 0:
            Y_test[i - train_size, :] = [1.0, 0.0, 0.0]
        elif labels[index] == 1:
            Y_test[i - train_size, :] = [0.0, 1.0, 0.0]
        else:
            Y_test[i - train_size, :] = [0.0, 0.0, 1.0]
and then for compiling:
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001, decay=1e-6),
                  metrics=['accuracy'])
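For reference, I think the same one-hot labels as in the if/else above could be built in one step with NumPy, and with 3 outputs `np.argmax` can actually return 2. This is only a sketch: the `labels` array and the `fake_softmax` values are made up, and I am assuming the last layer would become `Dense(3, activation='softmax')` to go with `categorical_crossentropy`:

```python
import numpy as np

# Made-up integer labels: 0 = negative, 1 = positive, 2 = neutral.
labels = np.array([0, 1, 2, 1])

# Row k of the 3x3 identity matrix is the one-hot vector for class k.
Y = np.eye(3, dtype=np.float32)[labels]
print(Y)

# Invented 3-class outputs (each row sums to 1, like a softmax layer).
fake_softmax = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
])
names = ["negative", "positive", "neutral"]
for row in fake_softmax:
    # With 3 outputs, argmax can be 0, 1 or 2.
    print(names[int(np.argmax(row))])
```

With this encoding the index returned by `np.argmax` would match my corpus labels directly (0, 1, 2), if my understanding is right.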
thank you for your patience