I'm trying to classify features with a Naive Bayes classifier, using TF-IDF for feature extraction.
finaltfidfVector is a list of vectors. Each vector is a list of numbers: 0 if the word is not found, otherwise the weight of the word.
classlabels contains the class label for each vector. I'm trying to classify the data with this code, but it doesn't work.
The dataset has 26652 lines.
    import nltk
    from nltk.classify import apply_features

    def naivebyse(finaltfidfVector, classlabels, reviews):
        # Training set: the first 18697 items.
        train_set = []
        j = 0
        for vector in finaltfidfVector:
            arr = {}
            if j < 18697:
                # Each item is ({vector tuple: class label}, review text).
                arr[tuple(vector)] = classlabels[j]
                train_set.append((arr, reviews[j]))
            j += 1
        # Test set: indices 18697 to 26651.
        test_set = []
        j = 18697
        for vector in finaltfidfVector:
            arr = {}
            if j < 26652 and j >= 18697:
                arr[tuple(vector)] = classlabels[j]
                test_set.append((arr, reviews[j]))
            j += 1
        classifier = nltk.NaiveBayesClassifier.train(train_set)
        print(nltk.classify.accuracy(classifier, test_set))
The output:
0.0
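One thing worth checking in the code above: the second loop iterates over finaltfidfVector from its first element again while j starts at 18697, so the first vector ends up paired with classlabels[18697] rather than its own label. A minimal sketch of a split that keeps vectors and labels aligned is shown below. The helper name split_aligned and the toy data are hypothetical and only stand in for finaltfidfVector and classlabels.

```python
def split_aligned(vectors, labels, n_train):
    """Split vectors and labels at the same index so each pair stays aligned."""
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels must have the same length")
    # Pair every vector with its own label before cutting the list in two.
    pairs = list(zip(vectors, labels))
    return pairs[:n_train], pairs[n_train:]


# Toy data (hypothetical) standing in for finaltfidfVector and classlabels.
vectors = [[0.0, 0.6], [0.5, 0.0], [0.0, 0.0], [0.6, 0.6]]
labels = ['1', '0', '0', '1']

train_pairs, test_pairs = split_aligned(vectors, labels, n_train=3)
```

With the real data, n_train would be 18697, and the remaining 7955 pairs would form the test set.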
The reference I used for TF-IDF, which I applied to produce finaltfidfVector:
https://triton.ml/blog/tf-idf-from-scratch?fbclid=IwAR3UlCToGYFEQSmugXo3M5Q9fcld79JfXSfBaDG7wKv5a49O0ZDEft9DFNg.
Data set
This is a sample of the data set before preprocessing and TF-IDF.
This is the first vector (index 0) in the finaltfidfVector list:
[0.0, 0.0, 0.0, 0.6214608098422192, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5115995809754083, 0.0, 0.0, 0.0, 0.0, 0.5521460917862246, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6214608098422192, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6214608098422192, 0.0, 0.0, 0.0, 0.6214608098422192]
classlabels contains the class label for each vector: 1 for sarcasm, 0 for not sarcasm. The class label at index 0 is 1, which belongs to the first vector in finaltfidfVector.
The first item of train_set is:
({(0.0, 0.0, 1.3803652294655615,..... etc): '0'}, "former versace store clerk sues over secret 'black code' for minority shoppers")
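The item shown above uses the whole vector as a single feature name, the class label as that feature's value, and the review text as the label. NLTK classifiers train on (featureset, label) pairs, where the featureset is a dict mapping feature names to values. With the review text as the label, no test label ever occurs in the training set, which would be consistent with an accuracy of 0.0. The sketch below shows one possible way to build the pairs. The helper names, the feature names such as 'w3', and the binarize option are all assumptions made for illustration and are not part of the original code.

```python
def to_featureset(vector, binarize=True):
    """Map a TF-IDF vector to a dict of named features.

    Feature names such as 'w3' are hypothetical: they stand for the word at
    position 3 of the vocabulary. With binarize=True each weight is reduced
    to a present/absent flag, an assumption made because NLTK's
    NaiveBayesClassifier treats feature values as discrete.
    """
    if binarize:
        return {f"w{i}": weight > 0 for i, weight in enumerate(vector)}
    return {f"w{i}": weight for i, weight in enumerate(vector)}


def build_labeled_featuresets(vectors, labels):
    """Pair each featureset with its class label (not with the review text)."""
    return [(to_featureset(v), label) for v, label in zip(vectors, labels)]


# Toy data (hypothetical) standing in for finaltfidfVector and classlabels.
vectors = [[0.0, 0.62, 0.0], [0.51, 0.0, 0.0]]
labels = ['1', '0']

labeled = build_labeled_featuresets(vectors, labels)
```

A list built this way has the shape that nltk.NaiveBayesClassifier.train and nltk.classify.accuracy, as called in the function above, expect for their training and test sets.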
finaltfidfVector and classlabels? Preferably show some of the data. – knh190
The naivebayes function has a syntax error. Please fix that and post the related code again. – knh190