How to test NLP model against many strings

Question

I have trained a logistic regression classifier on a set of strings; it assigns each string a label of 0 or 1. Right now I can only test one string at a time. How can I run the model on many sentences at once, for example from a .csv file, so I don't have to input each sentence individually?

from sklearn import linear_model, metrics

def train_model(classifier, feature_vector_train, label, feature_vector_valid, valid_y, is_neural_net=False):
    # fit the classifier on the training features
    classifier.fit(feature_vector_train, label)

    # predict the labels on validation dataset
    predictions = classifier.predict(feature_vector_valid)

    if is_neural_net:
        predictions = predictions.argmax(axis=-1)

    return classifier, metrics.accuracy_score(valid_y, predictions)

then

model, accuracy = train_model(linear_model.LogisticRegression(), xtrain_count, train_y, xtest_count,test_y)

This is how I currently test my model:

sent = ['here I copy a string']

# convert text to bag-of-words count vectors
count_vect = CountVectorizer(analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 2))
# count_vect is fitted on the training text before this point (fit step not shown)
x_feature_vector = count_vect.transform(sent)
pred = model.predict(x_feature_vector)

and I get the prediction for that sentence.

I want the model to classify all my new sentences at once and give a classification for each sentence.

chefhose chefhose · Accepted Answer · 2019-09-30T17:04:46

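Both model.predict(X) and count_vect.transform(X) operate on a whole batch of samples, so you can read the sentences from a file and predict them all together, like this:

with open('file.txt', 'r') as f:
    # one sentence per line; drop trailing newlines and blank lines
    samples = [line.strip() for line in f if line.strip()]

# reuse the vectorizer that was fitted on the training data
vecs = count_vect.transform(samples)
preds = model.predict(vecs)
for s, p in zip(samples, preds):
    # print each sentence with its predicted label
    # (p is a NumPy integer, so convert it before concatenating)
    print(s + "     Label: " + str(p))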
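
Since the question mentions a .csv file, here is a rough sketch of the same idea using Python's csv module. The toy training data, the column name "sentence", the helper predict_csv and the in-memory CSV are all made up for illustration; in practice you would reuse your own fitted count_vect and model and open your real file instead.

```python
import csv
import io

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data, invented only so the sketch runs on its own.
train_texts = ["good movie", "great film", "bad movie", "terrible film"]
train_y = [1, 1, 0, 0]

# Fit the vectorizer once on the training text and reuse it for new text.
count_vect = CountVectorizer(analyzer="word", token_pattern=r"\w{1,}", ngram_range=(1, 2))
xtrain_count = count_vect.fit_transform(train_texts)
model = LogisticRegression().fit(xtrain_count, train_y)


def predict_csv(csv_file, text_column="sentence"):
    """Return (sentence, predicted label) pairs for every row of an open CSV file.

    text_column is a hypothetical header name; change it to match your file.
    """
    sentences = [row[text_column] for row in csv.DictReader(csv_file)]
    preds = model.predict(count_vect.transform(sentences))
    return list(zip(sentences, preds.tolist()))


# In-memory stand-in for: open("sentences.csv", newline="")
csv_data = io.StringIO("sentence\ngood film\nterrible movie\n")
results = predict_csv(csv_data)
for sentence, label in results:
    print(f"{sentence}     Label: {label}")
```

The important part is that transform is called on the vectorizer that was fitted during training, so the new sentences are mapped onto the same vocabulary the model was trained with.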
