
For a movie reviews dataset, I'm creating a multinomial Naive Bayes model. In the training dataset, the reviews are labeled per genre. Instead of creating a generic model for the dataset that ignores the genre feature, how do I train a model that takes the genre feature into consideration in addition to the tf-idf scores of the words that occur in the review? Do I need to create one model per genre, or can I incorporate genre into a single model?

Training Dataset Sample:
genre, review, classification
Romantic, The movie was really emotional and touched my heart!, Positive
Action, It was a thrilling movie, Positive
....

Test Data Set:
Genre, review
Action, The movie sucked bigtime. The action sequences didnt fit into the plot very well

2 Answers


From the documentation: "The multinomial distribution normally requires integer feature counts." Categorical variables provided as inputs, especially when encoded as integers, may not improve the predictive capacity of a multinomial Naive Bayes model. So you could either consider a different model type, such as a neural network, or drop the genre column entirely. If, after fitting, the model shows sufficient predictive capability on the text features alone, adding the categorical variable may not even be necessary.

The way I would try this task is by stacking the dummy-encoded categorical values with the text features and feeding the stacked array to an SGD model, along with the target labels. You would then run a grid search to find the optimal hyperparameters.
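A minimal sketch of that stacking approach. The sample rows, the choice of `SGDClassifier` defaults, and the parameter grid are illustrative assumptions, not taken from your dataset:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the training dataset described in the question
df = pd.DataFrame({
    "genre": ["Romantic", "Action", "Action", "Romantic"],
    "review": [
        "The movie was really emotional and touched my heart!",
        "It was a thrilling movie",
        "The plot made no sense at all",
        "A dull and predictable romance",
    ],
    "classification": ["Positive", "Positive", "Negative", "Negative"],
})

# tf-idf features from the review text
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(df["review"])

# dummy-encoded genre, stacked next to the text features
X_genre = pd.get_dummies(df["genre"]).to_numpy(dtype=float)
X = hstack([X_text, X_genre]).tocsr()

# grid search over an illustrative SGD hyperparameter grid
grid = GridSearchCV(
    SGDClassifier(random_state=0),
    param_grid={"alpha": [1e-4, 1e-3]},
    cv=2,
)
grid.fit(X, df["classification"])
```

After fitting, `grid.best_estimator_` can be applied to a test matrix built the same way (same fitted vectorizer, same dummy columns).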


Consider treating genre as a categorical variable, probably with dummy encoding (see pd.get_dummies(df['genre'])), and feeding that as well as the tf-idf scores into your model.
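For example, the encoding-and-combining step could look like this (the two sample rows are just a stand-in for your data):

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({
    "genre": ["Romantic", "Action"],
    "review": [
        "The movie was really emotional and touched my heart!",
        "It was a thrilling movie",
    ],
})

genre_dummies = pd.get_dummies(df["genre"])  # one indicator column per genre
tfidf_scores = TfidfVectorizer().fit_transform(df["review"])

# combine both feature sets into a single matrix for the classifier
features = hstack([tfidf_scores, genre_dummies.to_numpy(dtype=float)])
```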

Also consider other model types besides Naive Bayes: a neural network models more interaction between variables, which may help capture differences between genres. Scikit-learn has an MLPClassifier implementation that is worth a look.
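Swapping in MLPClassifier is mostly a drop-in change; the random matrix here is just a stand-in for the combined tf-idf + genre features, and the hidden-layer size and iteration count are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the combined tf-idf + genre feature matrix
X = np.random.RandomState(0).rand(8, 10)
y = ["Positive", "Negative"] * 4

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)
```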