Text classification using e1071 (SVM)

Question

I have a data frame with two columns. One column contains text: each row holds an item belonging to one of three classes (skill, qualification, experience). The other column holds the corresponding class labels.

[Image: snapshot of the data frame]

How can I apply an SVM from the e1071 package to this data? How do I convert the text column into numeric features? I thought of converting the text column into a document-term matrix. Is there any other way? And how do I build a document-term matrix?

Nishu Tayal · Accepted Answer · 2016-10-14T21:02:13

You can use the RTextTools package to create a document-term matrix with its create_matrix function:

# Load RTextTools and create the document-term matrix.
# Assumes the text column is named v1.
library(RTextTools)
dtMatrix <- create_matrix(data["v1"])

Then you can train your SVM model using this:

# Configure the training data (here rows 1 to 102 are used for training)
container <- create_container(dtMatrix, data$label, trainSize=1:102, virgin=FALSE)

# Train an SVM model
model <- train_model(container, "SVM", kernel="linear", cost=1)

Note that RTextTools uses the e1071 package internally to train its SVM models.
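Since the question asks about e1071 specifically, here is a minimal sketch of calling e1071::svm directly on a hand-built document-term matrix, without RTextTools. The toy data frame, the whitespace tokenisation, and the variable names (tokens, vocab, dtm) are assumptions made for illustration only; real text would usually need proper cleaning (punctuation, stop words, stemming) first.

```r
library(e1071)

# Hypothetical toy data standing in for the data frame in the question
data <- data.frame(
  v1 = c("java and sql programming", "python programming and sql",
         "bachelor degree in science", "master degree in engineering",
         "five years of work", "two years of work"),
  label = c("skill", "skill", "qualification", "qualification",
            "experience", "experience"),
  stringsAsFactors = FALSE
)

# Lower-case each document and split it on whitespace
tokens <- strsplit(tolower(data$v1), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one column per term,
# each cell holding how often the term occurs in the document
dtm <- t(vapply(tokens,
                function(tk) as.integer(table(factor(tk, levels = vocab))),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Train a linear SVM on the term counts; labels must be a factor
# for classification
model <- svm(x = dtm, y = factor(data$label),
             kernel = "linear", cost = 1, scale = FALSE)

# Predict on the training documents (new text must be mapped onto
# the same vocab columns before calling predict)
pred <- predict(model, dtm)
```

The term counts serve as the numeric "score" for each document; other weightings such as TF-IDF could be substituted in the same matrix layout.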

For more details, please refer to the RTextTools and e1071 documentation.
