1
votes

I am working with a dataset that contains reviews of an item. The code runs fine for most reviews, which normally have around 20-30 words, but it throws an error whenever a review consists of only a single word.

library(NLP)
library(openNLP)
library(stringr)

x <- NLP::as.String("pathetic")
wordAnnotation <- NLP::annotate(x, list(Maxent_Sent_Token_Annotator(), 
  Maxent_Word_Token_Annotator()))
POSAnnotation <- NLP::annotate(x, Maxent_POS_Tag_Annotator(), 
  wordAnnotation)
POSwords <- subset(POSAnnotation, type == "word")
tags <- sapply(POSwords$features, '[[', "POS")
tokenizedAndTagged <- data.frame(Tokens = x[POSwords], Tags = tags, 
  stringsAsFactors = FALSE)
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = 
stringsAsFactors) : cannot coerce class "String" to a data.frame

I have seen similar questions and tried the suggested fixes, such as calling NLP::annotate explicitly to avoid function masking and restarting the R session, but none of them worked. How can I resolve this? Thanks in advance.

1
Sounds like you are putting output to a df. - Rana Usman

1 Answer

2
votes

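You need to wrap the Tokens value in as.character(). The error message indicates that, for a single-word input, x[POSwords] is still an object of class "String", which data.frame() cannot coerce to a column: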

tokenizedAndTagged <- data.frame(Tokens = as.character(x[POSwords]), 
                                 Tags = tags, 
                                 stringsAsFactors = FALSE)
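
To see why the conversion helps without needing openNLP installed, here is a minimal sketch. It assumes the failure comes only from the "String" class having no as.data.frame() method, which is what the call to as.data.frame.default in the error suggests. The `token` object is a hypothetical stand-in built with structure(), not a real NLP String, and the "JJ" tag is an illustrative value only.

```r
# Stand-in for an NLP "String": a character value carrying an S3 class
# that has no as.data.frame() method (hypothetical mimic, not the real object).
token <- structure("pathetic", class = "String")
tags <- "JJ"  # illustrative tag value only

# Without conversion, data.frame() falls through to as.data.frame.default()
# and fails; capture the message instead of stopping the script.
err <- tryCatch(
  {
    data.frame(Tokens = token, Tags = tags, stringsAsFactors = FALSE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err)

# as.character() drops the class attribute, leaving a plain character
# vector that data.frame() accepts as a column.
fixed <- data.frame(Tokens = as.character(token), Tags = tags,
                    stringsAsFactors = FALSE)
print(fixed)
```

Because as.character() is harmless on values that are already plain character vectors, the same call should be safe to keep for the longer reviews as well.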