1
votes

I would like to replace every word in my corpus that contains 'kind' with 'Kindertoekomst'. On a plain character vector this works:

Woorden<-c("kinderen", "kleinkind")
Woorden[grepl("kind", Woorden)]<-"Kindertoekomst"

But I would like to do the same inside my tm corpus.

I managed to do the replacement with:

Kind<-grepl("kind", Woorden)
docs <- tm_map(docs, function(x) stri_replace_all_fixed(x, Woorden[as.logical(Kind)], "kindertoekomst", vectorize_all = FALSE))

But after that, other tm functions no longer work on the corpus:

dtm <- DocumentTermMatrix(docs)

Error: inherits(doc, "TextDocument") is not TRUE

And:

corpus_clean <- tm_map(docs, content_transformer(tolower))

Error in UseMethod("content", x) : no applicable method for 'content' applied to an object of class "character"

Please help me :)

2 Answers

1
votes

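This should work. The replacement function returns plain character vectors, so convert the documents back with PlainTextDocument before building the matrix: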

docs <- tm_map(docs, function(x) stri_replace_all_fixed(x, Woorden[as.logical(Kind)], "kindertoekomst", vectorize_all = FALSE))
docs <- tm_map(docs, PlainTextDocument) 
dtm <- DocumentTermMatrix(docs)

0
votes

An alternative approach wraps the replacement in content_transformer() from the tm package, which keeps the documents as text documents so that later tm functions still work:

library(tm)

Woorden<-c("kinderen", "kleinkind")

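rep_kind <- function(x){
  # Replace each whole word containing "kind". Backslashes must be doubled
  # in R strings, and \\w* keeps the match inside a single word.
  gsub("\\b\\w*kind\\w*\\b", "Kindertoekomst", x, perl = TRUE)
}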

docs <- Corpus(VectorSource(as.list(Woorden)))
docs <- tm_map(docs, content_transformer(rep_kind))
dtm <- DocumentTermMatrix(docs)
inspect(dtm)
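
To see what a word-level pattern does to a document with more than one word, here is a minimal base-R sketch. The sentence in zin is a made-up example and not part of the original corpus. It assumes a pattern with doubled backslashes ("\\b", because a single "\b" in an R string is a backspace character) and \\w* instead of .* so that a match cannot run across several words.

```r
# Hypothetical example sentence; any text containing words with "kind" will do
zin <- "de kinderen en het kleinkind spelen buiten"

# \\b marks a word boundary, \\w* matches the rest of the same word only
patroon <- "\\b\\w*kind\\w*\\b"

resultaat <- gsub(patroon, "Kindertoekomst", zin, perl = TRUE)
print(resultaat)
# "de Kindertoekomst en het Kindertoekomst spelen buiten"
```

The match is case-sensitive as written, so a word such as "Kinderen" would be left alone unless ignore.case = TRUE is added to the gsub() call.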