# Loading required libraries
# Set up logistics such as reading in data and setting up corpus
```{r}
# Load tm, which provides Corpus, DirSource, tm_map and TermDocumentMatrix
library(tm)
# Relative path points to the local folder
folder.path="../data/InauguralSpeeches/"
# get the list of file names
speeches=list.files(path = folder.path, pattern = "\\.txt$")
# Truncate the file names so they show only "FirstLast-Term"
prez.out=substr(speeches, 6, nchar(speeches)-4)
# Create a vector of NAs with one entry per speech
length.speeches=rep(NA, length(speeches))
# Create a corpus
ff.all<-Corpus(DirSource(folder.path))
```
# Clean the data
```{r}
# Use tm_map to collapse runs of whitespace into a single space, convert to lower case, and remove stop words, empty strings and punctuation.
ff.all<-tm_map(ff.all, stripWhitespace)
ff.all<-tm_map(ff.all, content_transformer(tolower))
ff.all<-tm_map(ff.all, removeWords, stopwords("english"))
ff.all<-tm_map(ff.all, removeWords, c("can", "may", "upon", "shall", "will", "must", ""))
# The problem line
ff.all<-tm_map(ff.all, gsub, pattern = "free", replacement = "freedom")
ff.all<-tm_map(ff.all, removeWords, character(0))
ff.all<-tm_map(ff.all, removePunctuation)
# tdm.all = a Term Document Matrix
tdm.all<-TermDocumentMatrix(ff.all)
```
I am trying to replace similar words with a single root word in a text mining project, for example replacing "free" with "freedom".
I picked up this line from a YouTube tutorial: `ff.all<-tm_map(ff.all, gsub, pattern = "free", replacement = "freedom")`. Without this line, the code runs.
With this line added, RStudio gives the error `Error: inherits(doc, "TextDocument") is not TRUE` when it executes `tdm.all<-TermDocumentMatrix(ff.all)`.
I think this should be a relatively simple issue, but I could not find a solution on Stack Overflow.
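
One likely explanation, offered as a sketch rather than a confirmed fix: in recent versions of tm, `tm_map` expects the mapped function to return a document, and a bare `gsub` returns a plain character vector, so the corpus no longer holds `TextDocument` objects by the time `TermDocumentMatrix` runs. Wrapping the call in `content_transformer` is the usual workaround. The toy corpus and the name `replace_free` below are made up for illustration, and the `\\b` word boundaries are an added assumption so that an existing "freedom" does not become "freedomdom".

```r
library(tm)

# Toy corpus standing in for the inaugural speeches (hypothetical data)
docs <- VCorpus(VectorSource(c("we are free", "freedom for all")))

# Wrap gsub in content_transformer() so each element stays a TextDocument.
# The \\b anchors restrict the match to the whole word "free" (an assumption
# about the intended behaviour, not part of the original line).
replace_free <- content_transformer(function(x) gsub("\\bfree\\b", "freedom", x))
docs <- tm_map(docs, replace_free)

# Building the matrix should now succeed because the documents kept their class
tdm <- TermDocumentMatrix(docs)
```

In the script above, the same wrapper would take the place of the `gsub` line and be applied to `ff.all` instead of `docs`.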