I'm creating a correlated topic model from public review data and getting a rather odd error.

When I call terms(ctm1, 5) on my CTM, I get back the names of the documents rather than the top 5 terms for each topic.

In more detail, I ran:

library(topicmodels)
library(data.table)
library(tm)

a <- Corpus(DirSource("~/text", encoding = "UTF-8"),
            readerControl = list(language = "lat"))
a <- tm_map(a, removeNumbers)
a <- tm_map(a, removePunctuation)
a <- tm_map(a, stripWhitespace)
a <- tm_map(a, tolower)
a <- tm_map(a, removeWords, stopwords("english"))
a <- tm_map(a, stemDocument, language = "english")
adtm <- TermDocumentMatrix(a)
adtm <- removeSparseTerms(adtm, 0.75)

ctm1 <- CTM(adtm, 30, method = "VEM", control = NULL, model = NULL)
terms(ctm1, 5)

which returned:

terms(ctm1)
          Topic 1  "cmnt656661.txt" 

(etc.)

1 Answer

We cannot know for sure because you did not provide data, but it is likely that the files were not imported correctly. See ?DirSource, which describes the directory argument as follows:

directory : A character vector of full path names; the default corresponds to the working directory getwd().

In your case, it seems like you should do something like this:

a <- Corpus(DirSource(list.files("~/text", full.names = TRUE)))
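If it helps, here is a minimal sketch of two checks worth running before fitting the model. It uses a made-up three-document corpus in place of your files; the texts and the names docs, corp, tdm and dtm are my own placeholders, not anything from your data. The first check confirms that the expected number of documents was loaded. The second confirms the orientation of the matrix: the CTM() help page describes its input as a DocumentTermMatrix (documents in rows, terms in columns), whereas a TermDocumentMatrix is the transpose.

```r
library(tm)

# Hypothetical stand-in for the files under ~/text.
docs <- c("the food was great and the service was great",
          "slow service but great food",
          "terrible food and terrible service")
corp <- VCorpus(VectorSource(docs))

# Check 1: did every document load?
# With real files, compare against length(list.files("~/text")).
stopifnot(length(corp) == length(docs))

# Check 2: which way round is the matrix?
tdm <- TermDocumentMatrix(corp)  # rows = terms, columns = documents
dtm <- DocumentTermMatrix(corp)  # rows = documents, columns = terms

dim(tdm)
dim(dtm)

# The two matrices are transposes of each other.
stopifnot(identical(dim(tdm), rev(dim(dtm))))

# Only the document-term orientation has one row per document.
stopifnot(nrow(dtm) == length(corp))

# Column names are what a row-per-document model would treat as "terms".
head(colnames(dtm))  # actual terms
head(colnames(tdm))  # document ids
```

If the column names of the matrix you pass to CTM() turn out to be your file names, that would be consistent with terms() listing documents instead of words, and building the matrix with DocumentTermMatrix() would be the thing to try.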