0
votes

I have a data frame of paragraphs on which I would like to perform Latent Dirichlet allocation (LDA). To do this I need to create a term-document matrix. This example code reproduces the error:

library(qdap)
library(topicmodels)

remove(list=ls())
doc <- c(1,2,3,4)
text <- c("The Quick Brown Fox Jumped Over The Lazy Dog",
        "The Cow Jumped Over The Moon",
        "Moo, Moo, Brown Cow Have You Any Milk",
        "The Fox Went Out One Moonshiny Night")
works.df <- data.frame(doc,text)

works.tdm <- as.tdm(text.var = works.df$text,  grouping.var = works.df$doc)
works.lda <- LDA(works.tdm, k = 2, control = list(seed = 1234))

The as.tdm() call fails with:

works.tdm <- as.tdm(text.var=works.df$text, grouping.var=works.df$doc)
Error in .TermDocumentMatrix(x, weighting) : argument "weighting" is missing, with no default

I expected to get a sparse matrix of term frequencies per document. For example, the term "the" appears in documents 1 (frequency 2), 2 (frequency 2) and 4 (frequency 1); the term "cow" appears in documents 2 and 3 (frequency 1 in each); and so on.

Can anyone advise what is missing, or whether there is a better way to achieve this? Thanks.

2 Answers

0
votes

You need to supply a weighting, as R requests:

library(tm)
works.tdm <- as.tdm(text.var = works.df$text,  grouping.var = works.df$doc, weighting = weightTf)
0
votes

It looks like I needed to turn the text into a corpus first and then use the more common DocumentTermMatrix():

> library(tm)
> remove(list=ls())
> doc<-c(1,2,3,4)
> text<-c("The Quick Brown Fox Jumped Over The Lazy Dog",
+         "The Cow Jumped Over The Moon",
+         "Moo, Moo, Brown Cow Have You Any Milk",
+         "The Fox Went Out One Moonshiny Night")
> works.df<-data.frame(doc,text)
> corp <- VCorpus(VectorSource(works.df$text))
> works.tdm <- DocumentTermMatrix(corp, control=list(weighting=weightTf))
> works.tdm
<<DocumentTermMatrix (documents: 4, terms: 20)>>
Non-/sparse entries: 27/53
Sparsity           : 66%
Maximal term length: 9
Weighting          : term frequency (tf)
> as.matrix(works.tdm)
    Terms
Docs any brown cow dog fox have jumped lazy milk moo, moon moonshiny night one out over quick the went
   1   0     1   0   1   1    0      1    1    0    0    0         0     0   0   0    1     1   2    0
   2   0     0   1   0   0    0      1    0    0    0    1         0     0   0   0    1     0   2    0
   3   1     1   1   0   0    1      0    0    1    2    0         0     0   0   0    0     0   0    0
   4   0     0   0   0   1    0      0    0    0    0    0         1     1   1   1    0     0   1    1
    Terms
Docs you
   1   0
   2   0
   3   1
   4   0
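
For completeness, here is a minimal sketch of how a matrix like this could then be passed on to LDA(), assuming the tm and topicmodels packages are installed. The name works.dtm and the removePunctuation option are my own additions for illustration (the option is there so that "moo," is counted as "moo"); they are not part of the original code, and other preprocessing choices may suit your data better.

```r
library(tm)
library(topicmodels)

text <- c("The Quick Brown Fox Jumped Over The Lazy Dog",
          "The Cow Jumped Over The Moon",
          "Moo, Moo, Brown Cow Have You Any Milk",
          "The Fox Went Out One Moonshiny Night")

# Build a corpus with one document per element of the character vector
corp <- VCorpus(VectorSource(text))

# works.dtm is a name chosen for this sketch (documents in rows, terms in columns).
# removePunctuation is an assumption added here so "moo," becomes "moo".
works.dtm <- DocumentTermMatrix(corp,
                                control = list(removePunctuation = TRUE,
                                               weighting = weightTf))

# LDA() expects raw term frequencies, which is what weightTf provides
works.lda <- LDA(works.dtm, k = 2, control = list(seed = 1234))

terms(works.lda, 3)   # top 3 terms for each topic
topics(works.lda)     # most likely topic for each document
```

With only four short documents the fitted topics will not be meaningful; the point is only that a DocumentTermMatrix with term-frequency weighting is accepted by LDA() directly.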