Convert a variable in a data frame into a term document matrix

Question

I have a data frame containing paragraphs of text on which I would like to perform Latent Dirichlet allocation. To do this I need to create a term document matrix. This example code reproduces the error:

library(qdap)
library(topicmodels)

remove(list=ls())
doc <- c(1,2,3,4)
text <- c("The Quick Brown Fox Jumped Over The Lazy Dog",
        "The Cow Jumped Over The Moon",
        "Moo, Moo, Brown Cow Have You Any Milk",
        "The Fox Went Out One Moonshiny Night")
works.df <- data.frame(doc,text)

works.tdm <- as.tdm(text.var = works.df$text,  grouping.var = works.df$doc)
works.lda <- LDA(works.tdm, k = 2, control = list(seed = 1234))

The call to as.tdm fails with:

works.tdm <- as.tdm(text.var=works.df$text, grouping.var=works.df$doc)
Error in .TermDocumentMatrix(x, weighting) : argument "weighting" is missing, with no default

I expected to get a sparse matrix in which, for example, the term "the" appears in documents 1 (with a frequency of 2), 2 (with a frequency of 2) and 4 (with a frequency of 1); the term "cow" appears in documents 2 and 3 (both with a frequency of 1); and so on.
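For reference, the expected counts can be reproduced by hand in base R, without qdap or tm. This is only a sketch of the target matrix: the tokenisation (lower-casing, stripping punctuation, splitting on whitespace) is an assumption and not necessarily what as.tdm does internally, and the names tokens, long and counts are illustrative.

```r
# Same four documents as in the example above
text <- c("The Quick Brown Fox Jumped Over The Lazy Dog",
          "The Cow Jumped Over The Moon",
          "Moo, Moo, Brown Cow Have You Any Milk",
          "The Fox Went Out One Moonshiny Night")

# Assumed tokenisation: lower-case, drop punctuation, split on whitespace
tokens <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")

# One row per (term, document) occurrence, then cross-tabulate
long <- data.frame(term = unlist(tokens),
                   doc  = rep(seq_along(tokens), lengths(tokens)))
counts <- table(long$term, long$doc)   # terms in rows, documents in columns

# Rows for the two terms mentioned above
counts[c("the", "cow"), ]
```

The result is a dense table rather than a sparse matrix, so it is useful for checking counts on a small example, not as input for a large corpus.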

Can anyone advise as to what is missing or if there is a better way to achieve my task? Thanks.

SAFEX · Accepted Answer · 2019-02-26T14:36:44

You need to supply a weighting, as R requests:

library(tm)
works.tdm <- as.tdm(text.var = works.df$text,  grouping.var = works.df$doc, weighting = weightTf)
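One thing to check before fitting the model: as far as I know, LDA() in topicmodels is documented as taking a DocumentTermMatrix (documents in rows) with term-frequency weighting, not a term document matrix. A possible alternative, sketched below, skips qdap and builds that matrix with tm directly. The removePunctuation setting and the name works.dtm are my own choices for this sketch, not part of the original code.

```r
library(tm)           # VCorpus(), VectorSource(), DocumentTermMatrix()
library(topicmodels)  # LDA(), terms()

text <- c("The Quick Brown Fox Jumped Over The Lazy Dog",
          "The Cow Jumped Over The Moon",
          "Moo, Moo, Brown Cow Have You Any Milk",
          "The Fox Went Out One Moonshiny Night")

# One corpus document per element of the character vector
corpus <- VCorpus(VectorSource(text))

# Documents in rows, terms in columns; punctuation removed so "Moo," counts as "moo"
works.dtm <- DocumentTermMatrix(corpus,
                                control = list(removePunctuation = TRUE))

# Fit a two-topic model with the same seed as in the question
works.lda <- LDA(works.dtm, k = 2, control = list(seed = 1234))

# Top three terms per topic
terms(works.lda, 3)
```

With the data frame from the question, VectorSource(works.df$text) should work the same way, provided the text column is a character vector rather than a factor.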
