With the text mining package tm for R, the following works in tm version 0.6.2 (R version 3.4.3):
library(tm)
a = "This is the first document."
b = "This is the second document."
c = "This is the third document."
d = "This is the fourth document."
docs1 = VectorSource(c(a,b))
docs2 = VectorSource(c(c,d))
corpus1 = Corpus(docs1)
corpus2 = Corpus(docs2)
corpus3 = c(corpus1,corpus2)
inspect(corpus3)
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 4
However, the same code in tm version 0.7.3 (R version 3.4.2) gives an error:
Error in UseMethod("inspect", x) :
no applicable method for 'inspect' applied to an object of class "list"
According to vignette("tm",package="tm"), the c() function is overloaded:
Many standard operators and functions ([, [<-, [[, [[<-, c(), lapply()) are available for corpora with semantics similar to standard R routines. E.g., c() concatenates two (or more) corpora. Applied to several text documents it returns a corpus. The metadata is automatically updated, if corpora are concatenated (i.e., merged).
In the new version this apparently no longer holds: judging by the error message, c() now returns a plain list rather than a corpus. How can two corpora be combined in tm 0.7.3? An obvious workaround is to combine the documents first and create the corpus afterwards, but I'm looking for a way to combine two already existing corpora.
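One thing that may be worth checking is which class Corpus() actually returns in the newer version. The sketch below rests on an assumption I have not confirmed against the tm 0.7.3 sources: that Corpus() on a VectorSource now yields a SimpleCorpus, and that c() is only overloaded for VCorpus objects. Under that assumption, building the corpora with VCorpus() explicitly should restore the old behaviour. The names texts1 and texts2 are illustrative only.

```r
library(tm)

# Illustrative input; texts1 and texts2 are hypothetical names.
texts1 <- c("This is the first document.", "This is the second document.")
texts2 <- c("This is the third document.", "This is the fourth document.")

# Assumption: Corpus() may return a SimpleCorpus in tm 0.7.x, so request
# a VCorpus explicitly instead of relying on the Corpus() default.
corpus1 <- VCorpus(VectorSource(texts1))
corpus2 <- VCorpus(VectorSource(texts2))

# Inspect the class first to confirm what was actually constructed.
class(corpus1)

# Concatenate the two corpora; the vignette documents c() for corpora.
corpus3 <- c(corpus1, corpus2)

# If the assumption holds, this is a corpus again rather than a plain list.
class(corpus3)
length(corpus3)
```

This only helps when the corpora can be constructed with VCorpus() in the first place; it does not by itself convert corpora that were already created with Corpus().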