I have a CSV file whose header is: DATE, Alert and OriginatingAddress.
How should I build a term-document matrix based on two columns, DATE and Alert?
Each row should be an alert and each column a day, with each entry giving the number of times that alert occurred on that day.
I've tried:
library(tm)
myCorpus <- read.csv("alert-sample-data-4-mining.csv")
corpus <- Corpus(VectorSource(myCorpus$DATE, myCorpus$Alert))
TermDocumentMatrix(corpus)
But the result is not what I want.
The result I currently get is:
++++++++++++++++++++++++++++
A term-document matrix (31 terms, 69124 documents)
Non-/sparse entries: 69124/2073720
Sparsity : 97%
Maximal term length: 9
Weighting : term frequency (tf)
++++++++++++++++++++++++++++++++++++++++++++
str(myCorpus)
'data.frame': 69124 obs. of 3 variables:
$ DATEFORMAT : Factor w/ 31 levels "3/01/2013","3/02/2013",..: 21 21 21 21 21 21 21 21 21 21 ...
$ Alert : Factor w/ 88 levels "%BGP-5-ADJCHANGE",..: 49 49 49 49 49 49 49 49 49 49 ...
$ OriginatingAddress: Factor w/ 98 levels "10.112.36.12",..: 67 67 67 67 67 67 67 67 67 67 ...
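
Since each Alert value is a single token rather than free text, a plain cross-tabulation may already give the alert-by-day counts described above. The following is only a minimal sketch using base R's table() instead of tm, run on a toy data frame: the rows, the "TOY-ALERT-B" level and the names alerts and alert_by_day are made up for illustration, and the date column is assumed to be named DATEFORMAT as in the str() output.

```r
# Toy stand-in for the CSV; these rows are invented for illustration only.
alerts <- data.frame(
  DATEFORMAT = c("3/01/2013", "3/01/2013", "3/02/2013", "3/02/2013", "3/02/2013"),
  Alert      = c("%BGP-5-ADJCHANGE", "%BGP-5-ADJCHANGE", "%BGP-5-ADJCHANGE",
                 "TOY-ALERT-B", "TOY-ALERT-B"),
  stringsAsFactors = TRUE
)

# Cross-tabulate: one row per alert, one column per day, cells hold counts.
alert_by_day <- table(Alert = alerts$Alert, Date = alerts$DATEFORMAT)
print(alert_by_day)
```

With the real data, the same call on myCorpus$Alert and myCorpus$DATEFORMAT should give one row per alert level and one column per date level.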