I am currently using the tm package to do some text mining. I want to be able to export my document-term matrix as a data frame with my corpus metadata attached (id variable, etc.). Here is my current workflow:

  1. Import data set
  2. Convert to corpus
  3. Basic cleaning
  4. Create TF-IDF Document Term Matrix
  5. Transform the DTM into a data frame
  6. Export the data frame with the corpus metadata
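For reference, here is a minimal sketch of steps 1 to 4. It assumes tm 0.7 or later, where DataframeSource() expects a data frame whose first two columns are named doc_id and text. The example data and the author/year columns are hypothetical stand-ins for a real data set.

```r
library(tm)

# Step 1 (hypothetical data): doc_id and text are required by DataframeSource();
# author and year stand in for any extra metadata columns
docs <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  text   = c("The cat sat on the mat.",
             "Dogs and cats are friends!",
             "The mat was red."),
  author = c("Ann", "Bob", "Cy"),
  year   = c(2019, 2020, 2021),
  stringsAsFactors = FALSE
)

# Step 2: convert to a corpus; each document's id is taken from doc_id
corpus <- VCorpus(DataframeSource(docs))

# Step 3: basic cleaning
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Step 4: TF-IDF weighted document-term matrix
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))

# The DTM carries the document ids as its row names
rownames(dtm)
```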

Number 5 is where I am getting stuck. I feel like this should definitely be possible with the package, but I can't find any documentation. Does the metadata get lost when creating a DTM using tm?

Going to answer my own question here in case anyone else overlooks the same thing I did.

The DTM that tm makes stores the document IDs (the doc_id values) as its row names; the rest of the corpus metadata is not carried into the DTM. So you can use your preferred row-name-to-column code to turn the row names into a new variable, then use that as a key to append any other metadata.

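Example of one way to do it. The DTM is a sparse matrix, not a data frame, so it has to be converted before rownames_to_column() can be used:

dtm_df <- as.data.frame(as.matrix(dtm))  # sparse DTM -> data frame; doc ids stay as row names
dtm_df <- tibble::rownames_to_column(dtm_df, var = "doc_id")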
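A fuller sketch covering steps 5 and 6, assuming tm 0.7 or later and the tibble package installed. The example data, the source metadata column and the output file name are hypothetical. It converts the DTM, moves the row names into doc_id, merges the metadata back on that key with base R's merge(), and writes the result to a CSV file.

```r
library(tm)

# Hypothetical input: doc_id/text plus one metadata column (source)
docs <- data.frame(
  doc_id = c("d1", "d2"),
  text   = c("apples and oranges", "oranges and pears"),
  source = c("blog", "news"),
  stringsAsFactors = FALSE
)

corpus <- VCorpus(DataframeSource(docs))
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))

# Step 5: densify the sparse DTM and move the row names (doc ids) into a column
dtm_df <- as.data.frame(as.matrix(dtm))
dtm_df <- tibble::rownames_to_column(dtm_df, var = "doc_id")

# Use doc_id as the key to attach the remaining metadata columns
meta_df <- docs[, setdiff(names(docs), "text")]
out <- merge(meta_df, dtm_df, by = "doc_id")

# Step 6: export (hypothetical file name, written to the session's temp dir)
out_file <- file.path(tempdir(), "dtm_with_meta.csv")
write.csv(out, out_file, row.names = FALSE)
```

Note that as.matrix() densifies the DTM, so for a large corpus this can use a lot of memory; trimming the vocabulary first with removeSparseTerms() may help. If a term in the vocabulary happens to be named doc_id, pick a different var name to avoid a column clash.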