I am trying to use the fantastic quanteda package to look at co-occurrence of terms in news articles.
I can find the features which co-occur with "美国" (the United States) as follows:
ch14_corp <- corpus(data_14)
ch14_toks <- tokens(ch14_corp, remove_punct = TRUE) %>%
  tokens_remove(ch_stop)
ch14_fcm <- fcm(ch14_toks, context = "window")
and then get the features that co-occur with it most frequently:
topfeatures(ch14_fcm["美国", ], n=50)
朝鲜 美国 日本 中国 韩国 问题 马 政府 国家 报道
881 804 555 552 297 288 270 254 253 243
奥 总统 称 战略 表示 韩 关系 政策 认为 进行
238 238 234 227 214 174 173 169 162 160
中 核 亚太 国家安全 经济 安全 局 世界 发言 国务院
157 153 148 137 136 136 136 135 132 129
美 国 访问 俄罗斯 军事 国际 官员 媒体 公民 人权
126 122 121 120 120 118 118 114 114 114
联合 一个 名 地区 安倍 平衡 导弹 国防 斯 克里
112 112 112 111 110 110 107 105 104 102
Could anybody tell me how to convert this to a data.frame, or a table with the feature in column A and the number of times it co-occurs with "美国" in column B?
I guess the other way might be to skip topfeatures and instead pull out just the row (or column?) of the matrix containing all the terms that co-occur with "美国", then sort it by co-occurrence count?
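For what it's worth, here is a sketch of what I have in mind for the first approach, using a toy named vector as a stand-in for the topfeatures() result (I believe topfeatures() returns a named numeric vector, but I'm not certain):

```r
# Toy stand-in for what topfeatures(ch14_fcm["美国", ], n = 50) returns,
# assuming it is a named numeric vector of co-occurrence counts
tf <- c("朝鲜" = 881, "日本" = 555, "中国" = 552)

# Convert to a two-column data.frame: feature names in column 1, counts in column 2
tf_df <- data.frame(feature = names(tf),
                    count   = unname(tf),
                    stringsAsFactors = FALSE)
tf_df
```

If that is right, the same idea should work on the real output by replacing the toy vector with the actual topfeatures() call. Is that the idiomatic way to do it, or is there a built-in converter I am missing?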