I want to label some documents. I tried the LDA algorithm, but the results were too messy. I decided to use a supervised approach, so I created my own topic-word matrix, but I don't know how to generate a document-topic matrix. Do you know of a good topic modeling algorithm that can be trained using a topic-word matrix?
1 Answer
If you already have a correct topic-word matrix, you only need to compute the topic weights for each document. For example, you could take the words that occur in each document and sum the topic weights of those words. You might want to add a coefficient, such as each word's number of occurrences, but it is pretty straightforward.
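Here is a minimal sketch of that idea in plain Python, assuming the topic-word matrix is stored as a dictionary of word weights per topic. The topic names, words, weights, and the function name document_topic_weights are all made up for illustration, and weighting by raw occurrence count with a final normalization is only one possible choice of coefficient.

```python
from collections import Counter

# Hypothetical hand-made topic-word matrix: topic -> {word: weight}.
topic_word = {
    "sports": {"match": 0.6, "team": 0.3, "score": 0.1},
    "finance": {"bank": 0.5, "market": 0.4, "score": 0.1},
}


def document_topic_weights(tokens, topic_word):
    """Return normalized topic weights for one tokenized document."""
    counts = Counter(tokens)  # number of occurrences of each word
    # For each topic, sum the topic weight of every word in the document,
    # multiplied by how often that word occurs.
    raw = {
        topic: sum(weights.get(word, 0.0) * n for word, n in counts.items())
        for topic, weights in topic_word.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No word of the document appears in any topic.
        return {topic: 0.0 for topic in raw}
    # Normalize so the weights of one document sum to 1.
    return {topic: value / total for topic, value in raw.items()}


# Two toy documents, tokenized by whitespace (an assumption; use your own tokenizer).
docs = [
    "the team won the match".split(),
    "the bank reacted to the market".split(),
]

# One row per document: this is the document-topic matrix.
doc_topic = [document_topic_weights(d, topic_word) for d in docs]
```

Each entry of doc_topic is one row of the document-topic matrix. To label a document, you could for instance take the topic with the largest weight in its row.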
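You can also use the LDA algorithm but skip the training step, whose purpose is to estimate the topic-word matrix. I do not know which implementation you use, but with the Sklearn one you can directly assign your matrix to the components_ attribute and then use the transform function. Depending on the version, transform may also rely on other fitted attributes (such as exp_dirichlet_component_), so you may need to set those as well.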