I'm using the pySpark ML LDA library to fit a topic model on the 20 Newsgroups dataset from sklearn. I'm doing the standard tokenization, stop-word removal, and TF-IDF transformations on the training corpus. In the end, I can get the topics and print out the term indices and their weights:
topics = model.describeTopics()
topics.show()
+-----+--------------------+--------------------+
|topic| termIndices| termWeights|
+-----+--------------------+--------------------+
| 0|[5456, 6894, 7878...|[0.03716766297248...|
| 1|[5179, 3810, 1545...|[0.12236370744240...|
| 2|[5653, 4248, 3655...|[1.90742686393836...|
...
However, how do I map from term indices back to actual words so I can visualize the topics? I'm applying a HashingTF to the tokenized lists of strings to derive the term indices, so there is no vocabulary object I can look them up in. How do I generate a dictionary (a map from indices to words) for visualizing the topics?
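For context, the core of the difficulty is the hashing trick itself. The sketch below is a simplified pure-Python illustration (it uses Python's built-in `hash`, not Spark's actual MurmurHash3, and the names are mine): each term is hashed modulo the feature dimension and the term itself is never stored, so the index alone cannot be inverted, and any index-to-word map has to be built separately by hashing a known vocabulary.

```python
# Simplified illustration of the hashing trick behind HashingTF.
# NOTE: Spark uses MurmurHash3 internally; Python's built-in hash()
# stands in here purely to show why the mapping is one-way.
def term_index(term, num_features=1 << 18):
    # The term is reduced to an index and then discarded, so there
    # is no built-in reverse lookup from index to term.
    return hash(term) % num_features

# To recover words, one would have to hash every word in a known
# vocabulary up front (hypothetical example vocabulary); note that
# distinct words can also collide on the same index.
vocab = ["space", "nasa", "hockey"]
index_to_words = {}
for w in vocab:
    index_to_words.setdefault(term_index(w), []).append(w)
```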