I'm working with the Python NLTK Wordnet API. I'm trying to find the best synset that represents a group of words.
If I need to find the best synset for something like "school & office supplies", I'm not sure how to go about this. So far I've tried finding the synsets for the individual words and then computing the best lowest common hypernym like this:
def find_best_synset(category_name):
text = word_tokenize(category_name)
tags = pos_tag(text)
node_synsets = []
for word, tag in tags:
pos = get_wordnet_pos(tag)
if not pos:
continue
node_synsets.append(wordnet.synsets(word, pos=pos))
max_score = 0
max_synset = None
max_combination = None
for combination in itertools.product(*node_synsets):
for test in itertools.combinations(combination, 2):
score = wordnet.path_similarity(test[0], test[1])
if score > max_score:
max_score = score
max_combination = test
max_synset = test[0].lowest_common_hypernyms(test[1])
return max_synset
However this doesn't work very well plus it is very costly. Are there any ways to figure out which synset best represents multiple words together?
Thanks for your help!