I have a series of phrases that occur in a larger text. I would like to emphasize them in that text, but first I want to collapse the list so that any phrase contained in a longer phrase is dropped. I am using Python 3.5 and NLTK for most of the processing.
For instance, if I have the sentence:
The quick brown fox jumped over the lazy dog
and the phrases
brown fox
quick brown fox
I want the resulting HTML to look like
The <b>quick brown fox</b> jumped over the lazy dog
not
The <b>quick <b>brown fox</b></b> jumped over the lazy dog
It seems like I should be able to craft some sort of list comprehension that removes items that are a subset of other items in the list, but I can't quite wrap my head around it. Any ideas about how I can collapse my phrases to remove the ones that are contained in other entries?
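
To make the goal concrete, here is a minimal sketch of the behavior described above, under some assumptions that are not part of the original setup: phrases are compared word by word after a plain whitespace split (an NLTK tokenizer could be swapped in), matching is case-sensitive, and the helper names `collapse_phrases` and `emphasize` are hypothetical. It only handles phrases fully contained in another phrase; partially overlapping phrases such as "quick brown" and "brown fox" are not merged.

```python
import re


def collapse_phrases(phrases):
    """Drop every phrase whose word sequence occurs inside another phrase."""
    # Tokenize on whitespace and remove exact duplicates, keeping input order.
    tokenized = []
    for phrase in phrases:
        words = tuple(phrase.split())
        if words and words not in tokenized:
            tokenized.append(words)

    def contained_in(short, long):
        # True if `short` appears as a contiguous run of words in `long`.
        n = len(short)
        return any(long[i:i + n] == short for i in range(len(long) - n + 1))

    return [
        " ".join(p)
        for p in tokenized
        if not any(p != q and contained_in(p, q) for q in tokenized)
    ]


def emphasize(text, phrases):
    """Wrap each surviving phrase in <b>...</b> using a single regex pass."""
    # Longest alternatives first, so the regex prefers the longer phrase
    # when two candidates start at the same position.
    kept = sorted(collapse_phrases(phrases), key=len, reverse=True)
    if not kept:
        return text
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in kept) + r")\b"
    return re.sub(pattern, lambda m: "<b>" + m.group(0) + "</b>", text)


sentence = "The quick brown fox jumped over the lazy dog"
print(emphasize(sentence, ["brown fox", "quick brown fox"]))
```

Doing the replacement in one `re.sub` call, rather than once per phrase, avoids re-scanning text that already contains inserted `<b>` tags.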