I want to build a word cloud that contains multi-word phrases, not just single words. In any given text, unigram frequencies are higher than bigram frequencies; in general, n-gram frequency decreases as n increases for the same text.
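This drop is structural. Every occurrence of an (n+1)-gram also contains an occurrence of its first n words, so the highest n-gram count can never increase as n grows. A minimal sketch that counts n-grams of each order is below. The toy sentence and the naive whitespace tokenization are assumptions for illustration only.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens (an n-gram)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy text and whitespace tokenization: assumptions for this sketch.
text = "the cat sat on the mat and the cat sat on the rug"
tokens = text.split()

# Highest count observed for each n-gram order (1 = unigrams, 2 = bigrams, ...).
top_freq = {n: ngram_counts(tokens, n).most_common(1)[0][1] for n in (1, 2, 3)}
print(top_freq)  # {1: 4, 2: 2, 3: 2}
```

Here "the" occurs 4 times, while the most frequent bigram ("the cat") and trigram ("the cat sat") occur only twice each.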
I want to find a magic number, or a method, that makes the results for unigrams comparable with those for bigrams, trigrams and, in general, n-grams.
Is there a magic number that can be used as a multiplier for n-gram frequencies so that they become comparable with unigram frequencies?
One solution I currently have in mind is to build a separate top list for each n-gram order (1, 2, 3, ...) and use the first z positions from each list.
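The top-z-per-order idea can be sketched as follows. The function name `top_z_per_order` is hypothetical, and weighting each phrase by its rank inside its own order (top = z, next = z - 1, ...) is an assumption added here so that unigrams, bigrams and trigrams share one weight scale instead of their raw counts.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-word phrase, joined with spaces."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def top_z_per_order(tokens, orders=(1, 2, 3), z=2):
    """Hypothetical helper: keep the z most frequent phrases of each order.

    Assumption: each kept phrase is weighted by its rank within its own
    order (top = z, next = z - 1, ...), not by its raw count, so every
    order contributes phrases on the same weight scale.
    """
    weights = {}
    for n in orders:
        # most_common breaks ties by first occurrence in the text.
        for rank, (phrase, _count) in enumerate(ngram_counts(tokens, n).most_common(z)):
            weights[phrase] = z - rank
    return weights

tokens = "the cat sat on the mat and the cat sat on the rug".split()
print(top_z_per_order(tokens))
# {'the': 2, 'cat': 1, 'the cat': 2, 'cat sat': 1, 'the cat sat': 2, 'cat sat on': 1}
```

Phrases of different orders never share a key because they contain different numbers of words, so the merged dictionary can be passed to a word-cloud renderer as a phrase-to-weight mapping. The trade-off is that raw frequency differences between orders are discarded entirely.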