
All the hierarchical clustering methods I have seen implemented in Python (scipy, scikit-learn, etc.) split or combine two clusters at a time, which forces the branching factor to be 2 at each node. For my purposes, I want the model to allow a branching factor greater than 2. That's helpful in situations where there are ties between clusters.

I'm not familiar with any hierarchical clustering techniques that have a branching factor greater than 2; do they exist?

Welcome to Stack Overflow! I edited the title of your question to better reflect what you are asking: it's about hierarchical clustering in general, rather than document clustering. I'd like to point you toward UPGMA and WPGMA. These are implemented in scipy (and wrappers exist in scikit-learn), and they do allow ties. - Arya McCarthy
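A minimal sketch of the scipy side, assuming `scipy.cluster.hierarchy.linkage`: there, UPGMA corresponds to `method="average"` and WPGMA to `method="weighted"`. The four 1-D points are made up purely for illustration. Note that `linkage` still records one pairwise merge per row.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four made-up 1-D observations, chosen only for illustration.
points = np.array([[0.0], [1.0], [3.0], [7.0]])

# In scipy, UPGMA is method="average" and WPGMA is method="weighted".
upgma = linkage(points, method="average")
wpgma = linkage(points, method="weighted")

# Column 2 of each linkage matrix holds the merge heights.
print("UPGMA heights:", upgma[:, 2])
print("WPGMA heights:", wpgma[:, 2])
```

Both methods merge at heights 1.0 and 2.5. They differ only in the last merge: UPGMA averages over all original points (17/3, about 5.667), while WPGMA weights the two sub-clusters equally (5.25).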



Cluster this data set with single link:

0 0
0 1
1 0
1 1

And you will see a 4-way merge: every point is at distance 1 from two others, so all three merges happen at the same height (1.0).
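This tie can be made explicit in Python. The following is a sketch that assumes scipy. `linkage` still returns a binary tree (three merges, all at height 1.0), so the hypothetical helper `collapse_ties` walks the tree produced by `to_tree` and folds any child merged at the same height as its parent into that parent. The result is one node with four children. The tolerance `tol` is an arbitrary choice. For linkages other than single link, the way ties are broken can change later merges, so this post-processing is only a rough fix in those cases.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def collapse_ties(node, tol=1e-12):
    """Hypothetical helper: turn scipy's binary tree into a multi-way tree.

    Leaves become their integer ids; internal nodes become dicts with a
    merge height and a list of children. A child that merged at (nearly)
    the same height as its parent is absorbed into the parent.
    """
    if node.is_leaf():
        return node.id
    children = []
    for child in (node.get_left(), node.get_right()):
        sub = collapse_ties(child, tol)
        if not child.is_leaf() and abs(child.dist - node.dist) <= tol:
            # Same height as the parent: splice the child's children in.
            children.extend(sub["children"])
        else:
            children.append(sub)
    return {"height": node.dist, "children": children}

# The four corners of the unit square from the answer.
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Z = linkage(data, method="single")

tree = collapse_ties(to_tree(Z))
# All four leaves end up as direct children of one node at height 1.0.
print(tree)
```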

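But for other linkages, always searching for the best 3-way merge would likely increase the runtime cost to O(n^4): each of the up to n-1 merge steps would have to compare O(n^3) triples of clusters instead of O(n^2) pairs. You really don't want that.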
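To put rough numbers on that extra factor of n, here is a back-of-the-envelope count. It assumes a naive search that, at every step, examines every group of current clusters. It also assumes the worst case of n-1 steps, each of which reduces the cluster count by one. Real pairwise implementations use smarter algorithms than this naive scan, so the figures only illustrate growth, not actual runtimes. `candidates_examined` is a hypothetical name.

```python
from math import comb

def candidates_examined(n, group_size):
    """Hypothetical helper: total candidate groups a naive search examines.

    At each step with k clusters, every combination of `group_size`
    clusters is scored; k runs from n down to 2 (worst case n-1 steps).
    math.comb returns 0 when group_size > k.
    """
    return sum(comb(k, group_size) for k in range(n, 1, -1))

for n in (10, 100, 1000):
    pairs = candidates_examined(n, 2)    # grows like n^3 / 6
    triples = candidates_examined(n, 3)  # grows like n^4 / 24
    print(n, pairs, triples)
```

By the hockey-stick identity, the totals are C(n+1, 3) for pairs and C(n+1, 4) for triples. Their ratio, (n-2)/4, grows linearly with n. For n = 100 that is 166650 pairs versus 4082925 triples.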