0
votes

Can humans cluster data sets manually? For example, consider the Iris data set, depicted below:

http://i.stack.imgur.com/Ae6qa.png

Instead of using clustering algorithms such as connectivity-based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering, or density-based clustering, can a human cluster the Iris data set manually? For convenience, let us treat it as a two-dimensional data set. By what means, and how, would a human cluster it?


I am concerned that "human clustering" might not be well defined and could vary with different people's intuitions and opinions. I would like to know which clustering algorithms come closest to human clustering, or how humans actually cluster a data set. Is there a clustering algorithm that performs the clustering just as humans do?

1
Humans can actually cluster data extremely well, and most people need little to no training to spot clusters. The algorithms we devised for computers are mostly an attempt to replicate what most humans can do naturally and easily. The problem with the human way of clustering, though, is that it is not standardized: every person would come up with slightly different clusters, and even the same person might come up with slightly different clusters for the same data if asked multiple times. - Lie Ryan
@Lie Ryan: I disagree. If you have a 2-D data set with a few well-separated clusters, then yes, humans will do this very well. However, it is often unclear how many clusters there are. Imagine one large cluster that might or might not be split into 5 smaller clusters. The dimension is often not 2. People need a lot of help to decide what the clusters are in 20-dimensional or 2000-dimensional data. What is often needed is an objective measure of how well 10 versus 100 clusters describe high-dimensional data. Then clustering algorithms go way beyond trying to replicate a human ability. - Douglas Zare
I don't think an alien would say we have 7 continents. - Douglas Zare

1 Answer

0
votes

Humans can and do cluster data manually, but, as you say, there will be a lot of variation and subjective decisions. Assuming you could get an algorithm to use the same features as a human, it is in principle possible to have a computer cluster like a human.

To a first approximation, nearest-neighbor algorithms are probably close to how humans cluster, in that they group things that look similar under some measure. Keep in mind that without training and significant ongoing effort, humans really don't do well on consistency. We seem to be biased toward looking for novelty, so we tend to break things into two big clusters: the stuff we encounter all of the time, and everything else.
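
To make the nearest-neighbor idea concrete, here is a minimal sketch of one possible reading of it: any two points closer than some radius are put in the same group, and groups chain together through shared neighbors. The function name `neighbor_clusters`, the `radius` parameter, and the toy points are all assumptions made for illustration; the points are made up and are not real Iris measurements, and this is not claimed to be how people actually cluster.

```python
from math import dist  # Euclidean distance, Python 3.8+


def neighbor_clusters(points, radius):
    """Group points so that any two points closer than `radius` share a cluster."""
    parent = list(range(len(points)))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that are near neighbors.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) < radius:
                parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical 2-D points (NOT Iris data): two visually separate blobs.
points = [(1.0, 0.2), (1.2, 0.3), (1.4, 0.2), (4.5, 1.5), (4.7, 1.4), (5.0, 1.7)]
print(neighbor_clusters(points, radius=0.5))
```

The result depends entirely on the chosen `radius`: a very small value leaves every point in its own cluster, while a very large one merges everything. That choice plays the role of the subjective judgment a person makes when deciding what counts as "similar".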