
I'm new to machine learning, and I have the following question. Suppose I have applied a classification algorithm to some data and identified the best combination of features for that algorithm. If someday I get data from the same source that lacks the target feature from the previous classification task, can I use that best combination of features directly for a clustering task? (I know I can use the model I trained to predict the target for the new data, but I just want to know whether the best combination of features is the same for classification and clustering algorithms.)

I have searched websites and every resource I know, but I can't find an answer to my question. Could somebody explain, or just give me a link? Thanks!


2 Answers

0
votes

I would say yes, provided the nature of the target is the same in both cases. Ideally we want a tractable number of features that are orthogonal (perpendicular) to each other in N-dimensional space, so that each can contribute maximally to the prediction.
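
One way to make "orthogonal" concrete is to look at correlations: the Pearson correlation between two features is the cosine of the angle between their mean-centred value vectors, so a value near 0 means those centred vectors are close to perpendicular. The plain-Python sketch below computes the pairwise matrix for made-up toy columns `x`, `y` and `z`. Low linear correlation is only a rough proxy; it does not rule out non-linear dependence between features.

```python
import math


def pearson(xs, ys):
    """Pearson correlation = cosine of the angle between the mean-centred vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if den == 0:
        raise ValueError("a constant feature has no defined correlation")
    return num / den


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping feature name -> list of values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy columns (made up for illustration): y is uncorrelated with x, z is x doubled.
columns = {"x": [1, 2, 3, 4], "y": [1, -1, -1, 1], "z": [2, 4, 6, 8]}
matrix = correlation_matrix(columns)
for (a, b), r in sorted(matrix.items()):
    print(f"{a} vs {b}: {r:+.2f}")
```

A pair like `x` and `z` (correlation 1) is fully redundant, so keeping both adds nothing; `x` and `y` (correlation 0) each carry separate information.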

Take a concrete example: T-shirts, and whether they are Large or Small. You are given data showing that the manufacturing process causes some material shrinkage, so the T-shirts come out slightly irregular; the shrinkage differs between height and width, but not by much. The data records height, width and colour, and you want to decide whether each shirt belongs to the large group or the small one. You find that height and width are important but colour is not, so you choose height and width as your classification features.
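
The T-shirt example can be sketched in plain Python. The measurements below are invented for illustration (two size groups with small shrinkage-style variation, plus an irrelevant colour column), and `kmeans` is a bare-bones Lloyd's k-means with a deterministic initialisation, not a production implementation. Clustering only on height and width, the features that mattered for classification, recovers the two size groups without ever seeing the size labels.

```python
import math

# Invented measurements in cm: (height, width, colour). The first four are
# Small shirts, the last four Large; shrinkage makes each size a bit irregular.
shirts = [
    (70.0, 50.0, "red"), (69.2, 49.5, "blue"), (70.5, 50.4, "blue"), (69.8, 49.1, "green"),
    (78.0, 58.0, "green"), (77.1, 57.6, "red"), (78.6, 58.3, "blue"), (77.5, 57.2, "red"),
]
true_sizes = ["S"] * 4 + ["L"] * 4  # used only to check the result, never for clustering


def kmeans(points, k, iters=100):
    """Minimal Lloyd's k-means. Initial centroids are k points spread across the
    sorted data (chosen for reproducibility; k-means++ is the more usual choice)."""
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


features = [(h, w) for h, w, _colour in shirts]  # drop colour, as in the classification step
labels, centroids = kmeans(features, k=2)
for size, lab in zip(true_sizes, labels):
    print(size, "-> cluster", lab)
```

Each cluster contains shirts of only one size here because the two groups are well separated in height and width; with overlapping groups the correspondence would be weaker.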

The important point is that these two features have been identified as the most orthogonal to each other, and that should hold in a classification or a clustering context alike. The number of clusters remains a factor to be examined.
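
For the number of clusters, one common heuristic (not the only one) is the silhouette score: for each point it compares the mean distance to its own cluster, a, with the mean distance to the nearest other cluster, b, as (b - a) / max(a, b), then averages over all points; higher is better. The self-contained sketch below repeats a minimal k-means and some invented T-shirt height/width data, then tries k = 2 to 4.

```python
import math

# Invented (height, width) measurements in cm: four Small and four Large shirts.
points = [
    (70.0, 50.0), (69.2, 49.5), (70.5, 50.4), (69.8, 49.1),
    (78.0, 58.0), (77.1, 57.6), (78.6, 58.3), (77.5, 57.2),
]


def kmeans(points, k, iters=100):
    """Minimal Lloyd's k-means with a deterministic, spread-out initialisation."""
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: silhouette defined as 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On real data the silhouette curve is often much flatter than on this toy set, so it is best read alongside domain knowledge (here, how many sizes the factory makes).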

0
votes

It may not be good enough.

For example, a decision tree or random forest can be analyzed to obtain feature importances. But this will not tell you what kind of preprocessing (in particular scaling and weighting) is needed before you can cluster on those features. Categorical features in particular are difficult to use, and anything that is not continuous, or that is skewed, is hard to handle.
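
As an illustration of the kind of preprocessing a feature-importance ranking does not give you, here is a minimal plain-Python sketch of one common recipe before distance-based clustering: log-transform a skewed non-negative column, standardize numeric columns to zero mean and unit variance, and one-hot encode a categorical column. The column names and values are hypothetical, and the recipe is an assumption rather than the only option; how much weight each resulting column should get is still a judgment call.

```python
import math


def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a constant column carries no information for distances
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Return (encoded columns, category names), one 0/1 column per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for v in values] for c in categories], categories


def preprocess(columns, skewed=(), categorical=()):
    """Turn a dict of name -> values into numeric rows ready for k-means."""
    out_columns, names = [], []
    for name, values in columns.items():
        if name in categorical:
            encoded, categories = one_hot(values)
            out_columns.extend(encoded)
            names.extend(f"{name}={c}" for c in categories)
        else:
            if name in skewed:
                values = [math.log1p(v) for v in values]  # assumes values >= 0
            out_columns.append(zscore(values))
            names.append(name)
    rows = [list(row) for row in zip(*out_columns)]
    return rows, names


# Hypothetical data: a right-skewed "price" column and a categorical "colour".
columns = {
    "height": [70.0, 69.2, 78.0, 77.1],
    "price": [5.0, 8.0, 12.0, 400.0],
    "colour": ["red", "blue", "red", "green"],
}
rows, names = preprocess(columns, skewed={"price"}, categorical={"colour"})
print(names)
for row in rows:
    print([round(x, 2) for x in row])
```

Without the log and z-score steps, the single large `price` value would dominate Euclidean distances and the clustering would largely ignore `height`.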

Furthermore, data tends to change over time. Features that were once important (e.g. Facebook likes) may be useless now.
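
A cheap way to notice that kind of change before reusing an old feature selection is to compare each feature's distribution in the old and new data. The sketch below uses one simple heuristic, the shift in mean measured in units of the old standard deviation, with an arbitrary threshold of 0.5; the feature names and numbers are hypothetical. It only detects location shifts, and a stable distribution does not guarantee that a feature is still relevant to the clusters you care about.

```python
import math


def mean_std(xs):
    """Mean and population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


def drift_report(old, new, threshold=0.5):
    """Map feature name -> (standardized mean shift, flagged?).

    threshold=0.5 is an arbitrary illustrative cut-off, not a standard value."""
    report = {}
    for name in old:
        old_mean, old_std = mean_std(old[name])
        new_mean, _ = mean_std(new[name])
        if old_std > 0:
            shift = abs(new_mean - old_mean) / old_std
        else:
            shift = 0.0 if new_mean == old_mean else math.inf
        report[name] = (shift, shift > threshold)
    return report


# Hypothetical snapshots of the same two features, collected at different times.
old = {"height": [70.0, 72.0, 74.0, 76.0], "likes": [10.0, 12.0, 14.0, 16.0]}
new = {"height": [70.0, 72.0, 74.0, 76.0], "likes": [40.0, 42.0, 44.0, 46.0]}
for name, (shift, flagged) in drift_report(old, new).items():
    print(f"{name}: shift={shift:.2f} sd, flagged={flagged}")
```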