Could someone explain the difference between feature selection, clustering, and dimensionality reduction algorithms?

Feature selection algorithms: allow us to find the predominant variables, either those that best represent the data or those that best predict the class (e.g., GBM / lasso).

Clustering: helps us identify which groups (clusters) of variables clearly define the output.

Isn't this the same as a dimensionality reduction algorithm? Don't feature selection + clustering do the same thing as dimensionality reduction algorithms?


1 Answer


Feature Selection:

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.

Clustering:

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

Dimensionality Reduction:

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.
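
To illustrate the feature extraction branch, here is a minimal sketch, assuming scikit-learn and a small synthetic dataset: PCA builds a few new combined features rather than keeping a subset of the original columns.

    import numpy as np
    from sklearn.decomposition import PCA

    # Synthetic data: 200 samples, 10 original features
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))

    # Feature extraction: build 3 new features as combinations of the 10 originals
    pca = PCA(n_components=3)
    X_reduced = pca.fit_transform(X)   # shape (200, 3)
    print(X_reduced.shape, pca.explained_variance_ratio_)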

When you have many features and want to use only some of them, you can apply feature selection (e.g., mRMR). In doing so, you have applied a form of dimensionality reduction.
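
As a rough sketch (not the exact mRMR algorithm, which scikit-learn does not ship; a simpler relevance-only filter based on mutual information stands in for it here), feature selection keeps a subset of the original columns, which is exactly the feature selection branch of dimensionality reduction.

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    # Synthetic classification problem: 20 features, 5 of them informative
    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=5, random_state=0)

    # Keep the 5 features most relevant to the class (a stand-in for mRMR)
    selector = SelectKBest(score_func=mutual_info_classif, k=5)
    X_selected = selector.fit_transform(X, y)      # shape (300, 5)
    print(selector.get_support(indices=True))      # indices of the kept columns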

However, clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields (see Clustering in Machine Learning). When you want to group (cluster) data points according to their features, you can apply clustering (e.g., k-means), with or without using dimensionality reduction first.
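
A minimal sketch of that last point, assuming scikit-learn and synthetic blob data: k-means can be run on the raw features directly, or after a dimensionality reduction step such as PCA.

    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # Synthetic data: 300 points in 10 dimensions, drawn from 3 groups
    X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

    # Cluster directly on the original features
    labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Or reduce to 2 dimensions first, then cluster
    X_2d = PCA(n_components=2).fit_transform(X)
    labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)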