Where m is the number of features and n is the number of samples, the scikit-learn documentation (http://scikit-learn.org/stable/modules/tree.html) states that the runtime to construct a binary decision tree is O(m * n * log(n)).
I understand that the log(n) factor comes from the average height of the tree after splitting, and that at each split you have to look at every feature (m) and choose the best one to split on. I also understand that this choice is made by computing a "best metric" (in my case, Gini impurity) over the samples at that node (n).

However, to find the best split, doesn't this mean you would have to look at every possible way to partition the samples for each feature? Wouldn't that be something like m * 2^(n-1) rather than just m * n (see the sketch below for what I mean)? Am I thinking about this wrong? Any advice would help. Thank you.
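For concreteness, here is a toy brute-force sketch of the search I'm imagining (purely hypothetical, not what scikit-learn actually does): enumerate every way to divide the n samples at a node into two non-empty groups, and score each partition by weighted Gini impurity.

```python
from itertools import combinations

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def brute_force_best_split(labels):
    """What I imagine the search would be: try every way to divide the
    n samples into two non-empty groups (2^(n-1) - 1 partitions, so
    exponential in n) and keep the one with the lowest weighted Gini."""
    n = len(labels)
    all_idx = set(range(n))
    best = None
    # Pin sample 0 to the left group so each partition is counted once.
    for size in range(n - 1):
        for rest in combinations(range(1, n), size):
            left = {0, *rest}
            right = all_idx - left
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / n
            if best is None or score < best[0]:
                best = (score, sorted(left), sorted(right))
    return best

# Toy example: 5 samples with binary labels.
print(brute_force_best_split([0, 0, 1, 1, 1]))
# -> (0.0, [0, 1], [2, 3, 4])
```

Even for 30 samples that is 2^29 - 1 (over half a billion) partitions per feature, which is why the m * n term in the documentation confuses me.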